Web Scraping with Firecrawl
JavaScript-rendered pages to markdown, from Claude Code or the command line
Claude Code’s built-in WebFetch tool fails on JavaScript-heavy pages - it gets the raw HTML before any content renders. Firecrawl handles this by rendering the page first, then extracting clean markdown.
I use it for scraping flight search results, documentation sites, and anything behind client-side rendering.
Setup
1. Get an API key
Sign up at firecrawl.dev. Free tier gives you 500 credits - enough for casual use.
2. Add the MCP server to Claude Code
claude mcp add firecrawl -- npx -y firecrawl-mcp
Then set the API key. Add to your shell config (~/.bashrc or ~/.zshrc):
export FIRECRAWL_API_KEY='fc-your-key-here'
Or add it directly to ~/.claude.json under the MCP server config:
"firecrawl": {
"type": "stdio",
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "fc-your-key-here"
}
}
3. Use it
In Claude Code:
“Firecrawl this page and summarise it: https://example.com/some-js-heavy-page”
Or more naturally:
“MCP into firecrawl and scrape [URL]”
Claude will use the Firecrawl MCP server to fetch the rendered page, then work with the content.
Standalone scripts
If you want to save web pages to your vault from the command line (outside Claude Code), these scripts call the Firecrawl API directly.
Single page scrape
Saves a webpage as markdown with frontmatter (source URL, capture date, tags) to your vault.
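The script parses Firecrawl's JSON response with jq. A minimal sketch of the response shape (assumed here, trimmed to just the fields the script reads) and the same extractions it performs:

```shell
# Hypothetical Firecrawl v1 scrape response, reduced to the fields used below
RESPONSE='{"success":true,"data":{"markdown":"# Hello\n\nBody text.","metadata":{"title":"Hello","sourceURL":"https://example.com/hello"}}}'

# `// empty` yields an empty string when a field is missing, so the
# script's -z checks work for both absent and null fields
echo "$RESPONSE" | jq -r '.data.metadata.title // empty'       # Hello
echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty'   # https://example.com/hello
echo "$RESPONSE" | jq -r '.data.markdown // empty' | head -n 1 # # Hello
```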
#!/bin/bash
# firecrawl-scrape.sh - Save a single webpage to vault as markdown
# Usage: firecrawl-scrape.sh "https://example.com" [output-path]
#
# If output-path not specified, saves to 02 Inbox/Web Clippings/
# Output filename auto-generated from page title and date
set -euo pipefail
VAULT_ROOT="${VAULT_PATH:-$HOME/Files}"
DEFAULT_OUTPUT_DIR="02 Inbox/Web Clippings"
URL="${1:-}"
OUTPUT_DIR="${2:-$DEFAULT_OUTPUT_DIR}"
if [[ -z "$URL" ]]; then
echo "Usage: firecrawl-scrape.sh <url> [output-directory]"
echo "Example: firecrawl-scrape.sh 'https://example.com/article' '05 Resources/Research'"
exit 1
fi
if [[ -z "${FIRECRAWL_API_KEY:-}" ]]; then
echo "Error: FIRECRAWL_API_KEY environment variable not set"
echo "Get your API key at: https://www.firecrawl.dev/"
echo "Then: export FIRECRAWL_API_KEY='fc-your-key-here'"
exit 1
fi
# Ensure output directory exists (relative to vault)
FULL_OUTPUT_DIR="$VAULT_ROOT/$OUTPUT_DIR"
mkdir -p "$FULL_OUTPUT_DIR"
echo "Scraping: $URL"
# Call Firecrawl API
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v1/scrape" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-d "{
\"url\": \"$URL\",
\"formats\": [\"markdown\"],
\"onlyMainContent\": true
}")
# Check for errors
if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then
ERROR=$(echo "$RESPONSE" | jq -r '.error')
echo "Error from Firecrawl: $ERROR"
exit 1
fi
# Extract content
MARKDOWN=$(echo "$RESPONSE" | jq -r '.data.markdown // empty')
TITLE=$(echo "$RESPONSE" | jq -r '.data.metadata.title // empty')
SOURCE_URL=$(echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty')
if [[ -z "$MARKDOWN" ]]; then
echo "Error: No content returned"
echo "Response: $RESPONSE"
exit 1
fi
# Generate filename from title or URL
DATE=$(date +%Y-%m-%d)
if [[ -n "$TITLE" ]]; then
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
FILENAME="${DATE} - ${SAFE_TITLE}.md"
else
DOMAIN=$(echo "$URL" | sed -E 's|https?://([^/]+).*|\1|')
FILENAME="${DATE} - ${DOMAIN}.md"
fi
OUTPUT_FILE="$FULL_OUTPUT_DIR/$FILENAME"
# Write file with frontmatter
# printf, not an unquoted heredoc: heredocs expand backticks and $( ) in the
# scraped markdown, which would execute arbitrary page content
printf -- '---\nsource: %s\ncaptured: %s\ntags:\n- web-clipping\n---\n\n# %s\n\n%s\n' \
"$SOURCE_URL" "$DATE" "$TITLE" "$MARKDOWN" > "$OUTPUT_FILE"
echo "Saved to: $OUTPUT_DIR/$FILENAME"
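The filename logic can be exercised offline. `SAFE_TITLE` strips characters that are illegal on common filesystems, collapses runs of spaces, and truncates to 100 bytes (the title here is a hypothetical example):

```shell
# Hypothetical page title containing filesystem-hostile characters
TITLE='Docs: The "API" / Reference?'

# Strip <>:"/\|?* then collapse multiple spaces, cap at 100 bytes
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
echo "$SAFE_TITLE"   # Docs The API Reference
```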
Batch scrape
Same as above, but processes a list of URLs with a one-second pause between requests.
#!/bin/bash
# firecrawl-batch.sh - Save multiple webpages to vault as markdown
# Usage: firecrawl-batch.sh urls.txt [--output path]
# firecrawl-batch.sh url1 url2 url3 ... [--output path]
set -euo pipefail
VAULT_ROOT="${VAULT_PATH:-$HOME/Files}"
DEFAULT_OUTPUT_DIR="02 Inbox/Web Clippings"
show_usage() {
echo "Usage:"
echo " firecrawl-batch.sh urls.txt [--output directory]"
echo " firecrawl-batch.sh url1 url2 url3 [--output directory]"
}
if [[ $# -lt 1 ]]; then
show_usage
exit 1
fi
if [[ -z "${FIRECRAWL_API_KEY:-}" ]]; then
echo "Error: FIRECRAWL_API_KEY environment variable not set"
exit 1
fi
# Parse arguments
URLS=()
OUTPUT_DIR="$DEFAULT_OUTPUT_DIR"
while [[ $# -gt 0 ]]; do
case "$1" in
--output|-o)
OUTPUT_DIR="$2"
shift 2
;;
*)
if [[ -f "$1" ]]; then
while IFS= read -r line || [[ -n "$line" ]]; do
[[ -z "$line" || "$line" =~ ^# ]] && continue
URLS+=("$line")
done < "$1"
else
URLS+=("$1")
fi
shift
;;
esac
done
if [[ ${#URLS[@]} -eq 0 ]]; then
echo "Error: No URLs provided"
exit 1
fi
FULL_OUTPUT_DIR="$VAULT_ROOT/$OUTPUT_DIR"
mkdir -p "$FULL_OUTPUT_DIR"
echo "Processing ${#URLS[@]} URL(s) → $OUTPUT_DIR"
echo "---"
SUCCESS=0
FAILED=0
for URL in "${URLS[@]}"; do
echo ""
echo "Scraping: $URL"
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v1/scrape" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-d "{
\"url\": \"$URL\",
\"formats\": [\"markdown\"],
\"onlyMainContent\": true
}")
if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then
ERROR=$(echo "$RESPONSE" | jq -r '.error')
echo " Error: $ERROR"
FAILED=$((FAILED + 1))  # not ((FAILED++)): that returns 1 under set -e when the count is 0
continue
fi
MARKDOWN=$(echo "$RESPONSE" | jq -r '.data.markdown // empty')
TITLE=$(echo "$RESPONSE" | jq -r '.data.metadata.title // empty')
SOURCE_URL=$(echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty')
if [[ -z "$MARKDOWN" ]]; then
echo " Error: No content returned"
FAILED=$((FAILED + 1))
continue
fi
DATE=$(date +%Y-%m-%d)
if [[ -n "$TITLE" ]]; then
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
FILENAME="${DATE} - ${SAFE_TITLE}.md"
else
DOMAIN=$(echo "$URL" | sed -E 's|https?://([^/]+).*|\1|')
FILENAME="${DATE} - ${DOMAIN}.md"
fi
OUTPUT_FILE="$FULL_OUTPUT_DIR/$FILENAME"
# printf, not an unquoted heredoc: heredocs expand backticks and $( ) in the
# scraped markdown, which would execute arbitrary page content
printf -- '---\nsource: %s\ncaptured: %s\ntags:\n- web-clipping\n---\n\n# %s\n\n%s\n' \
"$SOURCE_URL" "$DATE" "$TITLE" "$MARKDOWN" > "$OUTPUT_FILE"
echo " Saved: $FILENAME"
SUCCESS=$((SUCCESS + 1))  # arithmetic assignment avoids the set -e exit that ((SUCCESS++)) triggers at 0
# Rate limiting
sleep 1
done
echo ""
echo "---"
echo "Complete: $SUCCESS succeeded, $FAILED failed"
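The file-input branch skips blank lines and `#` comments, so a URL file can double as notes. A quick offline check of that exact loop, using a hypothetical temp file:

```shell
# Hypothetical URL list mixing comments, blanks, and real entries
cat > /tmp/urls-demo.txt <<'EOF'
# flight searches
https://a.example/results

https://b.example/results
EOF

# Same read loop as the script: the `|| [[ -n "$line" ]]` clause keeps
# a final line that lacks a trailing newline
URLS=()
while IFS= read -r line || [[ -n "$line" ]]; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  URLS+=("$line")
done < /tmp/urls-demo.txt

echo "${#URLS[@]}"   # 2
```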
Installation
# Copy scripts to your PATH
cp firecrawl-scrape.sh firecrawl-batch.sh ~/.local/bin/
chmod +x ~/.local/bin/firecrawl-scrape.sh ~/.local/bin/firecrawl-batch.sh
# Set API key in shell config
echo 'export FIRECRAWL_API_KEY="fc-your-key-here"' >> ~/.bashrc
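Calling the scripts by bare name only works if `~/.local/bin` is on your PATH. This snippet prepends it when missing (safe to re-run) and confirms:

```shell
# Add ~/.local/bin to PATH if absent; the colon-wrapping makes the
# substring match exact on path components
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) export PATH="$HOME/.local/bin:$PATH" ;;
esac

case ":$PATH:" in
  *":$HOME/.local/bin:"*) echo "PATH ok" ;;
esac
```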
Usage:
# Single page
firecrawl-scrape.sh "https://docs.example.com/api-reference"
# Batch from file
firecrawl-batch.sh urls.txt --output "05 Resources/Research"
# Batch from arguments
firecrawl-batch.sh "https://a.com" "https://b.com" --output "03 Projects/MyProject"
When to use Firecrawl vs WebFetch
| | WebFetch | Firecrawl |
|---|---|---|
| JavaScript rendering | No | Yes |
| Cost | Free (built-in) | 500 free credits, then paid |
| Speed | Fast | Slower (renders page) |
| Setup | None | API key + MCP server |
Use WebFetch for static pages and documentation. Use Firecrawl when WebFetch returns empty or broken content - that usually means the page needs JavaScript to render.
Gotcha: Pi-hole blocking
If you run Pi-hole for DNS, it may block Firecrawl’s signup page or API endpoints. Temporarily disable blocking if you get connection errors during signup:
# Disable Pi-hole for 5 minutes
pihole disable 5m