Web Scraping with Firecrawl
JavaScript-rendered pages to markdown, from Claude Code or the command line
Claude Code’s built-in WebFetch tool fails on JavaScript-heavy pages - it gets the raw HTML before any content renders. Firecrawl handles this by rendering the page first, then extracting clean markdown.
I use it for scraping flight search results, documentation sites, and anything behind client-side rendering.
Setup
1. Get an API key
Sign up at firecrawl.dev. Free tier gives you 500 credits - enough for casual use.
2. Add the MCP server to Claude Code
claude mcp add firecrawl -- npx -y firecrawl-mcp
Then set the API key. Add to your shell config (~/.bashrc or ~/.zshrc):
export FIRECRAWL_API_KEY='fc-your-key-here'
Or add it directly to ~/.claude.json under the MCP server config:
"firecrawl": {
"type": "stdio",
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "fc-your-key-here"
}
}
3. Use it
In Claude Code:
“Firecrawl this page and summarise it: https://example.com/some-js-heavy-page”
Or more naturally:
“MCP into firecrawl and scrape [URL]”
Claude will use the Firecrawl MCP server to fetch the rendered page, then work with the content.
Standalone scripts
If you want to save web pages to your vault from the command line (outside Claude Code), these scripts call the Firecrawl API directly.
Single page scrape
Saves a webpage as markdown with frontmatter (source URL, capture date, tags) to your vault.
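The script parses Firecrawl's JSON response with jq. A minimal sketch of the response shape (assumed here, trimmed to just the fields the script reads) and the same extractions it performs:

```shell
# Hypothetical Firecrawl v1 scrape response, reduced to the fields used below
RESPONSE='{"success":true,"data":{"markdown":"# Hello\n\nBody text.","metadata":{"title":"Hello","sourceURL":"https://example.com/hello"}}}'

# `// empty` yields an empty string when a field is missing, so the
# script's -z checks work for both absent and null fields
echo "$RESPONSE" | jq -r '.data.metadata.title // empty'       # Hello
echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty'   # https://example.com/hello
echo "$RESPONSE" | jq -r '.data.markdown // empty' | head -n 1 # # Hello
```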
#!/bin/bash
# firecrawl-scrape.sh - Save a single webpage to vault as markdown
# Usage: firecrawl-scrape.sh "https://example.com" [output-path]
#
# If output-path not specified, saves to 02 Inbox/Web Clippings/
# Output filename auto-generated from page title and date
set -euo pipefail
VAULT_ROOT="${VAULT_PATH:-$HOME/Files}"
DEFAULT_OUTPUT_DIR="02 Inbox/Web Clippings"
URL="${1:-}"
OUTPUT_DIR="${2:-$DEFAULT_OUTPUT_DIR}"
if [[ -z "$URL" ]]; then
echo "Usage: firecrawl-scrape.sh <url> [output-directory]"
echo "Example: firecrawl-scrape.sh 'https://example.com/article' '05 Resources/Research'"
exit 1
fi
if [[ -z "${FIRECRAWL_API_KEY:-}" ]]; then
echo "Error: FIRECRAWL_API_KEY environment variable not set"
echo "Get your API key at: https://www.firecrawl.dev/"
echo "Then: export FIRECRAWL_API_KEY='fc-your-key-here'"
exit 1
fi
# Ensure output directory exists (relative to vault)
FULL_OUTPUT_DIR="$VAULT_ROOT/$OUTPUT_DIR"
mkdir -p "$FULL_OUTPUT_DIR"
echo "Scraping: $URL"
# Call Firecrawl API
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v1/scrape" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-d "{
\"url\": \"$URL\",
\"formats\": [\"markdown\"],
\"onlyMainContent\": true
}")
# Check for errors
if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then
ERROR=$(echo "$RESPONSE" | jq -r '.error')
echo "Error from Firecrawl: $ERROR"
exit 1
fi
# Extract content
MARKDOWN=$(echo "$RESPONSE" | jq -r '.data.markdown // empty')
TITLE=$(echo "$RESPONSE" | jq -r '.data.metadata.title // empty')
SOURCE_URL=$(echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty')
if [[ -z "$MARKDOWN" ]]; then
echo "Error: No content returned"
echo "Response: $RESPONSE"
exit 1
fi
# Generate filename from title or URL
DATE=$(date +%Y-%m-%d)
if [[ -n "$TITLE" ]]; then
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
FILENAME="${DATE} - ${SAFE_TITLE}.md"
else
DOMAIN=$(echo "$URL" | sed -E 's|https?://([^/]+).*|\1|')
FILENAME="${DATE} - ${DOMAIN}.md"
fi
OUTPUT_FILE="$FULL_OUTPUT_DIR/$FILENAME"
# Write file with frontmatter
# printf, not an unquoted heredoc: heredocs expand backticks and $( ) in the
# scraped markdown, which would execute arbitrary page content
printf -- '---\nsource: %s\ncaptured: %s\ntags:\n- web-clipping\n---\n\n# %s\n\n%s\n' \
"$SOURCE_URL" "$DATE" "$TITLE" "$MARKDOWN" > "$OUTPUT_FILE"
echo "Saved to: $OUTPUT_DIR/$FILENAME"
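The filename logic can be exercised offline. `SAFE_TITLE` strips characters that are illegal on common filesystems, collapses runs of spaces, and truncates to 100 bytes (the title here is a hypothetical example):

```shell
# Hypothetical page title containing filesystem-hostile characters
TITLE='Docs: The "API" / Reference?'

# Strip <>:"/\|?* then collapse multiple spaces, cap at 100 bytes
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
echo "$SAFE_TITLE"   # Docs The API Reference
```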
Batch scrape
Same as above, but processes a list of URLs with a one-second pause between requests.
#!/bin/bash
# firecrawl-batch.sh - Save multiple webpages to vault as markdown
# Usage: firecrawl-batch.sh urls.txt [--output path]
# firecrawl-batch.sh url1 url2 url3 ... [--output path]
set -euo pipefail
VAULT_ROOT="${VAULT_PATH:-$HOME/Files}"
DEFAULT_OUTPUT_DIR="02 Inbox/Web Clippings"
show_usage() {
echo "Usage:"
echo " firecrawl-batch.sh urls.txt [--output directory]"
echo " firecrawl-batch.sh url1 url2 url3 [--output directory]"
}
if [[ $# -lt 1 ]]; then
show_usage
exit 1
fi
if [[ -z "${FIRECRAWL_API_KEY:-}" ]]; then
echo "Error: FIRECRAWL_API_KEY environment variable not set"
exit 1
fi
# Parse arguments
URLS=()
OUTPUT_DIR="$DEFAULT_OUTPUT_DIR"
while [[ $# -gt 0 ]]; do
case "$1" in
--output|-o)
OUTPUT_DIR="$2"
shift 2
;;
*)
if [[ -f "$1" ]]; then
while IFS= read -r line || [[ -n "$line" ]]; do
[[ -z "$line" || "$line" =~ ^# ]] && continue
URLS+=("$line")
done < "$1"
else
URLS+=("$1")
fi
shift
;;
esac
done
if [[ ${#URLS[@]} -eq 0 ]]; then
echo "Error: No URLs provided"
exit 1
fi
FULL_OUTPUT_DIR="$VAULT_ROOT/$OUTPUT_DIR"
mkdir -p "$FULL_OUTPUT_DIR"
echo "Processing ${#URLS[@]} URL(s) → $OUTPUT_DIR"
echo "---"
SUCCESS=0
FAILED=0
for URL in "${URLS[@]}"; do
echo ""
echo "Scraping: $URL"
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v1/scrape" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-d "{
\"url\": \"$URL\",
\"formats\": [\"markdown\"],
\"onlyMainContent\": true
}")
if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then
ERROR=$(echo "$RESPONSE" | jq -r '.error')
echo " Error: $ERROR"
FAILED=$((FAILED + 1))  # not ((FAILED++)): that returns 1 under set -e when the count is 0
continue
fi
MARKDOWN=$(echo "$RESPONSE" | jq -r '.data.markdown // empty')
TITLE=$(echo "$RESPONSE" | jq -r '.data.metadata.title // empty')
SOURCE_URL=$(echo "$RESPONSE" | jq -r '.data.metadata.sourceURL // empty')
if [[ -z "$MARKDOWN" ]]; then
echo " Error: No content returned"
FAILED=$((FAILED + 1))
continue
fi
DATE=$(date +%Y-%m-%d)
if [[ -n "$TITLE" ]]; then
SAFE_TITLE=$(echo "$TITLE" | sed 's/[<>:"/\\|?*]//g' | sed 's/  */ /g' | head -c 100)
FILENAME="${DATE} - ${SAFE_TITLE}.md"
else
DOMAIN=$(echo "$URL" | sed -E 's|https?://([^/]+).*|\1|')
FILENAME="${DATE} - ${DOMAIN}.md"
fi
OUTPUT_FILE="$FULL_OUTPUT_DIR/$FILENAME"
# printf, not an unquoted heredoc: heredocs expand backticks and $( ) in the
# scraped markdown, which would execute arbitrary page content
printf -- '---\nsource: %s\ncaptured: %s\ntags:\n- web-clipping\n---\n\n# %s\n\n%s\n' \
"$SOURCE_URL" "$DATE" "$TITLE" "$MARKDOWN" > "$OUTPUT_FILE"
echo " Saved: $FILENAME"
SUCCESS=$((SUCCESS + 1))  # arithmetic assignment avoids the set -e exit that ((SUCCESS++)) triggers at 0
# Rate limiting
sleep 1
done
echo ""
echo "---"
echo "Complete: $SUCCESS succeeded, $FAILED failed"
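The file-input branch skips blank lines and `#` comments, so a URL file can double as notes. A quick offline check of that exact loop, using a hypothetical temp file:

```shell
# Hypothetical URL list mixing comments, blanks, and real entries
cat > /tmp/urls-demo.txt <<'EOF'
# flight searches
https://a.example/results

https://b.example/results
EOF

# Same read loop as the script: the `|| [[ -n "$line" ]]` clause keeps
# a final line that lacks a trailing newline
URLS=()
while IFS= read -r line || [[ -n "$line" ]]; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  URLS+=("$line")
done < /tmp/urls-demo.txt

echo "${#URLS[@]}"   # 2
```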
Installation
# Copy scripts to your PATH
cp firecrawl-scrape.sh firecrawl-batch.sh ~/.local/bin/
chmod +x ~/.local/bin/firecrawl-scrape.sh ~/.local/bin/firecrawl-batch.sh
# Set API key in shell config
echo 'export FIRECRAWL_API_KEY="fc-your-key-here"' >> ~/.bashrc
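Calling the scripts by bare name only works if `~/.local/bin` is on your PATH. This snippet prepends it when missing (safe to re-run) and confirms:

```shell
# Add ~/.local/bin to PATH if absent; the colon-wrapping makes the
# substring match exact on path components
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) export PATH="$HOME/.local/bin:$PATH" ;;
esac

case ":$PATH:" in
  *":$HOME/.local/bin:"*) echo "PATH ok" ;;
esac
```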
Usage:
# Single page
firecrawl-scrape.sh "https://docs.example.com/api-reference"
# Batch from file
firecrawl-batch.sh urls.txt --output "05 Resources/Research"
# Batch from arguments
firecrawl-batch.sh "https://a.com" "https://b.com" --output "03 Projects/MyProject"
When to use Firecrawl vs WebFetch
| | WebFetch | Firecrawl |
|---|---|---|
| JavaScript rendering | No | Yes |
| Cost | Free (built-in) | 500 free credits, then paid |
| Speed | Fast | Slower (renders page) |
| Setup | None | API key + MCP server |
Use WebFetch for static pages and documentation. Use Firecrawl when WebFetch returns empty or broken content - that usually means the page needs JavaScript to render.
Gotcha: Pi-hole blocking
If you run Pi-hole for DNS, it may block Firecrawl’s signup page or API endpoints. Temporarily disable blocking if you get connection errors during signup:
# Disable Pi-hole for 5 minutes
pihole disable 5m