Summarise Large Files with Gemini 2.5 Flash
Extend Claude Code's reach with a 1M token context window
When Claude Code can't read a file because it's too large, pipe it to Gemini 2.5 Flash instead. This post covers a simple bash script, a PreToolUse hook for automatic detection, and notes on data privacy across Google's API tiers.
Paste this URL into Claude Code and tell it to set this up for you.
The Problem
Claude Code has limits on file size. Try to read a 500KB log file or a large codebase dump and you’ll hit either:
- The Read tool’s internal limit (~256KB)
- Context window consumption that crowds out actual work
Gemini 2.5 Flash has a 1M token context window (~750k words). For summarisation tasks - “what’s in this file?” rather than “edit line 47” - that’s a useful escape hatch.
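To get a feel for whether a given file is likely to fit in that window, a rough estimate from the byte count is usually enough. The sketch below is not part of the original setup: the function names are made up for illustration, and the 4-bytes-per-token ratio is a rule-of-thumb assumption, not a tokenizer count.

```bash
# Rough check: will this file fit in a 1M-token window?
# ASSUMPTION: ~4 bytes per token, a common rule of thumb for English text.
# Real counts depend on the tokenizer and on the content.

# estimate_tokens FILE: print an approximate token count for FILE
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# fits_in_window FILE [LIMIT]: succeed if the estimate is within LIMIT tokens
fits_in_window() {
  local limit="${2:-1000000}" tokens
  tokens=$(estimate_tokens "$1")
  if [ "$tokens" -le "$limit" ]; then
    echo "ok: ~${tokens} tokens"
  else
    echo "too large: ~${tokens} tokens (limit ${limit})"
    return 1
  fi
}
```

Run it as fits_in_window /path/to/large-file.txt. Treat the result as a sanity check only; dense code or non-English text can tokenise quite differently.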
The Solution
A bash script that pipes file content to Gemini’s API and returns a summary.
The Script
Save as ~/.local/bin/summarise-file:
#!/usr/bin/env bash
#
# summarise-file - Summarise large files using Gemini 2.5 Flash
#
# Usage:
#   summarise-file <file>            # Summarise a file
#   summarise-file -                 # Read from stdin
#   cat file.txt | summarise-file    # Pipe content
#
# Requires:
#   GEMINI_API_KEY environment variable
#   curl and jq on PATH

set -euo pipefail

MODEL="gemini-2.5-flash"
API_URL="https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent"

# Check for API key
if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY not set" >&2
  echo "Get one at https://aistudio.google.com/apikey" >&2
  exit 1
fi

# Get content from file or stdin
if [[ $# -eq 0 ]] || [[ "$1" == "-" ]]; then
  CONTENT=$(cat)
  FILENAME="stdin"
elif [[ -f "$1" ]]; then
  CONTENT=$(cat "$1")
  FILENAME=$(basename "$1")
else
  echo "Error: File not found: $1" >&2
  exit 1
fi

# Check content isn't empty
if [[ -z "$CONTENT" ]]; then
  echo "Error: No content to summarise" >&2
  exit 1
fi

# Get file size for context
CHAR_COUNT=${#CONTENT}
if [[ $CHAR_COUNT -gt 1000000 ]]; then
  SIZE_NOTE="(~$(( CHAR_COUNT / 1000 ))k chars)"
else
  SIZE_NOTE="(${CHAR_COUNT} chars)"
fi
echo "Summarising ${FILENAME} ${SIZE_NOTE}..." >&2

# Build JSON payload - pipe content to jq to avoid argument limits
SYSTEM_PROMPT="Summarise the following document concisely. Focus on key points, main arguments, and actionable information. If it's code, describe what it does. If it's structured data, describe the schema and notable entries.
---
"
PAYLOAD=$(printf '%s%s' "$SYSTEM_PROMPT" "$CONTENT" | jq -Rs '{
  contents: [{parts: [{text: .}]}],
  generationConfig: {temperature: 0.3, maxOutputTokens: 2048}
}')

# Make API call (pipe payload to avoid argument length limits)
# -sS hides the progress meter but still prints transport errors
RESPONSE=$(echo "$PAYLOAD" | curl -sS -X POST "${API_URL}?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  --max-time 120 \
  --data-binary @-)

# Check for errors
if echo "$RESPONSE" | jq -e '.error' >/dev/null 2>&1; then
  ERROR_MSG=$(echo "$RESPONSE" | jq -r '.error.message')
  echo "API Error: ${ERROR_MSG}" >&2
  exit 1
fi

# Extract and print the summary
SUMMARY=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].text // "No response generated"')
echo ""
echo "$SUMMARY"
Make it executable:
chmod +x ~/.local/bin/summarise-file
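Before the first run it can help to confirm that everything the script shells out to is actually installed. The helper below is a hypothetical convenience, not part of the original setup; it relies only on the POSIX command -v builtin.

```bash
# check_deps CMD...: report any command missing from PATH.
# Returns non-zero if at least one is missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The script calls curl and jq, and must itself be on PATH:
#   check_deps curl jq summarise-file
```

If summarise-file shows up as missing, ~/.local/bin is probably not on your PATH.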
Get an API Key
- Go to https://aistudio.google.com/apikey
- Create a new API key
- Important: Set up Cloud Billing to get the paid tier (see data privacy section below). You get $300 free credit for 90 days - more than enough for personal use. No charges unless you explicitly upgrade after that.
- Add to your shell config:
# Add to BOTH if you use zsh but want Claude Code to use it
# (Claude Code runs in bash, not zsh)
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.zshrc
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.zshrc
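Since the key has to be visible in more than one shell, a quick check that reports its presence without echoing the secret can save some head-scratching. This is a minimal sketch; key_status is a made-up name, not something the original post defines.

```bash
# key_status: say whether GEMINI_API_KEY is visible to this shell,
# printing only its length so the key itself never reaches the terminal.
key_status() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is set (${#GEMINI_API_KEY} chars)"
  else
    echo "GEMINI_API_KEY is missing"
    return 1
  fi
}
```

Run it once from a fresh zsh session and once from a fresh bash session to confirm both pick the key up.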
Usage
Direct file summarisation
summarise-file /path/to/large-file.txt
PDFs
PDFs are binary - extract text first with pdftotext (from poppler-utils):
pdftotext document.pdf - | summarise-file
Piped content
cat /var/log/syslog | summarise-file
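The three usage patterns above can be folded into one small wrapper that picks the extraction step from the file extension. This is a sketch under simple assumptions: to_text is a hypothetical helper, it trusts the extension rather than inspecting the file, and it expects pdftotext to be installed for the PDF branch.

```bash
# to_text FILE: print FILE as plain text.
# ASSUMPTION: the extension is accurate; .pdf goes through pdftotext,
# everything else is treated as text and passed through unchanged.
to_text() {
  case "$1" in
    *.pdf|*.PDF) pdftotext "$1" - ;;
    *)           cat "$1" ;;
  esac
}

# Usage:
#   to_text report.pdf | summarise-file
#   to_text build.log  | summarise-file
```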
Auto-Detection with Claude Code Hooks
You can set up a PreToolUse hook that blocks Claude’s Read tool on large files and suggests summarise-file instead.
Save as ~/.claude/hooks/check-file-size.sh:
#!/bin/bash
# PreToolUse hook for Read: block large files and suggest summarise-file
# Exit codes: 0=allow, 2=block (stderr is fed back to Claude),
# any other non-zero=non-blocking error (the read still goes ahead)

INPUT=$(cat)
FILE=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

# Skip if no file path or file doesn't exist
[ -z "$FILE" ] && exit 0
[ ! -f "$FILE" ] && exit 0

# Get file size in bytes (GNU stat first, then the BSD/macOS form)
SIZE=$(stat -c%s "$FILE" 2>/dev/null || stat -f%z "$FILE" 2>/dev/null || echo 0)

# Threshold: 200KB (below Claude Code's 256KB internal limit)
THRESHOLD=204800

if [ "$SIZE" -gt "$THRESHOLD" ]; then
  # numfmt is GNU coreutils; fall back to a raw byte count without it
  SIZE_HUMAN=$(numfmt --to=iec "$SIZE" 2>/dev/null || echo "${SIZE} bytes")
  echo "File is ${SIZE_HUMAN}. Use instead: summarise-file \"$FILE\"" >&2
  exit 2 # BLOCK
fi

exit 0 # ALLOW
Make it executable:
chmod +x ~/.claude/hooks/check-file-size.sh
Add to ~/.claude/settings.json:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          {
            "type": "command",
            "command": "/home/YOUR_USERNAME/.claude/hooks/check-file-size.sh"
          }
        ]
      }
    ]
  }
}
Now when Claude tries to read a file over 200KB, the hook blocks it with a message like:
File is 241M. Use instead: summarise-file "/path/to/large-file.pdf"
Claude sees this message and should run the suggested command instead.
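You can exercise the hook by hand before relying on it inside a session. The sketch below feeds it a minimal payload and prints the exit code. try_hook is a hypothetical helper, and the payload contains only the tool_input.file_path field that the hook reads; the real input from Claude Code carries more fields.

```bash
# try_hook HOOK FILE: send HOOK a minimal Read payload for FILE on stdin
# and print the hook's exit code.
# ASSUMPTION: the path is inserted without JSON escaping, so avoid quotes
# or backslashes in FILE when testing.
try_hook() {
  local hook="$1" file="$2" status=0
  printf '{"tool_name":"Read","tool_input":{"file_path":"%s"}}\n' "$file" \
    | "$hook" || status=$?
  echo "exit=${status}"
}

# Usage:
#   try_hook ~/.claude/hooks/check-file-size.sh /path/to/large-file.txt
```

A file over the threshold should report exit=2 along with the "Use instead" message on stderr; a small file should report exit=0.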
Data Privacy: Free vs Paid Tier
This matters. From Google’s Gemini API terms:
Free tier:
“Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies”
Paid tier:
“Google doesn’t use your prompts or responses to improve our products”
The paid tier still logs prompts briefly for abuse detection but, critically, does not use them for training or product improvement.
How to Get Paid Tier
The API key alone isn’t enough - you need Cloud Billing enabled:
- Go to https://aistudio.google.com/apikey
- Look for “Upgrade to Paid” or billing settings
- Add a payment method (you get $300 free credit for 90 days)
- Activate the full account - this confirms you’re on paid tier, not just trial. Doesn’t charge you anything until credits run out.
- Set a budget alert - in Cloud Console billing, create a budget (e.g., $10/month) to get notified if usage spikes unexpectedly
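Gemini 2.5 Flash costs (as quoted at the time of writing; rates differ between models and change over time, so check Google's pricing page):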
- $0.10 per 1M input tokens
- $0.40 per 1M output tokens
Summarising a 100k token document costs about $0.01-0.05. Negligible.
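The arithmetic behind that estimate is easy to reproduce. The helper below is a sketch with a made-up name; its default rates are simply the per-1M-token figures listed above, and you can pass different rates as the third and fourth arguments if the published prices differ.

```bash
# estimate_cost INPUT_TOKENS OUTPUT_TOKENS [IN_RATE OUT_RATE]
# Rates are USD per 1M tokens. Defaults are the figures quoted in this post,
# which may not match current pricing.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_rate="${3:-0.10}" -v out_rate="${4:-0.40}" \
    'BEGIN { printf "$%.4f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

# A 100k-token document with the script's 2048-token output cap:
#   estimate_cost 100000 2048    # → $0.0108
```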
Limitations
- Summarisation only - You get a summary, not the actual content. Can’t use this if you need to edit specific lines or find exact strings.
- Binary files - The script reads raw bytes. For PDFs, use pdftotext first. For images, this won’t help.
- Very large files - Even Gemini’s 1M token limit has bounds. A 1500-page textbook might still be too large in one pass.
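For files that exceed a single pass, one possible workaround is to split the text into pieces, summarise each piece, and then summarise the combined summaries. This two-pass approach is a suggestion rather than something the original script does, and chunk_file is a hypothetical helper. Splitting on line counts can cut a section in half, and each chunk is summarised without knowledge of the others.

```bash
# chunk_file FILE LINES OUTDIR: split FILE into pieces of at most LINES lines
# under OUTDIR (chunk-aa, chunk-ab, ...) and print how many were written.
chunk_file() {
  local file="$1" lines="$2" outdir="$3"
  mkdir -p "$outdir"
  split -l "$lines" "$file" "$outdir/chunk-"
  ls "$outdir"/chunk-* | wc -l | tr -d ' '
}

# Two-pass usage (each chunk costs one API call):
#   chunk_file book.txt 20000 chunks
#   for c in chunks/chunk-*; do summarise-file "$c"; done | summarise-file
```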
Related
- Claude Code + Obsidian series - The broader workflow this fits into
- Google’s Gemini API pricing - Current rates
- Gemini API terms - Data usage policies