Summarise Large Files with Gemini 2.5 Flash

Paste this URL into Claude Code and tell it to set this up for you.

The Problem

Claude Code has limits on file size. Try to read a 500KB log file or a large codebase dump and you’ll hit either:

The Read tool’s internal limit (~256KB)
Context window consumption that crowds out actual work

Gemini 2.5 Flash has a 1M token context window (~750k words). For summarisation tasks - “what’s in this file?” rather than “edit line 47” - that’s a useful escape hatch.
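To get a feel for whether a file will fit in that window, a rough estimate is enough. The sketch below assumes about 4 bytes per token, which is a common rule of thumb for English text and not Gemini’s actual tokenizer; the function names are made up for illustration.

```shell
# Rough token estimate. ASSUMPTION: ~4 bytes per token (a heuristic,
# not Gemini's real tokenizer; code and non-English text will differ).
estimate_tokens() {
    local bytes
    bytes=$(wc -c < "$1")
    echo $(( bytes / 4 ))
}

# Succeeds if the estimate is within the limit (default: 1,000,000 tokens).
fits_in_context() {
    local limit="${2:-1000000}"
    [ "$(estimate_tokens "$1")" -le "$limit" ]
}
```

Under this heuristic a 500KB log file comes out at roughly 125k tokens, comfortably inside the window.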

The Solution

A bash script that pipes file content to Gemini’s API and returns a summary.

The Script

Save as ~/.local/bin/summarise-file:

#!/usr/bin/env bash
#
# summarise-file - Summarise large files using Gemini 2.5 Flash
#
# Usage:
#   summarise-file <file>           # Summarise a file
#   summarise-file -                # Read from stdin
#   cat file.txt | summarise-file   # Pipe content
#
# Requires:
#   GEMINI_API_KEY environment variable
#   curl and jq on PATH

set -euo pipefail

MODEL="gemini-2.5-flash"
API_URL="https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent"

# Check for API key
if [[ -z "${GEMINI_API_KEY:-}" ]]; then
    echo "Error: GEMINI_API_KEY not set" >&2
    echo "Get one at https://aistudio.google.com/apikey" >&2
    exit 1
fi

# Get content from file or stdin
if [[ $# -eq 0 ]] || [[ "$1" == "-" ]]; then
    CONTENT=$(cat)
    FILENAME="stdin"
elif [[ -f "$1" ]]; then
    CONTENT=$(cat "$1")
    FILENAME=$(basename "$1")
else
    echo "Error: File not found: $1" >&2
    exit 1
fi

# Check content isn't empty
if [[ -z "$CONTENT" ]]; then
    echo "Error: No content to summarise" >&2
    exit 1
fi

# Get file size for context
CHAR_COUNT=${#CONTENT}
if [[ $CHAR_COUNT -gt 1000000 ]]; then
    SIZE_NOTE="(~$(( CHAR_COUNT / 1000 ))k chars)"
else
    SIZE_NOTE="(${CHAR_COUNT} chars)"
fi

echo "Summarising ${FILENAME} ${SIZE_NOTE}..." >&2

# Build JSON payload - pipe content to jq to avoid argument limits
SYSTEM_PROMPT="Summarise the following document concisely. Focus on key points, main arguments, and actionable information. If it's code, describe what it does. If it's structured data, describe the schema and notable entries.

---

"

PAYLOAD=$(printf '%s%s' "$SYSTEM_PROMPT" "$CONTENT" | jq -Rs '{
    contents: [{parts: [{text: .}]}],
    generationConfig: {temperature: 0.3, maxOutputTokens: 2048}
}')

# Make API call (pipe payload to avoid argument length limits).
# The key is sent in a header rather than the URL, so it stays out of logged URLs.
RESPONSE=$(printf '%s' "$PAYLOAD" | curl -s -X POST "$API_URL" \
    -H "Content-Type: application/json" \
    -H "x-goog-api-key: ${GEMINI_API_KEY}" \
    --max-time 120 \
    --data-binary @-)

# Check for errors
if echo "$RESPONSE" | jq -e '.error' >/dev/null 2>&1; then
    ERROR_MSG=$(echo "$RESPONSE" | jq -r '.error.message')
    echo "API Error: ${ERROR_MSG}" >&2
    exit 1
fi

# Extract and print the summary
SUMMARY=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].text // "No response generated"')

echo ""
echo "$SUMMARY"

Make it executable:

chmod +x ~/.local/bin/summarise-file
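The script shells out to curl and jq, and it only works as a bare command if ~/.local/bin is on your PATH. A small pre-flight check along these lines can catch both problems early; the helper names are hypothetical, not part of the script.

```shell
# Report each missing command on stderr; return non-zero if any is missing.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Succeeds if directory $1 is an entry in a PATH-style string
# (defaults to the current $PATH).
on_path() {
    local dir="$1" path_string="${2:-$PATH}"
    case ":$path_string:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For this setup you would run something like `check_deps curl jq` and `on_path "$HOME/.local/bin"`; add pdftotext to the first call if you plan to summarise PDFs.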

Get an API Key

Go to https://aistudio.google.com/apikey
Create a new API key
Important: Set up Cloud Billing to get the paid tier (see data privacy section below). You get $300 free credit for 90 days - more than enough for personal use. No charges unless you explicitly upgrade after that.
Add to your shell config:

# Add to BOTH files if your shell is zsh and you want Claude Code to see the key
# (Claude Code runs in bash, not zsh)
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.zshrc
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.zshrc
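To confirm the key is actually visible in a given shell without echoing it to the screen, a check like this is enough. It is a minimal sketch, and `key_status` is a made-up name.

```shell
# Report whether GEMINI_API_KEY is visible, without printing the key itself.
key_status() {
    if [ -n "${GEMINI_API_KEY:-}" ]; then
        echo "set (${#GEMINI_API_KEY} chars)"
    else
        echo "missing"
    fi
}
```

Run it once in your own terminal and once through Claude Code; if the second says "missing", the key has not reached the shell Claude Code uses.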

Usage

Direct file summarisation

summarise-file /path/to/large-file.txt

PDFs

PDFs are binary - extract text first with pdftotext (from poppler-utils):

pdftotext document.pdf - | summarise-file

Piped content

cat /var/log/syslog | summarise-file
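Because the script reads stdin, the same pipe works for a codebase dump. One possible approach is to concatenate several files with a header naming each one, so the summary can say which file a point came from. The header format and the `bundle_files` name are just illustrative choices.

```shell
# Concatenate files, prefixing each with a header naming its path.
bundle_files() {
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip directories and missing paths
        printf '===== %s =====\n' "$f"
        cat "$f"
        printf '\n'
    done
}

# Example (assumes summarise-file is on PATH):
# bundle_files src/*.py | summarise-file
```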

Auto-Detection with Claude Code Hooks

You can set up a PreToolUse hook that blocks Claude’s Read tool on large files and suggests summarise-file instead.

Save as ~/.claude/hooks/check-file-size.sh:

#!/bin/bash
# PreToolUse hook for Read: block large files and suggest summarise-file
# Exit codes: 0=allow, 2=block (stderr is fed back to Claude),
# any other non-zero=non-blocking error (the read still proceeds)

INPUT=$(cat)
FILE=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

# Skip if no file path or file doesn't exist
[ -z "$FILE" ] && exit 0
[ ! -f "$FILE" ] && exit 0

# Get file size in bytes (GNU stat first, then BSD/macOS stat)
SIZE=$(stat -c%s "$FILE" 2>/dev/null || stat -f%z "$FILE" 2>/dev/null || echo 0)

# Threshold: 200KB (below Claude Code's 256KB internal limit)
THRESHOLD=204800

if [ "$SIZE" -gt "$THRESHOLD" ]; then
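  # numfmt is GNU-only; fall back to a plain byte count elsewhere
  SIZE_HUMAN=$(numfmt --to=iec "$SIZE" 2>/dev/null || echo "${SIZE} bytes")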
  echo "File is ${SIZE_HUMAN}. Use instead: summarise-file \"$FILE\"" >&2
  exit 2  # BLOCK
fi

exit 0  # ALLOW

Make it executable:

chmod +x ~/.claude/hooks/check-file-size.sh

Add to ~/.claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          {
            "type": "command",
            "command": "/home/YOUR_USERNAME/.claude/hooks/check-file-size.sh"
          }
        ]
      }
    ]
  }
}

Now when Claude tries to read a file over 200KB, the hook blocks it with a message like:

File is 1.5M. Use instead: summarise-file "/path/to/large-file.log"

Claude sees this message and should then use the suggested command instead.
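You can exercise the hook without involving Claude by piping it a payload of the shape it expects. The sketch below assumes the hook only reads `.tool_input.file_path`, as the script above does, and does no JSON escaping; `try_hook` is a hypothetical helper.

```shell
# Feed a hook a minimal PreToolUse-style payload and print its exit code.
# ASSUMPTION: the hook only reads .tool_input.file_path. The path must not
# contain double quotes or backslashes, since no JSON escaping is done here.
try_hook() {
    local hook="$1" file="$2" status=0
    printf '{"tool_name":"Read","tool_input":{"file_path":"%s"}}\n' "$file" \
        | "$hook" || status=$?
    echo "exit code: $status"
}

# Example:
# try_hook ~/.claude/hooks/check-file-size.sh /path/to/large-file.txt
```

An exit code of 2 together with the "Use instead" message on stderr means the hook would block the read; 0 means it would let it through.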

Data Privacy: Free vs Paid Tier

This matters. From Google’s Gemini API terms:

Free tier:

“Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies”

Paid tier:

“Google doesn’t use your prompts or responses to improve our products”

The paid tier still logs prompts briefly for abuse detection, but critically, they are not used for training or product improvement.

How to Get Paid Tier

The API key alone isn’t enough - you need Cloud Billing enabled:

Go to https://aistudio.google.com/apikey
Look for “Upgrade to Paid” or billing settings
Add a payment method (you get $300 free credit for 90 days)
Activate the full account - this confirms you’re on the paid tier, not just the trial. You aren’t charged anything until the credits run out.
Set a budget alert - in Cloud Console billing, create a budget (e.g., $10/month) to get notified if usage spikes unexpectedly

Gemini 2.5 Flash costs (check the pricing page, as rates change):

$0.30 per 1M input tokens
$2.50 per 1M output tokens (including thinking tokens)

Summarising a 100k token document costs about $0.01-0.05. Negligible.
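The arithmetic behind that estimate is simple enough to script. This sketch takes the per-million-token rates as arguments rather than hard-coding them, so you can plug in whatever the pricing page currently says; `estimate_cost` is an illustrative name.

```shell
# Estimate the cost in USD of one request.
# Arguments: input tokens, output tokens, input rate, output rate
# (both rates in USD per 1M tokens).
estimate_cost() {
    LC_ALL=C awk -v in_tok="$1" -v out_tok="$2" -v in_rate="$3" -v out_rate="$4" \
        'BEGIN { printf "%.4f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

# Example: a 100k-token document with the script's 2048-token output cap.
# estimate_cost 100000 2048 "$INPUT_RATE" "$OUTPUT_RATE"
```

Input tokens dominate here, because the script caps output at 2048 tokens.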

Limitations

Summarisation only - You get a summary, not the actual content. Can’t use this if you need to edit specific lines or find exact strings.
Binary files - The script sends file content as plain text, so binary data gets mangled. For PDFs, use pdftotext first. For images, this won’t help.
Very large files - Even a 1M token context window has limits. A 1500-page textbook might still be too large for one pass.
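For input that will not fit in one pass, one workaround is to split it, summarise each piece, then summarise the summaries. The sketch below splits on line boundaries and runs whatever command you give it on each chunk; the function name and chunk size are hypothetical, and the quality of a summary-of-summaries is something to check for yourself.

```shell
# Split a file into chunks of at most N lines, run a command on each chunk
# (fed on stdin), and print the per-chunk outputs in order.
summarise_in_chunks() {
    local file="$1" lines="$2"
    shift 2
    local dir chunk
    dir=$(mktemp -d)
    split -l "$lines" "$file" "$dir/chunk."
    for chunk in "$dir"/chunk.*; do
        [ -e "$chunk" ] || continue      # empty input produces no chunks
        echo "## $(basename "$chunk")"
        "$@" < "$chunk"
    done
    rm -rf "$dir"
}

# Example (assumes summarise-file is on PATH); the second pass condenses
# the per-chunk summaries into one:
# summarise_in_chunks huge.log 200000 summarise-file | summarise-file
```

Splitting by lines rather than bytes avoids cutting a log entry in half, at the cost of uneven chunk sizes when line lengths vary a lot.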

Claude Code + Obsidian series - The broader workflow this fits into
Google’s Gemini API pricing - Current rates
Gemini API terms - Data usage policies