
Agent Token Savings: The Gandalf PreToolUse Trick and Beyond

MCP schema bloat is a hidden token cost. Pre-filtering tools helps, but fewer MCP servers with compact schemas is the structural fix.

5 min read

The Gandalf PreToolUse trick (pre-filtering which tools the model considers before it makes a call) is clever, but the bigger win is reducing the MCP tool schema bloat that forces the trick in the first place. Every enabled MCP server injects its tool definitions into context at session start, and some servers register 15+ tools with verbose descriptions. Auditing and reducing that schema payload saves tokens on every turn of every session.

Where the Tokens Actually Go

Most Claude Code users assume search result content is their biggest token cost. In many sessions, MCP tool schemas consume more tokens than search results do. A session with 6 MCP servers averaging 5 tools each, at 300 tokens per tool schema, burns 9,000 tokens on tool definitions before any work begins. This repeats every time the context compresses and reloads.

The Audit Process

Step one: count how many MCP servers are enabled and estimate the schema cost; the script below does this. Step two: for each server, check how often you actually use its tools in a typical session (a sketch for this follows the script). Step three: disable any server you use less than once per day.

Bash
# Check your MCP server count and estimate token impact
cat ~/.claude/mcp.json 2>/dev/null | python3 -c "
import sys, json
try:
    d = json.load(sys.stdin)
    servers = d.get('mcpServers', {})
    print(f'{len(servers)} MCP servers enabled')
    # Assumes ~5 tools per server at ~300 tokens per tool schema
    print(f'Estimated schema cost: {len(servers) * 5 * 300:,} tokens/session')
    for name in servers:
        print(f'  - {name}')
except Exception:
    print('No MCP config found')
"

Choosing Token-Efficient MCP Servers

Not all MCP servers are equal in schema efficiency. A search server that registers one tool with a platform parameter (search: platform + query) uses 300 tokens of schema. A search server that registers separate tools for each platform (google_search, reddit_search, youtube_search, amazon_search, walmart_search) uses 1,500 tokens. Same functionality, 5x the schema cost.
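
To make the difference concrete, here is a sketch of the two designs as MCP-style tool definitions. The tool names and descriptions are hypothetical, and serialized length is only a rough proxy for token cost.

Python
# Compare the two schema styles: one tool with a platform enum vs.
# one tool per platform. Names and descriptions are illustrative only.
import json

PLATFORMS = ["google", "reddit", "youtube", "amazon", "walmart"]

single_tool = [{
    "name": "search",
    "description": "Search a platform for a query.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "platform": {"type": "string", "enum": PLATFORMS},
            "query": {"type": "string"},
        },
        "required": ["platform", "query"],
    },
}]

per_platform_tools = [{
    "name": f"{p}_search",
    "description": f"Search {p} for a query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
} for p in PLATFORMS]

# Serialized size as a rough proxy for schema token cost
for label, tools in [("single tool", single_tool),
                     ("per-platform", per_platform_tools)]:
    print(f"{label}: {len(tools)} tool(s), {len(json.dumps(tools))} chars of schema")

The enum design also scales better: adding a sixth platform costs a few schema tokens instead of an entire new tool definition.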

Response format matters too. Structured JSON responses (title, snippet, and URL per result) average around 300 tokens per search. Full-page Markdown responses average 2,000-5,000 tokens. Over a session with 10 search calls, that is roughly 3,000 vs 30,000 tokens -- a 10x difference.
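
A quick way to sanity-check these numbers is the common ~4-characters-per-token heuristic. The result shape below is an assumption for illustration, not any particular server's format.

Python
# Rough response-size check using the ~4 chars/token heuristic.
# The compact result shape is illustrative, not a real server's format.
import json

compact_result = {
    "title": "How to fix ECONNRESET in Node.js",
    "snippet": "This error usually means the remote peer closed the socket...",
    "url": "https://example.com/question/12345",
}

def est_tokens(text: str) -> int:
    return len(text) // 4  # ~4 characters per token for English text

per_result = est_tokens(json.dumps(compact_result))
print(f"~{per_result} tokens per result, ~{per_result * 10} for a 10-result response")
# A full-page Markdown dump of the same source often runs 8,000-20,000
# characters, i.e. ~2,000-5,000 tokens for a single call.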

The Practical Minimum

For most coding sessions you need one search MCP server (for documentation and community solutions) and, optionally, one domain-specific MCP server (GitHub, database, etc.). Everything else can be disabled by default. A hosted MCP server like Scavio's (mcp.scavio.dev/mcp) has the added benefit of zero local process overhead and a compact tool schema, since search across all five platforms is a single tool definition.

JSON
{
  "mcpServers": {
    "scavio": {
      "url": "https://mcp.scavio.dev/mcp",
      "headers": { "x-api-key": "YOUR_KEY" }
    }
  }
}

One config entry, one tool schema, five search platforms. The token budget you save goes to actual code generation and reasoning instead of tool definition overhead.