MCP Tool Schema Bloat: The Hidden Token Cost
Every enabled MCP server adds tool schemas to your context. A single server can burn 3,000-5,000 tokens before you type anything. Here's how to audit and fix it.
Every enabled MCP server contributes its tool definitions to the context at session start, and some servers register 15+ tools with verbose descriptions. This schema bloat is a hidden token cost that most Claude Code users never audit. The Gandalf pretooluse trick (pre-filtering which tools the model considers) cuts this waste, but the simpler fix is fewer, better-scoped MCP servers.
How MCP Schema Bloat Happens
When Claude Code starts a session, it loads tool schemas from every enabled MCP server. Each schema includes the tool name, description, parameter definitions, and type information. A single MCP server registering 15 tools with detailed descriptions can add 3,000-5,000 tokens to every session context before the user types anything. With 5-6 MCP servers enabled, you can burn 15,000-25,000 tokens on tool schemas alone.
This cost repeats on every context refresh. If your session compresses and reloads, you pay the schema tax again. Over a day of active coding, schema bloat can account for 10-20% of total token spend.
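To see how those percentages compound, here is a back-of-the-envelope calculation. All figures are illustrative assumptions based on the ranges above, not measurements:

```python
# Worked example of the daily schema tax (illustrative numbers, not measured).
SCHEMA_TOKENS = 20_000    # 5-6 servers at ~3,000-5,000 tokens each
CONTEXT_RELOADS = 8       # session starts plus compress/reload cycles in a day
DAILY_SPEND = 1_500_000   # total tokens a heavy coding day might consume

schema_tax = SCHEMA_TOKENS * CONTEXT_RELOADS
share = schema_tax / DAILY_SPEND
print(f"schema tax: {schema_tax:,} tokens ({share:.0%} of daily spend)")
# → schema tax: 160,000 tokens (11% of daily spend)
```

Adjust the three constants to your own setup; the point is that the tax scales with reload count, not just server count.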
Audit Your Current MCP Token Usage
Check how many MCP servers you have enabled and estimate the schema cost. Each tool definition averages 200-400 tokens depending on description length and parameter count.
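If you can dump a server's tool list (for example, from its tools/list response), you can refine the flat per-tool estimate. This sketch uses the common ~4-characters-per-token heuristic, and the sample tool definition is hypothetical:

```python
import json

def estimate_schema_tokens(tools):
    """Rough token estimate for a list of MCP tool schemas,
    using the ~4 characters-per-token heuristic."""
    return sum(len(json.dumps(t)) // 4 for t in tools)

# Hypothetical tool definition, for illustration only
tools = [{
    "name": "search_issues",
    "description": "Search issues in a repository by free-text query, "
                   "label, assignee, and state. Returns matching issues "
                   "with title, number, and URL.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query"},
            "labels": {"type": "array", "items": {"type": "string"}},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["query"],
    },
}] * 15  # a chatty server registering 15 similar tools

print(f"~{estimate_schema_tokens(tools):,} estimated schema tokens")
```

Character-count heuristics undercount slightly for dense JSON, so treat the result as a floor, not a ceiling.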
# Count MCP servers in your config
cat ~/.claude/mcp.json | python3 -c 'import sys,json; n=len(json.load(sys.stdin).get("mcpServers", {})); print(f"{n} servers enabled")'
# Estimate schema tokens (rough: 300 tokens per tool)
# A server with 10 tools = ~3,000 tokens per session start

Three Strategies to Reduce Schema Bloat
First, disable MCP servers you do not use in every session. If you only use the GitHub MCP server once a week, disable it by default and enable it when needed. Second, prefer MCP servers with fewer, well-scoped tools. A server with 3 focused tools (search Google, search Reddit, extract URL) is more token-efficient than one with 20 granular tools. Third, use the Gandalf pretooluse pattern: a system prompt addition that instructs the model to mentally filter tools before calling them, reducing the effective consideration set.
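The pretooluse idea is, at its core, a filter applied to the tool set before the model sees it. Here is a minimal programmatic sketch of that filtering step; the tool names and task buckets are made up for illustration:

```python
# Sketch of the pre-filtering idea behind the Gandalf pretooluse pattern:
# narrow the tool set to a per-task allowlist before the model considers it.
# Tool names and task mapping below are hypothetical.

TASK_ALLOWLISTS = {
    "research": {"search", "extract_url"},
    "coding": {"read_file", "write_file", "run_tests"},
}

def filter_tools(all_tools, task):
    """Return only the tool schemas relevant to the current task."""
    allowed = TASK_ALLOWLISTS.get(task, set())
    return [t for t in all_tools if t["name"] in allowed]

all_tools = [{"name": n} for n in
             ("search", "extract_url", "read_file", "write_file",
              "run_tests", "deploy", "create_ticket")]

print([t["name"] for t in filter_tools(all_tools, "research")])
# → ['search', 'extract_url']
```

The prompt-based version achieves a weaker form of the same thing: the schemas still occupy context, but the model spends less reasoning effort on irrelevant tools.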
Choosing Token-Efficient Search MCP Servers
Search MCP servers vary significantly in schema size. Some register separate tools for every search platform (google_search, reddit_search, youtube_search, amazon_search, walmart_search = 5 tool schemas). Others consolidate into one tool with a platform parameter (search = 1 schema). The consolidated approach saves 800-1,600 tokens per session.
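The gap is easy to quantify by comparing the two schema designs under the same rough token heuristic. The tool names and schemas below are illustrative, not taken from any specific server:

```python
import json

def tokens(obj):
    # ~4 characters per token heuristic
    return len(json.dumps(obj)) // 4

# Hypothetical per-platform design: one schema per platform
platforms = ["google", "reddit", "youtube", "amazon", "walmart"]
per_platform = [{
    "name": f"{p}_search",
    "description": f"Search {p} and return ranked results.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]},
} for p in platforms]

# Hypothetical consolidated design: one tool, platform parameter
consolidated = [{
    "name": "search",
    "description": "Search a platform and return ranked results.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"},
                                   "platform": {"type": "string",
                                                "enum": platforms}},
                    "required": ["query", "platform"]},
}]

print(f"per-platform: ~{tokens(per_platform)} tokens, "
      f"consolidated: ~{tokens(consolidated)} tokens")
```

Real servers pad descriptions far more than this sketch does, which widens the gap in the consolidated design's favor.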
Response format also matters. An MCP server that returns structured JSON (title, snippet, URL) adds fewer tokens to context than one returning full page Markdown. For a typical search returning 10 results, structured JSON uses ~300 tokens per result while Markdown can use 2,000+ tokens per result.
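A quick simulation makes the response-format gap concrete. The result shapes and filler text here are fabricated for illustration, so the absolute numbers are only indicative:

```python
import json

def tokens(text):
    return len(text) // 4  # ~4 characters per token heuristic

# One structured result (hypothetical shape)
structured = json.dumps({
    "title": "Example result title",
    "snippet": "Two sentences of summary text " * 6,
    "url": "https://example.com/page",
})

# The same page returned as full Markdown (simulated with filler text)
markdown = "# Example result title\n\n" + ("paragraph of page text " * 400)

n = 10  # results per search
print(f"structured: ~{tokens(structured) * n:,} tokens for {n} results")
print(f"markdown:   ~{tokens(markdown) * n:,} tokens for {n} results")
```

The ratio, not the absolute count, is the takeaway: full-page Markdown can cost an order of magnitude more context per result than a title/snippet/URL triple.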
{
  "mcpServers": {
    "scavio": {
      "url": "https://mcp.scavio.dev/mcp",
      "headers": { "x-api-key": "YOUR_KEY" }
    }
  }
}

Scavio's hosted MCP server registers a compact tool set with structured JSON responses, keeping both schema cost and result cost low. Because it is hosted, there is also no local server process competing for resources during coding sessions.