MCP Server Sprawl: Token Cost Real Numbers
8 MCP servers inject 100+ tool descriptions consuming 10-15K tokens per request. Consolidation saves $50-200/mo in token costs.
Connecting 8 MCP servers to a single agent session injects 100+ tool descriptions into the system prompt, consuming 10-15K tokens before the agent does anything. But the tool descriptions are not the real cost problem. The real cost is agent loops: each tool call triggers a new LLM inference with the full context, and agents with too many tools make more unnecessary tool calls as they explore available options.
The token math
Each MCP tool description includes a name, description, and parameter schema. A typical tool description consumes 100-200 tokens. Here is what common MCP server configurations cost in system prompt tokens:
# Token cost of MCP tool descriptions in system prompt
mcp_servers = {
    "filesystem": 8,   # read, write, list, mkdir, etc.
    "github": 15,      # issues, PRs, repos, branches, etc.
    "slack": 12,       # send, read, channels, users, etc.
    "postgres": 6,     # query, schema, tables, etc.
    "web-search": 3,   # search, fetch, extract
    "memory": 4,       # store, retrieve, list, delete
    "calendar": 8,     # events, create, update, delete, etc.
    "email": 10,       # send, read, search, labels, etc.
}
tokens_per_tool = 150  # average including schema
total_tools = sum(mcp_servers.values())
total_tokens = total_tools * tokens_per_tool
print(f"Total tools: {total_tools}")
print(f"System prompt tokens from tools: {total_tokens:,}")

# Cost per agent turn (tool descriptions are repeated every inference)
llm_rates = {  # $ per 1M input tokens
    "Claude Sonnet 4": 3.0,
    "Claude Opus 4": 15.0,
    "GPT-4o": 2.5,
}
for model, rate in llm_rates.items():
    cost_per_turn = (total_tokens / 1_000_000) * rate
    print(f"\n{model}:")
    print(f"  Tool description cost per turn: ${cost_per_turn:.4f}")
    print(f"  Over 20 turns: ${cost_per_turn * 20:.3f}")
    print(f"  Over 1000 sessions/day: ${cost_per_turn * 20 * 1000:.2f}/day")

The hidden cost: unnecessary tool calls
The bigger problem is behavioral. When an agent has 66 tools available, it often:
- Calls tools to "explore" before answering, even when the answer is in context
- Chains multiple tool calls when one would suffice
- Picks the wrong tool and retries with a different one
- Makes speculative tool calls "just in case"
Each unnecessary tool call costs one full LLM inference (system prompt + conversation history + tool result). A single wasted tool call with 10K context costs $0.03-0.15 depending on the model.
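That range falls straight out of input-token pricing: a wasted call re-sends the entire accumulated context as input. A quick sketch, using the same published per-1M-token input rates as the earlier example:

```python
# Cost of one wasted tool call: the full context is re-sent as input
context_tokens = 10_000  # system prompt + conversation at call time

rates_per_million = {  # $ per 1M input tokens, as above
    "Claude Sonnet 4": 3.0,
    "Claude Opus 4": 15.0,
    "GPT-4o": 2.5,
}

for model, rate in rates_per_million.items():
    wasted = (context_tokens / 1_000_000) * rate
    print(f"{model}: ${wasted:.3f} per unnecessary call")
# Sonnet lands at $0.030 and Opus at $0.150 -- the $0.03-0.15 range above
```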
Real session cost breakdown
# Realistic agent session cost with MCP sprawl
system_prompt_tokens = 10000  # Tool descriptions
conversation_tokens = 3000    # User messages + previous turns
tool_result_tokens = 500      # Average per tool call result

# Agent makes 8 tool calls to answer a question (too many)
tool_calls = 8
necessary_calls = 3  # Only 3 were actually needed

# Total tokens processed across all turns
total_input_tokens = 0
for i in range(tool_calls):
    turn_tokens = system_prompt_tokens + conversation_tokens + (tool_result_tokens * i)
    total_input_tokens += turn_tokens

rate = 3.0  # Claude Sonnet 4, $ per 1M input tokens
total_cost = (total_input_tokens / 1_000_000) * rate
necessary_cost = sum(
    system_prompt_tokens + conversation_tokens + (tool_result_tokens * i)
    for i in range(necessary_calls)
) / 1_000_000 * rate

print(f"Total input tokens (8 calls): {total_input_tokens:,}")
print(f"Actual cost: ${total_cost:.4f}")
print(f"Necessary cost (3 calls): ${necessary_cost:.4f}")
print(f"Wasted: ${total_cost - necessary_cost:.4f} ({(1 - necessary_cost/total_cost)*100:.0f}%)")

Solution: project-scoped MCP configuration
Instead of connecting every MCP server to every session, scope the tools to the task:
- Code review task: filesystem + github only (23 tools instead of 66)
- Research task: web-search + memory only (7 tools instead of 66)
- Communication task: slack + email + calendar (30 tools instead of 66)
Reducing from 66 to 20 tools saves 6,900 tokens per turn in system prompt costs and reduces unnecessary tool calls by eliminating distractions.
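As a sanity check, the 6,900 figure is just the per-tool average applied to the reduction:

```python
tokens_per_tool = 150  # average used throughout this article
all_tools, scoped_tools = 66, 20
saved_per_turn = (all_tools - scoped_tools) * tokens_per_tool
print(f"Saved per turn: {saved_per_turn:,} tokens")  # 6,900
```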
Implementation: tool scoping by intent
# MCP tool scoping configuration
TASK_SCOPES = {
    "research": ["web-search", "memory"],
    "coding": ["filesystem", "github"],
    "communication": ["slack", "email", "calendar"],
    "data": ["postgres", "filesystem"],
    "full": None,  # All tools, use sparingly
}

def get_tools_for_task(task_type):
    """Return only the MCP servers needed for this task type."""
    scope = TASK_SCOPES.get(task_type, TASK_SCOPES["research"])
    if scope is None:
        return ALL_MCP_SERVERS  # Full access when explicitly needed
    return {k: v for k, v in ALL_MCP_SERVERS.items() if k in scope}

# Token savings
def estimate_savings(task_type):
    all_tools = sum(mcp_servers.values())
    # Same default as get_tools_for_task, so unknown task types
    # don't silently report the full tool count as "saved"
    scoped_servers = TASK_SCOPES.get(task_type, TASK_SCOPES["research"])
    if scoped_servers is None:
        return 0  # "full" scope saves nothing
    scoped_tools = sum(mcp_servers.get(s, 0) for s in scoped_servers)
    saved_tokens = (all_tools - scoped_tools) * 150
    return saved_tokens

for task, scope in TASK_SCOPES.items():
    if scope is not None:
        saved = estimate_savings(task)
        print(f"{task}: saves {saved:,} tokens/turn")

When MCP sprawl is worth it
Full tool access makes sense for:
- General-purpose assistant mode where you cannot predict the task
- Complex workflows that genuinely span multiple domains
- Interactive sessions where the user directs tool usage
It does not make sense for production agents with defined tasks, automated pipelines, or any scenario where the task type is known in advance. Scope your tools to the task and save both tokens and money.
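For automated pipelines in particular, a safer variant of the scoping function fails closed: an unknown task type raises an error instead of silently falling back to full tool access. A minimal sketch (the `PIPELINE_SCOPES` task names are illustrative):

```python
# Fail closed: unknown pipeline tasks should not get every MCP server
PIPELINE_SCOPES = {  # hypothetical pipeline task names
    "nightly-report": ["postgres", "email"],
    "pr-triage": ["github"],
}

def pipeline_servers(task: str) -> list:
    """Return the MCP server names a known pipeline task may use."""
    try:
        return PIPELINE_SCOPES[task]
    except KeyError:
        raise ValueError(
            f"No tool scope defined for pipeline task {task!r}; "
            "refusing to attach all MCP servers"
        ) from None

print(pipeline_servers("pr-triage"))  # ['github']
```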
Consolidation alternative
Instead of 8 specialized MCP servers, consider fewer servers with broader capability. A single search MCP server like mcp.scavio.dev/mcp covers web search across multiple platforms (Google, Bing, TikTok, YouTube, Reddit) with one tool definition instead of five separate search tools. Fewer tools means less system prompt overhead and fewer decision points for the agent.
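The consolidation savings follow the same per-tool arithmetic used throughout this article. A sketch, assuming the ~150-token average and the 20-turns, 1,000-sessions/day volume from the earlier example (the five-tool count is illustrative):

```python
tokens_per_tool = 150  # same average used throughout this article

# Five narrow search tools vs one broad one
separate_search_tools = 5  # e.g. google, bing, tiktok, youtube, reddit
consolidated_tools = 1

saved_per_turn = (separate_search_tools - consolidated_tools) * tokens_per_tool
print(f"Saved per turn: {saved_per_turn} tokens")  # 600

rate = 3.0  # Claude Sonnet 4, $ per 1M input tokens
daily = (saved_per_turn * 20 * 1000 / 1_000_000) * rate  # 20 turns, 1000 sessions
print(f"Daily savings: ${daily:.2f}")  # $36.00
```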