MCP Server Sprawl: Token Cost Real Numbers
8 MCP servers inject 100+ tool descriptions consuming 10-15K tokens per request. Consolidation saves $50-200/mo in token costs.
Connecting 8 MCP servers to a single agent session injects 100+ tool descriptions into the system prompt, consuming 10-15K tokens before the agent does anything. But the tool descriptions are not the real cost problem. The real cost is agent loops: each tool call triggers a new LLM inference with the full context, and agents with too many tools make more unnecessary tool calls as they explore available options.
The token math
Each MCP tool description includes a name, description, and parameter schema. A typical tool description consumes 100-200 tokens. Here is what common MCP server configurations cost in system prompt tokens:
# Token cost of MCP tool descriptions in system prompt
mcp_servers = {
    "filesystem": 8,   # read, write, list, mkdir, etc.
    "github": 15,      # issues, PRs, repos, branches, etc.
    "slack": 12,       # send, read, channels, users, etc.
    "postgres": 6,     # query, schema, tables, etc.
    "web-search": 3,   # search, fetch, extract
    "memory": 4,       # store, retrieve, list, delete
    "calendar": 8,     # events, create, update, delete, etc.
    "email": 10,       # send, read, search, labels, etc.
}
tokens_per_tool = 150  # average including schema
total_tools = sum(mcp_servers.values())
total_tokens = total_tools * tokens_per_tool
print(f"Total tools: {total_tools}")
print(f"System prompt tokens from tools: {total_tokens:,}")

# Cost per agent turn (tool descriptions are repeated every inference)
llm_rates = {  # $ per 1M input tokens
    "Claude Sonnet 4": 3.0,
    "Claude Opus 4": 15.0,
    "GPT-4o": 2.5,
}
for model, rate in llm_rates.items():
    cost_per_turn = (total_tokens / 1_000_000) * rate
    print(f"\n{model}:")
    print(f"  Tool description cost per turn: ${cost_per_turn:.4f}")
    print(f"  Over 20 turns: ${cost_per_turn * 20:.3f}")
    print(f"  Over 1000 sessions/day: ${cost_per_turn * 20 * 1000:.2f}/day")

The hidden cost: unnecessary tool calls
The bigger problem is behavioral. When an agent has 66 tools available, it often:
- Calls tools to "explore" before answering, even when the answer is in context
- Chains multiple tool calls when one would suffice
- Picks the wrong tool and retries with a different one
- Makes speculative tool calls "just in case"
Each unnecessary tool call costs one full LLM inference (system prompt + conversation history + tool result). A single wasted tool call with 10K context costs $0.03-0.15 depending on the model.
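That range falls straight out of input-token pricing: a wasted call re-sends the entire accumulated context as input. A quick sketch, using the same published per-1M-token input rates as the earlier example:

```python
# Cost of one wasted tool call: the full context is re-sent as input
context_tokens = 10_000  # system prompt + conversation at call time

rates_per_million = {  # $ per 1M input tokens, as above
    "Claude Sonnet 4": 3.0,
    "Claude Opus 4": 15.0,
    "GPT-4o": 2.5,
}

for model, rate in rates_per_million.items():
    wasted = (context_tokens / 1_000_000) * rate
    print(f"{model}: ${wasted:.3f} per unnecessary call")
# Sonnet lands at $0.030 and Opus at $0.150 -- the $0.03-0.15 range above
```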
Real session cost breakdown
# Realistic agent session cost with MCP sprawl
system_prompt_tokens = 10000  # Tool descriptions
conversation_tokens = 3000    # User messages + previous turns
tool_result_tokens = 500      # Average per tool call result

# Agent makes 8 tool calls to answer a question (too many)
tool_calls = 8
necessary_calls = 3  # Only 3 were actually needed

# Total tokens processed across all turns
total_input_tokens = 0
for i in range(tool_calls):
    turn_tokens = system_prompt_tokens + conversation_tokens + (tool_result_tokens * i)
    total_input_tokens += turn_tokens

rate = 3.0  # Claude Sonnet 4, $ per 1M input tokens
total_cost = (total_input_tokens / 1_000_000) * rate
necessary_cost = sum(
    system_prompt_tokens + conversation_tokens + (tool_result_tokens * i)
    for i in range(necessary_calls)
) / 1_000_000 * rate

print(f"Total input tokens (8 calls): {total_input_tokens:,}")
print(f"Actual cost: ${total_cost:.4f}")
print(f"Necessary cost (3 calls): ${necessary_cost:.4f}")
print(f"Wasted: ${total_cost - necessary_cost:.4f} ({(1 - necessary_cost/total_cost)*100:.0f}%)")

Solution: project-scoped MCP configuration
Instead of connecting every MCP server to every session, scope the tools to the task:
- Code review task: filesystem + github only (23 tools instead of 66)
- Research task: web-search + memory only (7 tools instead of 66)
- Communication task: slack + email + calendar (30 tools instead of 66)
Reducing from 66 to 20 tools saves 6,900 tokens per turn in system prompt costs and reduces unnecessary tool calls by eliminating distractions.
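As a sanity check, the 6,900 figure is just the per-tool average applied to the reduction:

```python
tokens_per_tool = 150  # average used throughout this article
all_tools, scoped_tools = 66, 20
saved_per_turn = (all_tools - scoped_tools) * tokens_per_tool
print(f"Saved per turn: {saved_per_turn:,} tokens")  # 6,900
```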
Implementation: tool scoping by intent
# MCP tool scoping configuration
TASK_SCOPES = {
    "research": ["web-search", "memory"],
    "coding": ["filesystem", "github"],
    "communication": ["slack", "email", "calendar"],
    "data": ["postgres", "filesystem"],
    "full": None,  # All tools, use sparingly
}

def get_tools_for_task(task_type):
    """Return only the MCP servers needed for this task type."""
    scope = TASK_SCOPES.get(task_type, TASK_SCOPES["research"])
    if scope is None:
        return ALL_MCP_SERVERS  # Full access when explicitly needed
    return {k: v for k, v in ALL_MCP_SERVERS.items() if k in scope}

# Token savings
def estimate_savings(task_type):
    all_tools = sum(mcp_servers.values())
    # Same default as get_tools_for_task, so unknown task types
    # don't silently report the full tool count as "saved"
    scoped_servers = TASK_SCOPES.get(task_type, TASK_SCOPES["research"])
    if scoped_servers is None:
        return 0  # "full" scope saves nothing
    scoped_tools = sum(mcp_servers.get(s, 0) for s in scoped_servers)
    saved_tokens = (all_tools - scoped_tools) * 150
    return saved_tokens

for task, scope in TASK_SCOPES.items():
    if scope is not None:
        saved = estimate_savings(task)
        print(f"{task}: saves {saved:,} tokens/turn")

When MCP sprawl is worth it
Full tool access makes sense for:
- General-purpose assistant mode where you cannot predict the task
- Complex workflows that genuinely span multiple domains
- Interactive sessions where the user directs tool usage
It does not make sense for production agents with defined tasks, automated pipelines, or any scenario where the task type is known in advance. Scope your tools to the task and save both tokens and money.
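For automated pipelines in particular, a safer variant of the scoping function fails closed: an unknown task type raises an error instead of silently falling back to full tool access. A minimal sketch (the `PIPELINE_SCOPES` task names are illustrative):

```python
# Fail closed: unknown pipeline tasks should not get every MCP server
PIPELINE_SCOPES = {  # hypothetical pipeline task names
    "nightly-report": ["postgres", "email"],
    "pr-triage": ["github"],
}

def pipeline_servers(task: str) -> list:
    """Return the MCP server names a known pipeline task may use."""
    try:
        return PIPELINE_SCOPES[task]
    except KeyError:
        raise ValueError(
            f"No tool scope defined for pipeline task {task!r}; "
            "refusing to attach all MCP servers"
        ) from None

print(pipeline_servers("pr-triage"))  # ['github']
```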
Consolidation alternative
Instead of 8 specialized MCP servers, consider fewer servers with broader capability. A single search MCP server like mcp.scavio.dev/mcp covers web search across multiple platforms (Google, Bing, TikTok, YouTube, Reddit) with one tool definition instead of five separate search tools. Fewer tools means less system prompt overhead and fewer decision points for the agent.
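The consolidation savings follow the same per-tool arithmetic used throughout this article. A sketch, assuming the ~150-token average and the 20-turns, 1,000-sessions/day volume from the earlier example (the five-tool count is illustrative):

```python
tokens_per_tool = 150  # same average used throughout this article

# Five narrow search tools vs one broad one
separate_search_tools = 5  # e.g. google, bing, tiktok, youtube, reddit
consolidated_tools = 1

saved_per_turn = (separate_search_tools - consolidated_tools) * tokens_per_tool
print(f"Saved per turn: {saved_per_turn} tokens")  # 600

rate = 3.0  # Claude Sonnet 4, $ per 1M input tokens
daily = (saved_per_turn * 20 * 1000 / 1_000_000) * rate  # 20 turns, 1000 sessions
print(f"Daily savings: ${daily:.2f}")  # $36.00
```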