Definition
Context bloat is the accumulation of tokens in an LLM's context window, often before the user has asked anything, that crowds out room for actual reasoning. Typical sources are MCP tool schemas, large system prompts, and unfiltered retrieval results.
In Depth
Most agent frameworks load every connected tool's full schema into context at session start. A fleet of 10 MCP servers with 8 tools each, at 600 tokens per schema, burns 10 × 8 × 600 = 48,000 tokens before any work happens. Context bloat compounds when retrieval steps return raw HTML or 50-result SERP pages instead of trimmed, structured snippets. The standard fixes as of 2026 are MCP gateways that compress tool descriptions, search APIs that return typed JSON instead of raw HTML, and agent harnesses that lazy-load a tool's schema only when the model attempts to call it.
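The arithmetic above, and the effect of lazy-loading, can be sketched in a few lines of Python. This is a minimal illustration, not any real framework's API. The ToolRegistry class, the tool names, and the 20-token stub cost (a tool name plus a one-line summary) are assumptions made for the example. Only the 10 servers, 8 tools, and 600 tokens per schema come from the text.

```python
from dataclasses import dataclass, field

# Figures from the example in the text; real schema sizes vary.
SERVERS = 10
TOOLS_PER_SERVER = 8
TOKENS_PER_SCHEMA = 600
# Assumed cost of a short stub (name plus one-line summary) per tool.
TOKENS_PER_STUB = 20


@dataclass
class ToolRegistry:
    """Hypothetical registry that charges full schema tokens only on first use."""

    schema_tokens: dict[str, int]
    stub_tokens: int = TOKENS_PER_STUB
    loaded: set[str] = field(default_factory=set)

    def eager_cost(self) -> int:
        # Every full schema is placed in context at session start.
        return sum(self.schema_tokens.values())

    def lazy_cost(self) -> int:
        # Every tool contributes a stub; only called tools add a full schema.
        stubs = self.stub_tokens * len(self.schema_tokens)
        full = sum(self.schema_tokens[name] for name in self.loaded)
        return stubs + full

    def call(self, name: str) -> None:
        # The model attempted to call this tool, so its schema gets loaded.
        if name not in self.schema_tokens:
            raise KeyError(name)
        self.loaded.add(name)


def build_registry() -> ToolRegistry:
    schema_tokens = {
        f"server{s}.tool{t}": TOKENS_PER_SCHEMA
        for s in range(SERVERS)
        for t in range(TOOLS_PER_SERVER)
    }
    return ToolRegistry(schema_tokens)


registry = build_registry()
print(registry.eager_cost())  # 48000

registry.call("server0.tool0")
registry.call("server3.tool5")
print(registry.lazy_cost())  # 2800 (80 stubs * 20 + 2 schemas * 600)
```

Under these assumptions, a session that touches two tools pays 2,800 tokens instead of 48,000. The saving shrinks as more distinct tools are called, so lazy-loading helps most when each session uses a small fraction of the connected tools.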
Example Usage
After consolidating to an MCP gateway and switching from raw-HTML scraping to typed Scavio JSON, the agent's per-turn context bloat dropped from 50K tokens to under 8K, freeing room for genuine reasoning.
Related Terms
MCP Gateway
An MCP gateway (or MCP proxy) is a single Model Context Protocol server that fronts multiple upstream MCP servers, expos...
Agent Architecture
Agent architecture is the set of design choices that turn an LLM prompt into a production system: routing and classifica...
Grounding LLM Workflows
Grounding LLM workflows is the pattern of injecting verified, fresh, structured context — from search APIs, internal doc...