MCP Proxy Cuts Context Bloat 99%
Single MCP daemon, 50K-token schema loads compressed to under 500. Pattern from r/opencodeCLI cuts process count and RAM by ~95%.
An r/opencodeCLI thread documented a problem most multi-agent users eventually hit: 35 npm processes, ~4 GB of RAM, and ~50,000 tokens of MCP schemas loaded into context before the user typed anything. The fix was a single MCP gateway daemon, and it cut schema overhead by 99%.
How the bloat happens
Every AI assistant that uses MCP starts its own copy of every MCP server it needs. Run pi, VS Code, and opencode side by side, and each spawns 12+ npm exec processes for Playwright, Neo4j, shadcn, searxng, sequential-thinking, next-devtools, Tavily, context7, and the rest. Three identical fleets, none talking to each other.
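The duplication is visible in the configs themselves: every agent carries its own copy of the same server list, something like this (an illustrative sketch; exact package names and config paths vary by agent):

```json
{
  "mcpServers": {
    "playwright": { "command": "npx", "args": ["@playwright/mcp"] },
    "tavily": { "command": "npx", "args": ["tavily-mcp"] },
    "context7": { "command": "npx", "args": ["@upstash/context7-mcp"] }
  }
}
```

Multiply that block by three agents and every npx entry becomes three resident processes.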
The token side of the problem
Each MCP server exposes 8 to 15 tools. Each tool has a verbose JSON schema describing inputs and outputs. A typical search-shaped MCP burns 5,000 to 10,000 tokens just to advertise itself. Stack 12 servers and 50,000 tokens vanish before the model has reasoned about anything.
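You can sanity-check that figure yourself. A minimal sketch, assuming the usual ~4-characters-per-token heuristic rather than a real tokenizer:

```typescript
// Rough schema-cost estimate: JSON-serialize each tool definition and
// approximate tokens at ~4 characters per token. Crude, but enough to
// show the order of magnitude.
type ToolDef = { name: string; description: string; inputSchema: object };

function estimateSchemaTokens(tools: ToolDef[]): number {
  const chars = tools.reduce((sum, t) => sum + JSON.stringify(t).length, 0);
  return Math.ceil(chars / 4);
}

// 12 servers x ~10 tools x ~2 KB of schema each is ~240K characters,
// i.e. ~60K tokens -- the right order of magnitude for the 50K figure above.
```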
The two-part fix
Part one is an MCP gateway: a single daemon process that proxies all the upstream servers. Each agent connects to the gateway over HTTP instead of spawning its own processes; a sketch of the dispatch logic follows the config below.
Part two is consolidating search-shaped MCP servers. If your stack has Tavily MCP, Brave MCP, Reddit MCP, and a YouTube scraper MCP, replacing them with one Scavio MCP cuts schema cost by ~75% on that single surface.
The gateway config
```json
{
  "upstreams": {
    "scavio": {
      "url": "https://mcp.scavio.dev/mcp",
      "headers": { "x-api-key": "$SCAVIO_API_KEY" }
    },
    "playwright": { "command": "npx", "args": ["@playwright/mcp"] },
    "shadcn": { "command": "npx", "args": ["@shadcn/mcp"] }
  }
}
```
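What the daemon does with that config is small enough to sketch. The following is a hedged illustration, not the actual gateway's source, assuming the official TypeScript SDK (@modelcontextprotocol/sdk) and stdio upstreams; the URL-based upstream would use the SDK's HTTP client transport instead:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Mirrors the stdio entries in the "upstreams" config above.
type StdioUpstream = { command: string; args: string[] };

// Connect to each upstream exactly once; every agent request reuses
// these long-lived clients instead of spawning fresh processes.
async function connectUpstreams(upstreams: Record<string, StdioUpstream>) {
  const clients = new Map<string, Client>();
  for (const [name, cfg] of Object.entries(upstreams)) {
    const client = new Client({ name: `gateway-${name}`, version: "0.1.0" });
    await client.connect(
      new StdioClientTransport({ command: cfg.command, args: cfg.args })
    );
    clients.set(name, client);
  }
  return clients;
}

// Advertise every upstream's tools under namespaced names
// ("playwright.click", ...) in a single aggregated list.
async function listAllTools(clients: Map<string, Client>) {
  const all = [];
  for (const [name, client] of clients) {
    const { tools } = await client.listTools();
    all.push(...tools.map((t) => ({ ...t, name: `${name}.${t.name}` })));
  }
  return all;
}

// Route a namespaced call back to the upstream that owns the tool.
async function callTool(
  clients: Map<string, Client>,
  name: string,
  args: Record<string, unknown>
) {
  const dot = name.indexOf(".");
  const client = clients.get(name.slice(0, dot));
  if (!client) throw new Error(`unknown upstream in tool name: ${name}`);
  return client.callTool({ name: name.slice(dot + 1), arguments: args });
}
```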
Each agent now points at the daemon:

```json
{
  "mcpServers": {
    "gateway": { "url": "http://localhost:8765/mcp" }
  }
}
```

The numbers from the original thread
The OP reported a 99.3% reduction in schema-load tokens, going from ~50,000 down to under 500. Process count dropped from 35+ to 1. RAM dropped from 4 GB to ~200 MB. Most of those wins came from the daemon architecture, not the consolidation. But the consolidation is free if you're carrying multiple search MCPs.
Why this is the single biggest agent-cost lever
On a 30-turn agent session, a 50K-token schema block is written to the prompt cache once but re-read as part of the cached prefix on every turn. At Claude Sonnet 4.6 cache-read input pricing, that's roughly $0.50 of pure overhead before the user gets any value; after the gateway, it's under $0.05. Most teams don't see this because they don't look at the prompt-cache layer; the cost just shows up as a higher bill.
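The arithmetic, as a back-of-envelope sketch (the ~$0.30-per-million cache-read rate is an assumption for a Sonnet-class model; substitute your actual pricing):

```typescript
// Cost of re-reading the schema prefix from the prompt cache, per session.
const CACHE_READ_USD_PER_MTOK = 0.3; // assumed Sonnet-class cache-read rate

function schemaOverheadUSD(schemaTokens: number, turns: number): number {
  return (schemaTokens * turns * CACHE_READ_USD_PER_MTOK) / 1_000_000;
}

schemaOverheadUSD(50_000, 30); // ~ $0.45 per session before the gateway
schemaOverheadUSD(500, 30);    // ~ $0.005 after
```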
What you keep, what you drop
Keep all the upstream servers. Drop the per-agent process explosion. Drop the duplicate search-shaped MCPs. The agent code stays identical except for one config swap.
One small caveat
The gateway adds a single point of failure. If it crashes, all agents lose tools. In production, run it under systemd or pm2 with auto-restart. For most solo developers running this locally, that complexity is a non-issue.
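If you do want the auto-restart, a minimal pm2 ecosystem file is enough (mcp-gateway.js is a placeholder for whatever launches your daemon):

```javascript
// ecosystem.config.js -- pm2 brings the gateway back if it crashes.
module.exports = {
  apps: [
    {
      name: "mcp-gateway",
      script: "mcp-gateway.js", // placeholder entry point
      autorestart: true,        // restart after a crash
      restart_delay: 2000,      // wait 2s between restarts
      max_restarts: 10,         // stop retrying on a crash loop
    },
  ],
};
```

Start it with pm2 start ecosystem.config.js, then pm2 startup and pm2 save so it comes back after a reboot.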