The Problem
Two May 2026 Reddit posts make the case: one describes an MCP that cut Claude Code subscription token cost by 40%, the other routes bulk work to Qwen3 35B for roughly 20× savings. The gains are real, but they are workload-specific.
The Scavio Solution
Three MCPs, each with a clear job: Semble for in-repo code lookup (returns ranges, not full files), Scavio for out-of-repo grounding (collapses 5-8 narrow web tools into one), and an optional local-LLM-routing MCP for summarize/classify steps. Measure before and after for two weeks; do not assume gains.
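The before/after measurement can be as simple as comparing mean daily input tokens across the two periods. The sketch below is one minimal way to do that; the function name and the sample figures are illustrative placeholders, not measured data, and it assumes you can already export daily token totals from your own usage logs.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change in mean daily tokens (negative means a drop)."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline mean is zero; cannot compute a change")
    return (mean(after) - baseline) * 100.0 / baseline


# Placeholder daily input-token totals (made up for illustration only).
baseline_days = [410_000, 395_000, 430_000, 402_000, 388_000]
trial_days = [260_000, 255_000, 290_000, 248_000, 271_000]

change = percent_change(baseline_days, trial_days)
print(f"Mean daily input tokens changed by {change:.1f}%")
```

Comparing means over several days smooths out one-off heavy sessions; with real data, keep the workload mix similar across both periods so the comparison stays fair.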
Before
On a 100K-LOC repo, default Claude Code fans out grep+read across 8-15 files per query, which costs ~30-50K input tokens per find-and-edit. On top of that, 5-8 narrow web tools each add ~150 tokens of description to every message.
After
Tool consolidation cuts per-message input by ~4-8K tokens; Semble cuts per-query grep+read input by ~80-98%; bulk summarize/classify work is routed to the local-LLM MCP at ~$0.10/M tokens. Weekly cost for heavy users drops 30-50%.
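To make the grep+read figure concrete, the sketch below applies the quoted ~80-98% reduction to the ~30-50K input tokens per find-and-edit from the Before section. The helper name is hypothetical, and the result is only the arithmetic implied by those two ranges, not a measurement.

```python
def remaining_tokens_range(low, high, min_cut_pct, max_cut_pct):
    """Return (best_case, worst_case) tokens left after a percentage cut.

    Best case: smallest starting cost with the largest cut.
    Worst case: largest starting cost with the smallest cut.
    Integer percentages keep the arithmetic exact.
    """
    best = low * (100 - max_cut_pct) // 100
    worst = high * (100 - min_cut_pct) // 100
    return best, worst


# 30-50K tokens per find-and-edit, reduced by 80-98%.
print(remaining_tokens_range(30_000, 50_000, 80, 98))  # prints (600, 10000)
```

Under those assumptions a find-and-edit would cost somewhere between roughly 0.6K and 10K input tokens, which is why the gain depends so heavily on repo size and query shape.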
Who It Is For
Heavy Claude Code users, agencies billing per-message agent time, startups paying $200+/mo per developer in tokens.
Key Benefits
- Three MCPs, three roles, no overlap
- Measure, don't assume
- Tool consolidation wins on every workload
- Repo-size-dependent gains from Semble
- Local-LLM-routing optional and workload-specific
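As a rough illustration of the optional routing role, the sketch below sends only summarize/classify steps to a local model and leaves everything else with Claude. The backend labels, task names, and function names are hypothetical, and this is not how any particular routing MCP is implemented; the cost helper simply applies the ~$0.10/M rate quoted above.

```python
# Task kinds considered cheap enough to hand to a local model (assumption).
LOCAL_TASKS = {"summarize", "classify"}

# Approximate local-model rate quoted in this page, in USD per million tokens.
LOCAL_USD_PER_MILLION = 0.10


def pick_backend(task_kind: str) -> str:
    """Return a hypothetical backend label for the given task kind."""
    if task_kind.strip().lower() in LOCAL_TASKS:
        return "local-llm"
    return "claude"


def local_cost_usd(tokens: int) -> float:
    """Estimate the local-model cost for a token count at the quoted rate."""
    return tokens * LOCAL_USD_PER_MILLION / 1_000_000


for kind in ("summarize", "classify", "find-and-edit"):
    print(kind, "->", pick_backend(kind))
```

Keeping the allow-list small and explicit makes it easy to check whether routing helps a given workload before widening it.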
Python Example
# Setup is CLI:
# claude mcp add semble <semble-url>
# claude mcp add scavio https://mcp.scavio.dev/mcp --header "x-api-key: $SCAVIO_API_KEY"
# claude mcp add local-llm <local-mcp-url> # optional
JavaScript Example
// CLI config; no JS application code.
Platforms Used
Web search with knowledge graph, PAA, and AI overviews