Two May 2026 Reddit posts make the case: one reports a 40% Claude Code cut via tool consolidation, the other a 20× improvement from routing bulk work to Qwen3 35B on Nosana. This guide walks through the pragmatic recipe.
Prerequisites
- Claude Code Plus or higher
- Semble installed for in-repo lookup
- Scavio API key
- Two-week measurement window
Walkthrough
Step 1: Baseline: 2-week measurement before any change
Capture per-message input/output tokens before touching anything.
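One way to capture the baseline is a small wrapper that logs one JSON record per message and then totals the tokens per week. The sketch below assumes a hypothetical JSONL log whose records carry `timestamp`, `input_tokens`, and `output_tokens` fields; the field names and sample values are illustrative, not a format that Claude Code or the Anthropic console is known to export.

```python
import json
from collections import defaultdict
from datetime import datetime


def weekly_token_totals(jsonl_lines):
    """Sum input/output tokens and message counts per ISO week."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "messages": 0})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        # Hypothetical record shape: ISO-8601 timestamp plus token counts.
        year, week, _ = datetime.fromisoformat(record["timestamp"]).isocalendar()
        bucket = totals[f"{year}-W{week:02d}"]
        bucket["input_tokens"] += record["input_tokens"]
        bucket["output_tokens"] += record["output_tokens"]
        bucket["messages"] += 1
    return dict(totals)


# Made-up sample records, standing in for a real usage log.
sample_log = [
    '{"timestamp": "2026-05-04T09:00:00", "input_tokens": 12000, "output_tokens": 800}',
    '{"timestamp": "2026-05-05T14:30:00", "input_tokens": 9000, "output_tokens": 600}',
    '{"timestamp": "2026-05-11T10:15:00", "input_tokens": 15000, "output_tokens": 1000}',
]

baseline = weekly_token_totals(sample_log)
for week, stats in sorted(baseline.items()):
    print(week, stats)
```

Bucketing by ISO week keeps the two-week window easy to compare later, since the post-change measurement can reuse the same function unchanged.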
// Use Anthropic console or self-rolled wrapper.
Step 2: Install Semble
Returns matching ranges, not full files.
// Per Semble repo README:
// claude mcp add semble <semble-url>
Step 3: Install Scavio MCP
Replaces 5-8 narrow web tools with one.
claude mcp add scavio https://mcp.scavio.dev/mcp --header "x-api-key: $SCAVIO_API_KEY"
Step 4: Drop unused narrow web/scrape skills
Every configured tool's description is sent with each message, so consolidating tools cuts per-message input tokens.
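To get a rough feel for the size of that cut, the overhead can be estimated as the sum of description tokens across configured tools, multiplied by the number of messages. The sketch below assumes every tool description is resent with each message and ignores prompt caching; all tool names (other than semble and scavio) and all token counts are made up for illustration and should be replaced with measured values.

```python
def weekly_description_tokens(tool_description_tokens, messages_per_week):
    """Input tokens per week spent on tool descriptions alone."""
    per_message = sum(tool_description_tokens.values())
    return per_message * messages_per_week


# Hypothetical description sizes (in tokens) before consolidation.
tools_before = {
    "web_search": 400,
    "page_scrape": 350,
    "news_lookup": 300,
    "serp_fetch": 450,
    "link_extract": 250,
    "semble": 500,
}

# Hypothetical sizes after replacing the narrow web tools with one.
tools_after = {
    "scavio": 600,
    "semble": 500,
}

messages_per_week = 1000  # assumed volume for a heavy user

saved = (
    weekly_description_tokens(tools_before, messages_per_week)
    - weekly_description_tokens(tools_after, messages_per_week)
)
print(f"Estimated description tokens saved per week: {saved}")
```

The estimate only covers description overhead; the measured before/after comparison remains the real test.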
// claude mcp list → identify duplicates and never-invoked → claude mcp remove <name>
Step 5: Update CLAUDE.md / system prompt
Add a routing rule so the consolidated tools are used first.
// CLAUDE.md: For in-repo code lookup use semble. For out-of-repo use scavio.search. Don't grep+read.
Step 6: Re-measure 2-week post
Capture the same per-message input and output tokens as in the baseline.
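The comparison itself can be as simple as the mean tokens per message in each window and the percent change between them. The sketch below assumes each window is a list of per-message records with `input_tokens` and `output_tokens` keys; the record shape and the sample numbers are hypothetical.

```python
from statistics import mean


def percent_change(before, after):
    """Signed percent change from before to after (negative means a cut)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) * 100.0 / before


def compare_windows(before_msgs, after_msgs):
    """Compare mean per-message token counts across two measurement windows."""
    report = {}
    for field in ("input_tokens", "output_tokens"):
        before_avg = mean(m[field] for m in before_msgs)
        after_avg = mean(m[field] for m in after_msgs)
        report[field] = {
            "before": before_avg,
            "after": after_avg,
            "change_pct": percent_change(before_avg, after_avg),
        }
    return report


# Made-up per-message records for the two windows.
before_window = [
    {"input_tokens": 12000, "output_tokens": 900},
    {"input_tokens": 8000, "output_tokens": 1100},
]
after_window = [
    {"input_tokens": 7000, "output_tokens": 800},
    {"input_tokens": 5000, "output_tokens": 1000},
]

for field, stats in compare_windows(before_window, after_window).items():
    print(f"{field}: {stats['before']} -> {stats['after']} ({stats['change_pct']:+.1f}%)")
```

Comparing per-message means rather than raw totals keeps the result meaningful if the two windows contain different numbers of messages.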
// Compare before/after. Heavy users on >100K LOC repos typically see 30-50% per-week cost cut.
Step 7: Optional: local-LLM-routing MCP for bulk steps
This step is workload-specific: it helps only where bulk steps can be routed to a local model.
// claude mcp add local-llm <mcp-url>
Python Example
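A minimal sketch of the savings arithmetic, using this guide's $300/month spend and its 30-50% range of cuts. The function name is illustrative, and real savings depend on the measured before/after numbers.

```python
def monthly_savings(monthly_spend_usd, cut_percent):
    """Dollars saved per month for a given percent cut in token spend."""
    if not 0 <= cut_percent <= 100:
        raise ValueError("cut_percent must be between 0 and 100")
    return monthly_spend_usd * cut_percent / 100


spend = 300  # monthly token spend in USD, from the example in this guide
for pct in (30, 40, 50):
    print(f"{pct}% cut on ${spend}/mo saves ${monthly_savings(spend, pct):.0f}/mo")
```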
# Heavy user with $300/mo in tokens cutting 40% saves ~$120/mo.
JavaScript Example
// Same shape; the work is config + system prompt + measurement.
Expected Output
Per-week Claude Code token cost drops 30-50% for heavy users, measured before and after the change.