Cut Claude Code Token Cost Without Downgrading the Model
An r/ClaudeCode post upgraded to Max. The cheaper fix: code search MCP + Scavio MCP + skill trim + Sonnet for routine ops. Plus user keeps Plus.
An r/ClaudeCode post captured the bind real Claude Code users feel: just upgraded to Max as Opus 4.7 burns through Plus. Interested in how others lower token usage. The reflex (upgrade to Max) is the most expensive option that solves the symptom. The cheaper fix attacks where the tokens actually go.
Where the tokens actually go
Audit a typical heavy-use day:
- grep+read fanout on large repos — 30-50K input tokens per "find feature X" query.
- Attached MCP descriptions — every MCP's tool list is in every message's input.
- Skill folder bloat — 70 default skills × 150 tokens = ~10K input tokens of pure description per message.
- Attached docs/files — large markdown files dragged in for context.
- Always-Opus-4.7 on routine tasks (rename, format, lint fixes) where Sonnet 4.6 is fine.
All five have cheaper fixes than upgrading the plan.
Fix 1: Local code search MCP for large repos
On any repo >100K LOC, Semble (or sourcegraph-cody-bridge) cuts grep+read fanout from tens of thousands of input tokens per query to a few hundred. The 98% number is real because the MCP returns matching ranges, not full files. This is the single biggest token cut available.
claude mcp add semble <semble-url>
claude mcp add scavio https://mcp.scavio.dev/mcp \
--header 'x-api-key: $SCAVIO_API_KEY'
# System prompt rule: 'For in-repo code, semble.search.
# For framework docs / Stack Overflow / GitHub issues,
# scavio.search. Do not use grep+read for in-repo.'Fix 2: Trim attached MCPs to 4-6 named ones
Each MCP's tool list is in every message's input. Audit: drop never-invoked MCPs. Replace 5-8 narrow web/scrape skills with one Scavio MCP (six tools, one description block). Per-message input savings: 4-8K tokens.
Fix 3: Trim skill folder
Same logic, different surface. The r/hermesagent post documented 73 → 26 skills. Per-message input savings: another ~3-5K tokens. Quarterly re-audit to prevent drift.
Fix 4: Sonnet 4.6 default, Opus 4.7 selectively
Routine ops on Sonnet 4.6: rename variables, format files, lint fixes, docstring writes, simple refactors. Opus 4.7 reserved for: architecture decisions, novel logic, hard debugging, complex multi-step reasoning. Switching by task type is the single biggest pure-cost fix because Opus 4.7 is several times more expensive per token than Sonnet 4.6.
Fix 5: Audit attached docs/files
Big markdown files dragged into every session compound. Be honest about which docs the agent actually needs vs which are habit. Reference URLs in prompts and let the agent fetch on demand instead of pre-loading.
The recipe in order
- Install code search MCP for large repos.
- Trim attached MCPs to 4-6 named.
- Replace narrow web skills with one Scavio MCP.
- Trim skill folder to active 20-30 skills.
- Switch model based on task type.
- Audit attached docs/files.
Cost expectations
A tuned Plus user on heavy daily use: ~$50/mo all-in (Plus $20 + Scavio $30 + tuning discipline). A blanket-Max user: $100-200/mo for similar output. For most users, the tuning beats the upgrade.
When Max actually earns its keep
Heavy contractors doing 6+ hours/day of Opus-grade work where the cognitive overhead of model-switching + tool tuning isn't worth it. If you bill enough to absorb the plan and the model-switching habit slows you down, Max is the right call. For most users, it isn't.
The honest answer to the OP
The fixes that don't require a plan upgrade compound. Code search MCP cuts grep+read fanout 80-98% on large repos. Skill/MCP consolidation saves 4-8K input tokens per message. Sonnet 4.6 for routine ops cuts Opus 4.7 spend by half or more. Apply all four; re-measure after one week. Most users will find Max wasn't the right answer once the cheaper fixes shipped.
Why Anthropic doesn't lead with this
Plan revenue beats efficiency advice. The fixes above are honest user-facing optimizations that Anthropic could publish but generally doesn't. The community fills the gap. The OP's post asking for it is the right instinct; the answer is worth more than the upgrade.