Where Do Claude Code Tokens Actually Go?
Auditing token usage across thousands of Claude Code sessions -- where tokens go and how to optimize consumption.
Claude Code sessions can burn through tokens fast. A 30-minute coding session might consume hundreds of thousands of tokens, and the bill adds up. But where do those tokens actually go? Most developers assume it is their prompts, but the reality is more nuanced. Understanding token distribution helps you write more efficient prompts and structure your workflow to minimize waste.
The Token Budget Breakdown
A typical Claude Code session spends tokens across four categories:
- System prompt and context: The system prompt, CLAUDE.md files, MCP tool definitions, and other injected context. This is sent with every message and can be surprisingly large.
- Conversation history: Every previous message in the session -- your prompts and Claude's responses -- accumulates in the context window. Later messages in a long session carry the full weight of earlier ones.
- Tool calls and results: File reads, searches, grep results, and other tool outputs. A single file read of a large file can inject thousands of tokens.
- Model output: Claude's responses, including code it writes, explanations, and thinking. Output tokens are typically more expensive than input tokens.
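To make the four categories concrete, here is a minimal sketch that rolls them up into a single cost estimate. The `SessionTokens` class, the `estimate_cost` function, and the per-million-token prices are all hypothetical placeholders for illustration, not real pricing or anything Claude Code exposes; substitute the rates for the model you use.

```python
from dataclasses import dataclass


@dataclass
class SessionTokens:
    """Token counts for one session, split into the four categories above."""
    system_and_context: int
    conversation_history: int
    tool_results: int
    model_output: int


def estimate_cost(tokens: SessionTokens,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate session cost; the first three categories are billed as input."""
    input_tokens = (tokens.system_and_context
                    + tokens.conversation_history
                    + tokens.tool_results)
    return (input_tokens * input_price_per_million
            + tokens.model_output * output_price_per_million) / 1_000_000


# Placeholder numbers only: 400,000 input tokens and 40,000 output tokens,
# priced at a made-up 1.0 and 5.0 per million tokens.
session = SessionTokens(50_000, 200_000, 150_000, 40_000)
cost = estimate_cost(session, input_price_per_million=1.0,
                     output_price_per_million=5.0)
```

Even with output priced at five times input in this made-up example, input accounts for two thirds of the total, because three of the four categories are input.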
Where the Waste Hides
The biggest source of token waste is not your prompts -- it is accumulated context. Every tool result stays in the conversation history. If Claude reads a 500-line file early in the session, those tokens are included in every subsequent API call for the rest of the session.
# This file read costs tokens once when it happens,
# but the result stays in context for every future message
Read file: src/components/dashboard.tsx (487 lines)
# If you send 20 more messages after this read,
# those 487 lines are re-sent 20 times as conversation history
The compounding effect is significant. A session with 10 file reads averaging 200 lines each adds roughly 2,000 lines of context to every subsequent message. By message 20, you are sending those 2,000 lines for the 20th time.
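The compounding can be sketched with a few lines of arithmetic. This is a rough model under stated assumptions: it counts lines as a stand-in for tokens, treats every tool result as staying in the history for the rest of the session, and ignores any caching or history compaction that may apply. The function name `cumulative_context_lines` is hypothetical.

```python
def cumulative_context_lines(reads: dict[int, int], total_messages: int) -> int:
    """Total lines of file-read output re-sent as history over a session.

    `reads` maps a 1-based message number to the lines read during that
    message. A read joins the history for every later message.
    """
    total = 0
    in_context = 0
    for message in range(1, total_messages + 1):
        total += in_context                  # history carried into this message
        in_context += reads.get(message, 0)  # this read affects later messages
    return total


# One 487-line read in message 1, followed by 20 more messages.
single_read = cumulative_context_lines({1: 487}, total_messages=21)

# Ten 200-line reads in messages 1 through 10, in a 30-message session.
ten_reads = cumulative_context_lines({m: 200 for m in range(1, 11)},
                                     total_messages=30)
```

Under these assumptions the single read is re-sent as 9,740 lines in total, and the ten reads as 49,000 lines, far more than the 2,000 lines that were actually read.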
MCP Tool Definitions
Each MCP server you connect adds its tool definitions to the system prompt. Tool definitions include the tool name, description, and parameter schema. A single MCP server with 10 tools might add 500-1000 tokens to every API call.
If you have five such MCP servers connected, the tool definitions alone could account for 2,500-5,000 tokens per message. These tokens are invisible -- you do not see them in the conversation -- but they are billed on every turn.
- Audit your connected MCP servers with claude mcp list
- Remove MCP servers you are not actively using
- Prefer MCP servers with fewer, well-scoped tools over servers with large tool catalogs
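As a back-of-the-envelope check on the overhead described above, the sketch below multiplies tool count by an assumed per-tool size. The 50-100 tokens per tool figure is derived from the article's estimate of 500-1000 tokens for a 10-tool server; real definitions vary with description length and schema size, and the function name `mcp_overhead_tokens` is hypothetical.

```python
def mcp_overhead_tokens(tools_per_server: list[int],
                        tokens_per_tool: int,
                        turns: int) -> int:
    """Estimate tokens spent on MCP tool definitions across a session.

    Assumes every connected server's definitions are sent on every turn.
    """
    return sum(tools_per_server) * tokens_per_tool * turns


five_servers = [10, 10, 10, 10, 10]  # five servers with ten tools each

low_per_call = mcp_overhead_tokens(five_servers, tokens_per_tool=50, turns=1)
high_per_call = mcp_overhead_tokens(five_servers, tokens_per_tool=100, turns=1)
session_total = mcp_overhead_tokens(five_servers, tokens_per_tool=75, turns=20)
```

Dropping two of the five servers in this model removes 40% of the overhead on every turn, which is why pruning unused servers pays off more the longer a session runs.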
Strategies for Reducing Token Usage
Once you understand where tokens go, you can optimize:
Start new sessions often. Long sessions accumulate context. If you are switching tasks, start a fresh session instead of continuing in the same one. A new session clears the conversation history and starts with just the system prompt.
Be specific about file reads. Instead of asking Claude to read an entire file, point it to specific line ranges. Reading lines 50-80 of a file costs a fraction of reading all 500 lines.
# Instead of: "Read src/lib/auth.ts"
# Be specific: "Read lines 45-70 of src/lib/auth.ts"
# This reduces the tokens added to context by 80-90%
Minimize MCP servers. Only connect the MCP servers you need for the current task. Each server's tool definitions add to every message's token count.
Use compact prompts. Long, detailed prompts cost more tokens but do not always produce better results. A clear, concise prompt often outperforms a verbose one while costing less.
Measuring Your Token Usage
Claude Code shows token usage at the end of each session. Pay attention to the input vs output token ratio. If input tokens dramatically exceed output tokens, your context is bloated -- the model is reading far more than it is writing.
A healthy ratio for a coding session is roughly 3:1 to 5:1 input to output. Ratios above 10:1 suggest excessive file reads, too many MCP tools, or a session that has run too long without a reset.
- Track token usage across sessions to establish your baseline
- Compare usage between similar tasks to identify inefficient patterns
- Set a mental budget per task and start a new session if you exceed it
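The ratio check above is easy to automate once you have the two totals. The sketch below is one way to do it, assuming you copy the input and output counts from the session summary by hand. The thresholds are the article's rules of thumb, and the function name and labels ("healthy", "elevated", "bloated") are hypothetical.

```python
def classify_ratio(input_tokens: int, output_tokens: int) -> str:
    """Label a session by its input-to-output token ratio."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    ratio = input_tokens / output_tokens
    if ratio > 10:
        return "bloated"   # above 10:1, context is probably oversized
    if ratio <= 5:
        return "healthy"   # at or below the 5:1 upper bound
    return "elevated"      # between 5:1 and 10:1, worth watching


labels = [
    classify_ratio(400_000, 100_000),    # 4:1
    classify_ratio(800_000, 100_000),    # 8:1
    classify_ratio(1_200_000, 100_000),  # 12:1
]
```

Logging one label per session next to the task type gives you the baseline and the comparison between similar tasks that the list above recommends.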
The Bottom Line
Token costs in Claude Code are dominated by context accumulation, not by your prompts. The system prompt, MCP tool definitions, file read results, and conversation history compound with every message. Keep sessions short, be surgical with file reads, prune unused MCP servers, and start fresh sessions when switching tasks. These habits can reduce your token usage by 50% or more without changing the quality of Claude's output.