Where Do Claude Code Tokens Actually Go?

Claude Code sessions can burn through tokens fast. A 30-minute coding session might consume hundreds of thousands of tokens, and the bill adds up. But where do those tokens actually go? Most developers assume it is their prompts, but the reality is more nuanced. Understanding token distribution helps you write more efficient prompts and structure your workflow to minimize waste.

The Token Budget Breakdown

A typical Claude Code session spends tokens across four categories:

System prompt and context: The system prompt, CLAUDE.md files, MCP tool definitions, and other injected context. This is sent with every message and can be surprisingly large.
Conversation history: Every previous message in the session -- your prompts and Claude's responses -- accumulates in the context window. Later messages in a long session carry the full weight of earlier ones.
Tool calls and results: File reads, searches, grep results, and other tool outputs. A single file read of a large file can inject thousands of tokens.
Model output: Claude's responses, including the code it writes, its explanations, and its thinking. Output tokens cost more than input tokens; at Anthropic's published API prices, output is roughly five times as expensive per token.
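Here is a minimal Python sketch of how these categories add up on a single API call. The token counts and the per-million-token prices are hypothetical placeholders, not quoted rates, and the sketch ignores prompt-caching discounts. It encodes one structural point: everything except the model's output is billed as input.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; replace with your model's real rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class TurnTokens:
    """Token counts for one API call, split into the four categories."""
    system_context: int  # system prompt, CLAUDE.md, MCP tool definitions
    history: int         # earlier prompts and responses
    tool_results: int    # file reads, grep output, etc. still in context
    output: int          # tokens Claude generates on this call

    @property
    def input_tokens(self) -> int:
        # Everything the model reads on this call is billed as input.
        return self.system_context + self.history + self.tool_results

    def cost_usd(self) -> float:
        # Ignores prompt caching, which would discount repeated input.
        return (self.input_tokens * INPUT_PRICE_PER_MTOK
                + self.output * OUTPUT_PRICE_PER_MTOK) / 1_000_000


turn = TurnTokens(system_context=15_000, history=30_000, tool_results=20_000, output=2_000)
print(turn.input_tokens)          # 65000
print(f"${turn.cost_usd():.3f}")  # $0.225
```

In this example, the 65,000 input tokens cost $0.195 and the 2,000 output tokens cost $0.03. The large context outweighs the pricier output.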

Where the Waste Hides

The biggest source of token waste is not your prompts -- it is accumulated context. Every tool result stays in the conversation history. If Claude reads a 500-line file early in the session, those tokens are included in every subsequent API call. Prompt caching makes those repeats cheaper than fresh input, but they still take up room in the context window on every call.

Bash

# This file read costs tokens once when it happens,
# but the result stays in context for every future message
Read file: src/components/dashboard.tsx (487 lines)

# If you send 20 more messages after this read,
# those 487 lines are re-sent 20 times as conversation history

The compounding effect is significant. A session with 10 file reads averaging 200 lines each adds roughly 2,000 lines of context to every message that follows those reads. If the reads happen early, then 20 messages later you have sent those 2,000 lines 20 times.
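A short simulation makes the compounding concrete. It assumes that each file read happens on its own turn and that a line of code costs about 10 tokens. Both are illustrative assumptions; real tokens per line vary with language and formatting.

```python
# Illustrative assumption: roughly 10 tokens per line of source code.
TOKENS_PER_LINE = 10


def resent_tool_tokens(read_sizes_in_lines: list[int], total_turns: int) -> int:
    """Total tool-result tokens sent across all turns of a session.

    read_sizes_in_lines[i] is the size of a file read on turn i (0-indexed).
    A result read on turn i is sent on turns i through total_turns - 1,
    i.e. (total_turns - i) times; reads past the last turn are never sent.
    """
    total = 0
    for turn, lines in enumerate(read_sizes_in_lines):
        times_sent = max(total_turns - turn, 0)
        total += lines * TOKENS_PER_LINE * times_sent
    return total


# Ten 200-line reads on turns 0-9 of a 30-turn session.
reads = [200] * 10
print(resent_tool_tokens(reads, total_turns=30))  # 510000
# The same reads counted once each, if nothing stayed in context:
print(sum(reads) * TOKENS_PER_LINE)               # 20000
```

Under these assumptions, 20,000 tokens of file content become 510,000 input tokens over the session, more than 25 times the one-off cost.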

MCP Tool Definitions

Each MCP server you connect adds its tool definitions to the system prompt. Tool definitions include the tool name, description, and parameter schema. A single MCP server with 10 tools might add 500-1000 tokens to every API call.

If you have five such servers connected, the tool definitions alone could account for 2,500-5,000 tokens per message. These tokens are invisible -- you do not see them in the conversation -- but they are billed on every turn.
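To see what idle servers cost, multiply their per-message overhead by the number of turns. The server names and per-server token counts below are hypothetical; the sketch only shows the arithmetic.

```python
# Hypothetical per-server tool-definition sizes, in tokens.
mcp_servers = {
    "github": 900,
    "filesystem": 600,
    "browser": 1_000,
    "database": 700,
    "issue-tracker": 500,
}


def mcp_overhead(servers: dict[str, int], turns: int) -> tuple[int, int]:
    """Return (tokens per message, tokens over the whole session)."""
    per_message = sum(servers.values())
    return per_message, per_message * turns


per_msg, per_session = mcp_overhead(mcp_servers, turns=40)
print(per_msg)      # 3700
print(per_session)  # 148000
```

At a hypothetical 3,700 tokens per message, a 40-turn session spends 148,000 input tokens on tool definitions before any file is read.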

Audit your connected MCP servers with claude mcp list
Remove MCP servers you are not actively using with claude mcp remove <name>
Prefer MCP servers with fewer, well-scoped tools over servers with large tool catalogs
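To compare servers before you connect them, you can roughly estimate the size of their tool definitions. The sketch below uses the common rule of thumb of about four characters per token. That is an approximation, not Claude's tokenizer. The tool definition is a made-up example shaped like an Anthropic API tool (name, description, input_schema), and the server names are hypothetical.

```python
import json

# Rule-of-thumb assumption: about 4 characters per token. Not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tool_tokens(tool: dict) -> int:
    """Approximate the tokens one tool definition adds to every request."""
    serialized = json.dumps(tool, separators=(",", ":"))
    return -(-len(serialized) // CHARS_PER_TOKEN)  # ceiling division


def rank_servers(servers: dict[str, list[dict]]) -> list[tuple[str, int]]:
    """Sort servers by estimated tool-definition tokens, largest first."""
    totals = {
        name: sum(estimate_tool_tokens(tool) for tool in tools)
        for name, tools in servers.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A hypothetical tool definition.
search_issues = {
    "name": "search_issues",
    "description": "Search issues in a repository by keyword, label, or state.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search keywords"},
            "label": {"type": "string"},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["query"],
    },
}

# Hypothetical servers: one broad catalog, one narrowly scoped.
servers = {"big-catalog": [search_issues] * 12, "scoped": [search_issues]}
for name, tokens in rank_servers(servers):
    print(name, tokens)
```

These numbers are only good for relative comparisons. Exact counts need a real tokenizer.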

Strategies for Reducing Token Usage

Once you understand where tokens go, you can optimize:

Start new sessions often. Long sessions accumulate context. If you are switching tasks, start a fresh session (or run /clear) instead of continuing in the same one. This drops the conversation history, and the next message starts again from the base context: the system prompt, CLAUDE.md files, and tool definitions.
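A simple model shows why splitting unrelated work pays off. For illustration, the sketch assumes that every turn resends a fixed base context plus all earlier turns, and that history grows by the same amount each turn. The base and growth figures are hypothetical.

```python
def session_input_tokens(base: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens for a session whose context grows linearly.

    Turn t (0-indexed) sends the base context plus t turns of history.
    """
    return sum(base + growth_per_turn * t for t in range(turns))


BASE = 15_000   # hypothetical system prompt + CLAUDE.md + tool definitions
GROWTH = 3_000  # hypothetical tokens added to history per turn

# Two unrelated tasks of 15 turns each.
one_long_session = session_input_tokens(BASE, GROWTH, 30)
two_fresh_sessions = 2 * session_input_tokens(BASE, GROWTH, 15)
print(one_long_session)    # 1755000
print(two_fresh_sessions)  # 1080000
```

Under these assumptions, two fresh 15-turn sessions use 1,080,000 input tokens instead of 1,755,000, roughly 38% less. The saving comes from neither task dragging the other's history along.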

Be specific about file reads. Instead of asking Claude to read an entire file, point it to specific line ranges. Reading lines 50-80 of a file costs a fraction of reading all 500 lines.

Bash

# Instead of: "Read src/lib/auth.ts"
# Be specific: "Read lines 45-70 of src/lib/auth.ts"
# On a file of roughly 130-260 lines, those 26 lines cut the tokens added to context by 80-90%
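How much a line range saves depends on the size of the file. This small helper computes the fraction saved. It assumes tokens are spread evenly across lines, which real files only approximate, and the file sizes in the usage lines are hypothetical.

```python
def read_savings(start: int, end: int, file_lines: int) -> float:
    """Fraction of context tokens saved by reading lines start..end (inclusive)
    instead of the whole file, assuming tokens are spread evenly across lines."""
    if not 1 <= start <= end <= file_lines:
        raise ValueError("range must lie within the file")
    lines_read = end - start + 1
    return 1 - lines_read / file_lines


print(f"{read_savings(45, 70, 200):.0%}")  # 87%
print(f"{read_savings(50, 80, 500):.0%}")  # 94%
```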

Minimize MCP servers. Only connect the MCP servers you need for the current task. Each server's tool definitions add to every message's token count.

Use compact prompts. Long, detailed prompts cost more tokens but do not always produce better results. A clear, concise prompt often outperforms a verbose one while costing less.

Measuring Your Token Usage

Claude Code shows token usage at the end of each session, and the /cost command reports it at any point during one. Pay attention to the input-to-output token ratio. If input tokens dramatically exceed output tokens, your context is bloated -- the model is reading far more than it is writing.

A healthy ratio for a coding session is roughly 3:1 to 5:1 input to output. Ratios above 10:1 suggest excessive file reads, too many MCP tools, or a session that has run too long without a reset.
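These bands translate directly into a small check you can run on the numbers Claude Code reports. The function names are this sketch's own choices. So is the "unclassified" label for ratios the text does not cover: below 3:1, or between 5:1 and 10:1.

```python
def token_ratio(input_tokens: int, output_tokens: int) -> float:
    """Input-to-output ratio, e.g. 4.0 means 4:1."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    return input_tokens / output_tokens


def classify_ratio(ratio: float) -> str:
    """Apply the rough bands from the text: 3-5 healthy, above 10 bloated."""
    if ratio > 10:
        return "bloated: check file reads, MCP tools, and session length"
    if 3 <= ratio <= 5:
        return "healthy"
    return "unclassified"


print(classify_ratio(token_ratio(400_000, 100_000)))   # healthy
print(classify_ratio(token_ratio(1_200_000, 80_000)))  # bloated: check file reads, MCP tools, and session length
print(classify_ratio(token_ratio(700_000, 100_000)))   # unclassified
```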

Track token usage across sessions to establish your baseline
Compare usage between similar tasks to identify inefficient patterns
Set a mental budget per task and start a new session if you exceed it
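If you log each session's total tokens, a baseline and a budget check take only a few lines. The log entries, task labels, and 1.5x budget factor below are hypothetical. The baseline is the median, so one runaway session does not inflate it.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log of (task type, total tokens) recorded after each session.
sessions = [
    ("bugfix", 180_000),
    ("bugfix", 220_000),
    ("bugfix", 200_000),
    ("feature", 450_000),
    ("feature", 500_000),
    ("bugfix", 410_000),
]


def baselines(log: list[tuple[str, int]]) -> dict[str, float]:
    """Median tokens per task type, used as the baseline."""
    by_task: dict[str, list[int]] = defaultdict(list)
    for task, tokens in log:
        by_task[task].append(tokens)
    return {task: median(values) for task, values in by_task.items()}


def over_budget(log: list[tuple[str, int]], factor: float = 1.5) -> list[tuple[str, int]]:
    """Sessions that used more than `factor` times their task's baseline."""
    base = baselines(log)
    return [(task, tokens) for task, tokens in log if tokens > factor * base[task]]


print(baselines(sessions))    # {'bugfix': 210000.0, 'feature': 475000.0}
print(over_budget(sessions))  # [('bugfix', 410000)]
```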

The Bottom Line

Token costs in Claude Code are dominated by context accumulation, not by your prompts. The system prompt, MCP tool definitions, file read results, and conversation history compound with every message. Keep sessions short, be surgical with file reads, prune unused MCP servers, and start fresh sessions when switching tasks. These habits can reduce your token usage by 50% or more without changing the quality of Claude's output.
