Definition
Agent token optimization is the practice of minimizing the number of tokens consumed when an LLM agent processes web search results, reducing both latency and API cost per search-grounded response.
In Depth
When an AI agent calls a web search tool, the search results are injected into the LLM's context window as tokens. A typical Google SERP returns 10 organic results with titles, URLs, and snippets, roughly 800-1,200 tokens. Adding People Also Ask, knowledge graph, and AI Overview data can push this to 2,000-3,000 tokens per search call. At Claude Sonnet input pricing of $3 per million tokens, each such injection costs roughly $0.006-$0.009 in input tokens alone, on top of the search API cost ($0.005/query on Scavio, $0.008/credit on Tavily).

Four optimization strategies address this: (1) request fewer results (5 instead of 10 saves ~500 tokens), (2) strip URLs and metadata the agent does not need, (3) use structured JSON fields instead of raw HTML snippets, and (4) cache repeated queries to avoid redundant search calls. MCP tool schema bloat is another source of overhead: a search tool with 15 optional parameters adds ~200 tokens to every agent turn, even when the tool is never called, because the schema is sent with each request. Pruning the schema to essential parameters (query, platform, count) removes most of this overhead.

Teams running agents at scale report 30-40% token reductions from these optimizations, which translates to measurable cost savings at 10K+ agent invocations per month.
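Strategies (1) and (2) can be sketched as a small post-processing step between the search call and the LLM. This is a minimal illustration, not any provider's actual response format: the raw-result fields and the 4-characters-per-token estimate are assumptions for demonstration.

```python
# Trim raw search results before injecting them into an agent's context.
# The result shape below is hypothetical; real providers differ.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_results(raw_results, max_results=5, keep_fields=("title", "snippet")):
    """Keep only the top N results and the fields the agent actually reads."""
    return [
        {field: r[field] for field in keep_fields if field in r}
        for r in raw_results[:max_results]
    ]

# Simulated raw SERP payload: 10 results, each carrying metadata the
# agent never uses (URLs, positions, cache links).
raw = [
    {
        "title": f"Result {i}",
        "url": f"https://example.com/page-{i}",
        "snippet": "A two-sentence summary of the page contents. " * 2,
        "position": i,
        "displayed_url": f"example.com > page-{i}",
        "cached_url": f"https://cache.example.com/{i}",
    }
    for i in range(10)
]

trimmed = trim_results(raw)
print(f"estimated tokens: {estimate_tokens(str(raw))} -> {estimate_tokens(str(trimmed))}")
```

In practice the token count would come from the model's own tokenizer rather than a character heuristic, but the shape of the saving is the same: fewer results, fewer fields, fewer tokens per turn.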
Example Usage
A customer support agent makes 3 search calls per ticket resolution, processing 50 tickets/day. Before optimization: 2,500 tokens per search × 3 searches × 50 tickets = 375K input tokens/day ($1.13/day at Claude Sonnet's $3-per-million-token input rate). After optimization (5 results instead of 10, stripped metadata, schema pruning): 1,400 tokens per search × 3 × 50 = 210K tokens/day ($0.63/day). Monthly savings: roughly $15 in LLM costs alone, plus faster response times.
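The arithmetic above can be reproduced in a few lines, assuming the $3-per-million-token input rate:

```python
# Back-of-envelope check of the support-agent example: tokens per day
# and input cost before and after optimization.
PRICE_PER_TOKEN = 3 / 1_000_000  # assumed Claude Sonnet input rate, $/token

def daily_cost(tokens_per_search, searches_per_ticket=3, tickets_per_day=50):
    """Return (input tokens per day, input cost per day in dollars)."""
    tokens = tokens_per_search * searches_per_ticket * tickets_per_day
    return tokens, tokens * PRICE_PER_TOKEN

before_tokens, before_cost = daily_cost(2_500)  # 375,000 tokens, ~$1.13/day
after_tokens, after_cost = daily_cost(1_400)    # 210,000 tokens, ~$0.63/day
monthly_savings = (before_cost - after_cost) * 30  # ~$15/month
```

Note these figures cover input tokens only; the per-query search API fee and any output-token cost come on top.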
Platforms
Agent token optimization is relevant across the following platforms, all accessible through Scavio's unified API: