Definition
An agent token budget is a programmatic limit on how many context tokens an AI agent allocates to tool call results (particularly search results) per session or per turn, preventing uncontrolled context growth that degrades reasoning quality and increases costs.
In Depth
Without token budgets, a single search API call can inject 2,000-5,000 tokens of results into an agent's context. An agent making 5 searches per session might spend 10,000-25,000 tokens on search results alone, leaving less context for reasoning, code generation, and conversation history.

Token budgets work at two levels: per-call budgets that truncate individual search results (e.g., a maximum of 300 tokens per search, keeping only title, snippet, and URL for the top 5 results) and session budgets that cap total search token consumption across all calls.

Structured search APIs like Scavio return compact JSON (title, snippet, URL) that is inherently more token-efficient than raw HTML or full-page extraction. A typical Scavio response with 10 organic results uses 600-800 tokens, versus 4,000-8,000 tokens for equivalent raw web content.

To implement a budget, count the tokens in each search result using tiktoken (Python) or an approximation (characters / 4), truncate at the budget threshold, and track cumulative usage per session.
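The two budget levels can be sketched in a few lines of Python. This is a minimal illustration, not Scavio's implementation: the class name `SearchTokenBudget`, its `admit` method, and the result keys `title`, `snippet`, and `url` are hypothetical names chosen for the example, and token counts use the characters / 4 approximation rather than a real tokenizer. Truncation here happens at whole-result granularity, so a result is either kept in full or dropped.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def format_result(result: dict) -> str:
    """Keep only title, snippet, and URL (assumed field names)."""
    return f"{result['title']}\n{result['snippet']}\n{result['url']}"


@dataclass
class SearchTokenBudget:
    per_call_limit: int = 300   # max tokens admitted from one search
    session_limit: int = 2000   # max tokens admitted across the session
    max_results: int = 5        # only consider the top N results
    used: int = 0               # cumulative tokens admitted so far

    def admit(self, results: list[dict]) -> str:
        """Return the search context that fits both budgets."""
        remaining = min(self.per_call_limit, self.session_limit - self.used)
        kept: list[str] = []
        spent = 0
        for result in results[: self.max_results]:
            candidate = "\n\n".join(kept + [format_result(result)])
            cost = estimate_tokens(candidate)
            if cost > remaining:
                break  # adding this result would exceed a budget
            kept.append(format_result(result))
            spent = cost
        self.used += spent
        return "\n\n".join(kept)
```

An agent loop would call `admit` on every search response and pass only the returned string into the model's context. Once the session limit is reached, `admit` returns an empty string, which the agent can treat as a signal to stop searching. Swapping `estimate_tokens` for a tokenizer-based count (for example, the length of an encoded token list from tiktoken) would tighten the estimate without changing the rest of the sketch.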
Example Usage
An agent developer sets a 2,000-token budget for search context per session. Each Scavio search returns ~150 tokens of structured results (5 results, title + snippet). The agent makes 8 searches that use 1,200 tokens in total, well within budget. The same 8 searches done with raw web fetches would have consumed about 12,000 tokens, six times the budget.
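The arithmetic behind this scenario can be checked with a short calculation. The figures come from the example; the variable names are illustrative only.

```python
SESSION_BUDGET = 2000          # tokens allowed for search context per session
TOKENS_PER_SEARCH = 150        # approximate structured result size per search
SEARCHES = 8
RAW_FETCH_TOTAL = 12_000       # tokens for the same 8 searches via raw fetch

structured_total = SEARCHES * TOKENS_PER_SEARCH         # tokens actually used
headroom = SESSION_BUDGET - structured_total             # tokens still unspent
extra_searches = headroom // TOKENS_PER_SEARCH           # searches that still fit
raw_per_search = RAW_FETCH_TOTAL // SEARCHES             # raw cost per search
raw_over_budget = RAW_FETCH_TOTAL / SESSION_BUDGET       # multiple of the budget

print(structured_total, headroom, extra_searches, raw_per_search, raw_over_budget)
```

With these numbers the structured searches leave 800 tokens of headroom, enough for 5 more searches of the same size, while the raw-fetch approach averages 1,500 tokens per search.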
Related Terms
Context Bloat
Context bloat is the accumulation of tokens in an LLM's context window before the user has asked anything — usually from...
Credit-Based API Pricing
Credit-based API pricing is a billing model where API consumers purchase a pool of credits that are deducted based on us...
MCP Web Content Extraction
MCP web content extraction is the process of using an MCP server to fetch web pages and convert them to clean Markdown o...