Agent Search Token Cost Optimization Guide
Reduce LLM agent token costs by 60-80% with structured search extraction. Practical patterns for LangChain, CrewAI, and LangGraph agent frameworks.
AI agent search costs are dominated by token consumption, not API fees. A single web search returning raw HTML can inject 20,000-25,000 tokens into an agent's context window. Structured search APIs returning typed JSON reduce this to 800-1,200 tokens per search, cutting LLM costs by 90% or more.
Where Tokens Actually Go
When an agent searches the web using a raw HTML approach, the full page content gets injected into the context window: navigation menus, footers, ads, scripts, and the actual content. A typical Google results page is 150KB+ of HTML, which tokenizes to 20,000-30,000 tokens. At Claude Sonnet input pricing (about $3 per million tokens), that is roughly $0.06-0.09 per search in input tokens alone.
A structured search API returns only the data you need: titles, URLs, snippets, and metadata as JSON. The same search result set tokenizes to 800-1,200 tokens. At $0.003 per 1K input tokens, that is under $0.004 per search in LLM costs.
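The per-search arithmetic can be checked directly; this sketch uses the token counts and the $0.003-per-1K input rate quoted above:

```python
PRICE_PER_INPUT_TOKEN = 0.003 / 1000  # $0.003 per 1K input tokens, per the text

def search_input_cost(tokens: int) -> float:
    """LLM input cost of injecting one search's results into context."""
    return tokens * PRICE_PER_INPUT_TOKEN

print(round(search_input_cost(25_000), 3))  # raw HTML page   -> 0.075
print(round(search_input_cost(1_000), 3))   # structured JSON -> 0.003
```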
Token Optimization Strategies
The first lever is extracting only the fields the model needs before anything reaches the context window:

```python
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]

def efficient_search(query, max_results=5):
    """Return minimal context for LLM consumption."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": "google", "query": query},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])[:max_results]
    # Strip to essential fields only
    return [
        {
            "title": r["title"],
            "snippet": r.get("snippet", ""),
            "url": r.get("link", ""),
        }
        for r in results
    ]

# A few hundred tokens of context vs ~20,000 from raw HTML
context = efficient_search("best database for analytics 2026")
```
Result Filtering Before Context Injection
Not every search result belongs in the agent's context. Filter results before injection to reduce token waste. Remove results from irrelevant domains, deduplicate similar snippets, and truncate snippets to the first 150 characters.
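A minimal sketch of that filtering step; the blocked-domain list is illustrative, and the field names match the stripped result shape (`title`, `snippet`, `url`) used in this guide:

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"pinterest.com", "quora.com"}  # example blocklist, adjust per task

def filter_results(results, snippet_len=150):
    """Drop blocked domains, dedupe snippets, truncate before context injection."""
    seen = set()
    kept = []
    for r in results:
        domain = urlparse(r.get("url", "")).netloc.removeprefix("www.")
        if domain in BLOCKED_DOMAINS:
            continue
        snippet = r.get("snippet", "")[:snippet_len]  # truncate to first 150 chars
        if snippet in seen:  # crude dedup on the truncated snippet
            continue
        seen.add(snippet)
        kept.append({**r, "snippet": snippet})
    return kept
```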
Caching for Repeated Queries
Agents often search the same or similar queries within a session. A simple in-memory cache with a 1-hour TTL eliminates the redundant API calls and their token costs. For a research agent processing 50 queries, deduplication typically leaves 30-35 unique searches that actually hit the API.
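A sketch of such a cache, assuming results are safe to reuse within the TTL; `search_fn` stands in for any search callable, such as the `efficient_search` helper above:

```python
import time

_CACHE: dict[str, tuple[float, list]] = {}
TTL_SECONDS = 3600  # 1-hour TTL, per the text

def cached_search(query, search_fn):
    """Wrap any search function with a simple in-memory TTL cache."""
    key = query.strip().lower()  # normalize so near-identical queries hit the cache
    now = time.time()
    hit = _CACHE.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no API call, no new tokens
    results = search_fn(query)
    _CACHE[key] = (now, results)
    return results
```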
Cost Math at Scale
An agent making 100 searches per day:
- Raw HTML approach: 100 x 25,000 tokens = 2.5M input tokens = $7.50/day in LLM costs + scraping infrastructure
- Structured API: 100 x 1,000 tokens = 100K input tokens = $0.30/day + $0.50/day API = $0.80/day total
- With caching: 70 unique searches x 1,000 tokens = 70K tokens = $0.21/day + $0.35/day API = $0.56/day total
Monthly: $225/month (raw) vs $24/month (structured) vs $17/month (structured + cached). The API cost ($0.005/query) is negligible compared to the LLM token savings.
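The daily and monthly figures can be reproduced in a few lines, using the rates stated above ($0.003 per 1K input tokens, $0.005 per API query):

```python
LLM_PER_TOKEN = 0.003 / 1000  # $0.003 per 1K input tokens
API_PER_QUERY = 0.005         # structured search API fee per query

def daily_cost(searches, tokens_per_search, api_fee=0.0):
    """Daily LLM input cost plus per-query API fees, in dollars."""
    llm = searches * tokens_per_search * LLM_PER_TOKEN
    return llm + searches * api_fee

raw = daily_cost(100, 25_000)                       # 7.50/day
structured = daily_cost(100, 1_000, API_PER_QUERY)  # 0.80/day
cached = daily_cost(70, 1_000, API_PER_QUERY)       # 0.56/day

# 30-day months: 225.0, 24.0, 16.8
print(round(raw * 30, 1), round(structured * 30, 1), round(cached * 30, 1))
```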