Agent Search Cost Per Context Window Fill
The real cost of search is the API fee plus the LLM context tokens: 10 results at ~200 tokens each add $0.005 in LLM cost per query.
The real cost of search for AI agents is not the per-query price -- it is the cost of filling the context window with search results. A single search returning 10 results at ~200 tokens each consumes 2,000 tokens. At GPT-4o rates ($2.50/1M input tokens), that is $0.005 in LLM cost on top of the search API cost.
The math: search cost + context cost
Every search result injected into context has two costs: the API call and the LLM tokens to process it. Most developers only track the API cost and miss the larger LLM cost.
```python
# True cost per search-grounded query
search_api_cost = 0.005  # Scavio per-query cost
results_count = 10
tokens_per_result = 200  # title + snippet + URL
total_search_tokens = results_count * tokens_per_result  # 2,000

# LLM input token costs (per 1M tokens)
llm_costs = {
    "GPT-4o": 2.50,
    "Claude Sonnet 4": 3.00,
    "Claude Opus 4": 15.00,
    "GPT-4.1": 2.00,
    "Llama 3.3 (local)": 0.00,
}

for model, cost_per_m in llm_costs.items():
    llm_cost = (total_search_tokens / 1_000_000) * cost_per_m
    total = search_api_cost + llm_cost
    print(f"{model}: ${total:.4f}/query (API: ${search_api_cost}, LLM: ${llm_cost:.4f})")
```

Optimizing result count
Most agents request 10 results by default. For most queries, 3-5 results provide sufficient grounding. Reducing from 10 to 5 results cuts context tokens in half while maintaining answer quality for 80%+ of queries.
```python
import os

import requests

# Optimized: request only what you need
def efficient_search(query, max_results=5):
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={
            "query": query,
            "num_results": max_results,
        },
    )
    results = resp.json().get("organic_results", [])
    # Return only title + snippet (skip URL to save tokens)
    return [
        {"title": r["title"], "snippet": r["snippet"]}
        for r in results
    ]
```

Context window efficiency strategies
- Request 5 results instead of 10 for most queries
- Strip URLs from results unless the agent needs to cite sources
- Summarize long snippets before injecting into context
- Use AI Overview data when available (pre-synthesized, fewer tokens)
- Cache and deduplicate results across agent turns
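The last strategy above can be sketched as a small cache that tracks which URLs have already been injected into context, so repeated searches across agent turns never pay for the same result twice. This is an illustrative sketch, not a Scavio feature; the `url` key is an assumed field name for whatever your search response uses.

```python
# Hypothetical sketch: deduplicate search results across agent turns
# so the same URL never enters the context window twice.
class SearchResultCache:
    def __init__(self):
        self.seen_urls = set()

    def filter_new(self, results):
        """Keep only results whose URL has not appeared in a prior turn."""
        fresh = []
        for r in results:
            url = r.get("url")
            if url and url not in self.seen_urls:
                self.seen_urls.add(url)
                fresh.append(r)
        return fresh

cache = SearchResultCache()
turn1 = cache.filter_new([{"url": "https://a.com"}, {"url": "https://b.com"}])
turn2 = cache.filter_new([{"url": "https://b.com"}, {"url": "https://c.com"}])
print(len(turn1), len(turn2))  # 2 1
```

At ~200 tokens per result, every duplicate the cache drops saves roughly $0.0005-$0.003 in input cost per turn, depending on the model.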
Multi-turn agent cost analysis
Agents that search multiple times per task accumulate context rapidly. A research agent making 5 searches per task with 10 results each adds 10,000 tokens of search context.
```python
# Multi-turn agent cost breakdown
searches_per_task = 5
results_per_search = 10
tokens_per_result = 200
search_api_rate = 0.005
llm_rate_per_m = 3.00  # Claude Sonnet 4

total_search_tokens = searches_per_task * results_per_search * tokens_per_result
api_cost = searches_per_task * search_api_rate
llm_cost = (total_search_tokens / 1_000_000) * llm_rate_per_m
total_per_task = api_cost + llm_cost

print(f"Tokens from search: {total_search_tokens:,}")
print(f"API cost: ${api_cost:.3f}")
print(f"LLM cost: ${llm_cost:.4f}")
print(f"Total per task: ${total_per_task:.3f}")
print(f"Cost for 1000 tasks/day: ${total_per_task * 1000:.2f}")

# Optimized: 5 results instead of 10
opt_tokens = searches_per_task * 5 * tokens_per_result
opt_llm = (opt_tokens / 1_000_000) * llm_rate_per_m
opt_total = api_cost + opt_llm
print(f"Optimized total per task: ${opt_total:.3f}")
print(f"Savings: {(1 - opt_total/total_per_task) * 100:.0f}%")
```

When to use full page extraction vs snippets
Full page extraction (Tavily extract, Scavio scrape) returns 1,000-5,000 tokens per page. Use it only when snippets are insufficient -- for example, extracting specific pricing tables or technical specifications. For general grounding, snippets are 20x cheaper in context cost.
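The arithmetic behind that tradeoff, using the token figures above and the Claude Sonnet 4 input rate as an example:

```python
# Context cost: 10 snippets vs one full-page extraction
llm_rate_per_m = 3.00  # Claude Sonnet 4 input rate, $/1M tokens

snippet_tokens = 10 * 200                          # 10 snippets at ~200 tokens each
page_tokens_low, page_tokens_high = 1_000, 5_000   # one extracted page

snippet_cost = snippet_tokens / 1_000_000 * llm_rate_per_m
page_cost_high = page_tokens_high / 1_000_000 * llm_rate_per_m

print(f"10 snippets: {snippet_tokens:,} tokens -> ${snippet_cost:.4f}")
print(f"1 full page: up to {page_tokens_high:,} tokens -> ${page_cost_high:.4f}")
```

A single extracted page can cost more context than an entire 10-result snippet set, which is why extraction should be reserved for queries where the snippet genuinely lacks the answer.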
Bottom line
Track your total search cost as API fees plus LLM context cost. For most agents, the LLM cost of processing search results equals or exceeds the search API cost itself. Optimize result count first, then consider snippet compression and caching.
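The whole accounting reduces to one helper you can drop into your cost tracking. Defaults reflect the article's figures (Scavio API fee, GPT-4o input rate); swap in your own rates.

```python
def search_query_cost(results, tokens_per_result=200,
                      api_cost=0.005, llm_rate_per_m=2.50):
    """Total cost of one search-grounded query: API fee + LLM input tokens."""
    llm_cost = results * tokens_per_result / 1_000_000 * llm_rate_per_m
    return api_cost + llm_cost

# 10 results at GPT-4o rates: $0.005 API + $0.005 LLM
print(f"${search_query_cost(10):.4f}")  # -> $0.0100
```

Note that at these defaults the LLM half exactly matches the API half, and on pricier models it dominates, which is the article's central point.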