Agent Search Cost Per Context Window Fill
The real cost of search is the API fee plus the LLM context tokens: 10 results at ~200 tokens each add $0.005 in LLM cost per query.
The real cost of search for AI agents is not the per-query price -- it is the cost of filling the context window with search results. A single search returning 10 results at ~200 tokens each consumes 2,000 tokens. At GPT-4o rates ($2.50/1M input tokens), that is $0.005 in LLM cost on top of the search API cost.
The math: search cost + context cost
Every search result injected into context has two costs: the API call and the LLM tokens to process it. Most developers only track the API cost and miss the larger LLM cost.
```python
# True cost per search-grounded query
search_api_cost = 0.005  # Scavio per-query cost
results_count = 10
tokens_per_result = 200  # title + snippet + URL
total_search_tokens = results_count * tokens_per_result  # 2,000

# LLM input token costs (per 1M tokens)
llm_costs = {
    "GPT-4o": 2.50,
    "Claude Sonnet 4": 3.00,
    "Claude Opus 4": 15.00,
    "GPT-4.1": 2.00,
    "Llama 3.3 (local)": 0.00,
}

for model, cost_per_m in llm_costs.items():
    llm_cost = (total_search_tokens / 1_000_000) * cost_per_m
    total = search_api_cost + llm_cost
    print(f"{model}: ${total:.4f}/query (API: ${search_api_cost}, LLM: ${llm_cost:.4f})")
```

Optimizing result count
Most agents request 10 results by default. For most queries, 3-5 results provide sufficient grounding. Reducing from 10 to 5 results cuts context tokens in half while maintaining answer quality for 80%+ of queries.
```python
import os

import requests

# Optimized: request only what you need
def efficient_search(query, max_results=5):
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={
            "query": query,
            "num_results": max_results,
        },
    )
    results = resp.json().get("organic_results", [])
    # Return only title + snippet (skip URL to save tokens)
    return [
        {"title": r["title"], "snippet": r["snippet"]}
        for r in results
    ]
```

Context window efficiency strategies
- Request 5 results instead of 10 for most queries
- Strip URLs from results unless the agent needs to cite sources
- Summarize long snippets before injecting into context
- Use AI Overview data when available (pre-synthesized, fewer tokens)
- Cache and deduplicate results across agent turns
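The last strategy above can be sketched as a small cache that tracks which URLs have already been injected into context, so repeated searches across agent turns never pay for the same result twice. This is an illustrative sketch, not a Scavio feature; the `url` key is an assumed field name for whatever your search response uses.

```python
# Hypothetical sketch: deduplicate search results across agent turns
# so the same URL never enters the context window twice.
class SearchResultCache:
    def __init__(self):
        self.seen_urls = set()

    def filter_new(self, results):
        """Keep only results whose URL has not appeared in a prior turn."""
        fresh = []
        for r in results:
            url = r.get("url")
            if url and url not in self.seen_urls:
                self.seen_urls.add(url)
                fresh.append(r)
        return fresh

cache = SearchResultCache()
turn1 = cache.filter_new([{"url": "https://a.com"}, {"url": "https://b.com"}])
turn2 = cache.filter_new([{"url": "https://b.com"}, {"url": "https://c.com"}])
print(len(turn1), len(turn2))  # 2 1
```

At ~200 tokens per result, every duplicate the cache drops saves roughly $0.0005-$0.003 in input cost per turn, depending on the model.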
Multi-turn agent cost analysis
Agents that search multiple times per task accumulate context rapidly. A research agent making 5 searches per task with 10 results each adds 10,000 tokens of search context.
```python
# Multi-turn agent cost breakdown
searches_per_task = 5
results_per_search = 10
tokens_per_result = 200
search_api_rate = 0.005
llm_rate_per_m = 3.00  # Claude Sonnet 4

total_search_tokens = searches_per_task * results_per_search * tokens_per_result
api_cost = searches_per_task * search_api_rate
llm_cost = (total_search_tokens / 1_000_000) * llm_rate_per_m
total_per_task = api_cost + llm_cost

print(f"Tokens from search: {total_search_tokens:,}")
print(f"API cost: ${api_cost:.3f}")
print(f"LLM cost: ${llm_cost:.4f}")
print(f"Total per task: ${total_per_task:.3f}")
print(f"Cost for 1000 tasks/day: ${total_per_task * 1000:.2f}")

# Optimized: 5 results instead of 10
opt_tokens = searches_per_task * 5 * tokens_per_result
opt_llm = (opt_tokens / 1_000_000) * llm_rate_per_m
opt_total = api_cost + opt_llm
print(f"Optimized total per task: ${opt_total:.3f}")
print(f"Savings: {(1 - opt_total/total_per_task) * 100:.0f}%")
```

When to use full page extraction vs snippets
Full page extraction (Tavily extract, Scavio scrape) returns 1,000-5,000 tokens per page. Use it only when snippets are insufficient -- for example, extracting specific pricing tables or technical specifications. For general grounding, snippets are 20x cheaper in context cost.
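The arithmetic behind that tradeoff, using the token figures above and the Claude Sonnet 4 input rate as an example:

```python
# Context cost: 10 snippets vs one full-page extraction
llm_rate_per_m = 3.00  # Claude Sonnet 4 input rate, $/1M tokens

snippet_tokens = 10 * 200                          # 10 snippets at ~200 tokens each
page_tokens_low, page_tokens_high = 1_000, 5_000   # one extracted page

snippet_cost = snippet_tokens / 1_000_000 * llm_rate_per_m
page_cost_high = page_tokens_high / 1_000_000 * llm_rate_per_m

print(f"10 snippets: {snippet_tokens:,} tokens -> ${snippet_cost:.4f}")
print(f"1 full page: up to {page_tokens_high:,} tokens -> ${page_cost_high:.4f}")
```

A single extracted page can cost more context than an entire 10-result snippet set, which is why extraction should be reserved for queries where the snippet genuinely lacks the answer.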
Bottom line
Track your total search cost as API fees plus LLM context cost. For most agents, the LLM cost of processing search results equals or exceeds the search API cost itself. Optimize result count first, then consider snippet compression and caching.
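The whole accounting reduces to one helper you can drop into your cost tracking. Defaults reflect the article's figures (Scavio API fee, GPT-4o input rate); swap in your own rates.

```python
def search_query_cost(results, tokens_per_result=200,
                      api_cost=0.005, llm_rate_per_m=2.50):
    """Total cost of one search-grounded query: API fee + LLM input tokens."""
    llm_cost = results * tokens_per_result / 1_000_000 * llm_rate_per_m
    return api_cost + llm_cost

# 10 results at GPT-4o rates: $0.005 API + $0.005 LLM
print(f"${search_query_cost(10):.4f}")  # -> $0.0100
```

Note that at these defaults the LLM half exactly matches the API half, and on pricier models it dominates, which is the article's central point.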