Agent Search Token Cost Optimization Guide
Reduce LLM agent token costs by 60-80% with structured search extraction. Practical patterns for LangChain, CrewAI, and LangGraph agent frameworks.
AI agent search costs are dominated by token consumption, not API fees. A single web search returning raw HTML can inject 20,000-25,000 tokens into an agent's context window. Structured search APIs returning typed JSON reduce this to 800-1,200 tokens per search, cutting LLM costs by 90% or more.
Where Tokens Actually Go
When an agent searches the web using a raw HTML approach, the full page content gets injected into the context window: navigation menus, footers, ads, scripts, and the actual content. A typical Google results page is 150KB+ of HTML, which tokenizes to 20,000-30,000 tokens. At Claude Sonnet input pricing (about $3 per million tokens), that is roughly $0.06-0.09 per search in input tokens alone.
A structured search API returns only the data you need: titles, URLs, snippets, and metadata as JSON. The same search result set tokenizes to 800-1,200 tokens. At $0.003 per 1K input tokens, that is under $0.004 per search in LLM costs.
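The per-search arithmetic can be checked directly; this sketch uses the token counts and the $0.003-per-1K input rate quoted above:

```python
PRICE_PER_INPUT_TOKEN = 0.003 / 1000  # $0.003 per 1K input tokens, per the text

def search_input_cost(tokens: int) -> float:
    """LLM input cost of injecting one search's results into context."""
    return tokens * PRICE_PER_INPUT_TOKEN

print(round(search_input_cost(25_000), 3))  # raw HTML page   -> 0.075
print(round(search_input_cost(1_000), 3))   # structured JSON -> 0.003
```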
Token Optimization Strategies
The first lever is extracting only the fields the model needs before anything reaches the context window:

```python
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]

def efficient_search(query, max_results=5):
    """Return minimal context for LLM consumption."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": "google", "query": query},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])[:max_results]
    # Strip to essential fields only
    return [
        {
            "title": r["title"],
            "snippet": r.get("snippet", ""),
            "url": r.get("link", ""),
        }
        for r in results
    ]

# A few hundred tokens of context vs ~20,000 from raw HTML
context = efficient_search("best database for analytics 2026")
```
Result Filtering Before Context Injection
Not every search result belongs in the agent's context. Filter results before injection to reduce token waste. Remove results from irrelevant domains, deduplicate similar snippets, and truncate snippets to the first 150 characters.
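A minimal sketch of that filtering step; the blocked-domain list is illustrative, and the field names match the stripped result shape (`title`, `snippet`, `url`) used in this guide:

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"pinterest.com", "quora.com"}  # example blocklist, adjust per task

def filter_results(results, snippet_len=150):
    """Drop blocked domains, dedupe snippets, truncate before context injection."""
    seen = set()
    kept = []
    for r in results:
        domain = urlparse(r.get("url", "")).netloc.removeprefix("www.")
        if domain in BLOCKED_DOMAINS:
            continue
        snippet = r.get("snippet", "")[:snippet_len]  # truncate to first 150 chars
        if snippet in seen:  # crude dedup on the truncated snippet
            continue
        seen.add(snippet)
        kept.append({**r, "snippet": snippet})
    return kept
```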
Caching for Repeated Queries
Agents often search the same or similar queries within a session. A simple in-memory cache with a 1-hour TTL eliminates the redundant API calls and their token costs. For a research agent processing 50 queries, deduplication typically leaves 30-35 unique searches that actually hit the API.
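A sketch of such a cache, assuming results are safe to reuse within the TTL; `search_fn` stands in for any search callable, such as the `efficient_search` helper above:

```python
import time

_CACHE: dict[str, tuple[float, list]] = {}
TTL_SECONDS = 3600  # 1-hour TTL, per the text

def cached_search(query, search_fn):
    """Wrap any search function with a simple in-memory TTL cache."""
    key = query.strip().lower()  # normalize so near-identical queries hit the cache
    now = time.time()
    hit = _CACHE.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no API call, no new tokens
    results = search_fn(query)
    _CACHE[key] = (now, results)
    return results
```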
Cost Math at Scale
An agent making 100 searches per day:
- Raw HTML approach: 100 x 25,000 tokens = 2.5M input tokens = $7.50/day in LLM costs + scraping infrastructure
- Structured API: 100 x 1,000 tokens = 100K input tokens = $0.30/day + $0.50/day API = $0.80/day total
- With caching: 70 unique searches x 1,000 tokens = 70K tokens = $0.21/day + $0.35/day API = $0.56/day total
Monthly: $225/month (raw) vs $24/month (structured) vs $17/month (structured + cached). The API cost ($0.005/query) is negligible compared to the LLM token savings.
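The daily and monthly figures can be reproduced in a few lines, using the rates stated above ($0.003 per 1K input tokens, $0.005 per API query):

```python
LLM_PER_TOKEN = 0.003 / 1000  # $0.003 per 1K input tokens
API_PER_QUERY = 0.005         # structured search API fee per query

def daily_cost(searches, tokens_per_search, api_fee=0.0):
    """Daily LLM input cost plus per-query API fees, in dollars."""
    llm = searches * tokens_per_search * LLM_PER_TOKEN
    return llm + searches * api_fee

raw = daily_cost(100, 25_000)                       # 7.50/day
structured = daily_cost(100, 1_000, API_PER_QUERY)  # 0.80/day
cached = daily_cost(70, 1_000, API_PER_QUERY)       # 0.56/day

# 30-day months: 225.0, 24.0, 16.8
print(round(raw * 30, 1), round(structured * 30, 1), round(cached * 30, 1))
```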