Reduce Agent Search Tokens with Structured JSON
LLM agents waste 60-80% of their token budget on unstructured search results. A raw Google SERP response contains HTML metadata, navigation elements, and duplicate content that the agent never uses. Structured JSON extraction reduces a typical search response from 4,000 tokens to under 800 tokens while preserving the information the agent actually needs.
Where Tokens Go
A ReAct-style agent calling a web search tool receives the full search response in its context window. With GPT-4o at $2.50/million input tokens and Claude Sonnet at $3/million, each search call costs $0.01-0.012 in tokens alone. An agent making 10 searches per task spends $0.10-0.12 just on search context, often more than the search API itself costs.
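The arithmetic above is easy to check. A minimal sketch, using the per-million-token input prices quoted here:

```python
def search_context_cost(tokens_per_search, searches, price_per_million):
    """Dollar cost of feeding raw search results into the context window."""
    return tokens_per_search * searches * price_per_million / 1_000_000

# 10 searches x 4,000 tokens at GPT-4o and Claude Sonnet input rates
gpt4o = search_context_cost(4_000, 10, 2.50)
sonnet = search_context_cost(4_000, 10, 3.00)
print(f"GPT-4o: ${gpt4o:.2f}, Sonnet: ${sonnet:.2f}")  # $0.10, $0.12
```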
The Structured Extraction Pattern
```python
import json
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]

def compact_search(query, max_results=3, snippet_chars=120):
    """Search and return only what the agent needs."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": "google", "query": query},
        timeout=30,
    )
    resp.raise_for_status()  # fail fast on auth or quota errors
    data = resp.json()
    compact = []
    for r in data.get("organic", [])[:max_results]:
        compact.append({
            "t": r.get("title", "")[:80],               # truncate long titles
            "s": r.get("snippet", "")[:snippet_chars],  # clip snippets
            "u": r.get("link", ""),
        })
    return json.dumps(compact)

# Full response: ~4,000 tokens. Compact: ~300 tokens.
result = compact_search("LangGraph tool calling best practices 2026")
print(f"Compact result: {len(result)} chars")
print(result)
```
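To see the extraction step without a live API call, the same loop can be run over a mocked payload. The response shape below mirrors the fields used above but is invented for illustration; a real response also carries metadata the agent never reads:

```python
import json

# Invented stand-in for a full search response (illustrative, not real output)
mock_response = {
    "organic": [
        {"title": f"Result {i} title", "snippet": "snippet text " * 20,
         "link": f"https://example.com/{i}"}
        for i in range(10)
    ],
    "peopleAlsoAsk": [{"question": "q" * 50} for _ in range(4)],
}

# Same extraction as compact_search: top 3 results, short keys, clipped fields
compact = [
    {"t": r.get("title", "")[:80], "s": r.get("snippet", "")[:120],
     "u": r.get("link", "")}
    for r in mock_response.get("organic", [])[:3]
]

full_len = len(json.dumps(mock_response))
compact_len = len(json.dumps(compact))
print(f"full: {full_len} chars, compact: {compact_len} chars")
```

The ratio tracks the token savings closely, since token count scales roughly with character count for JSON text.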
Token Savings by Strategy
- Limit to 3 results instead of 10: saves 70% of organic result tokens with minimal information loss for most agent tasks.
- Truncate snippets to 120 characters: captures the key information from each result while cutting snippet tokens by 60%.
- Use short field names (t/s/u instead of title/snippet/url): saves 5-10% on JSON overhead. Small but free.
- Strip People Also Ask and related searches: these are useful for SEO research agents but waste tokens for coding or research agents.
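The field-name saving is easy to measure directly. A quick comparison on three synthetic results (values are placeholders):

```python
import json

# Three placeholder results with realistic field sizes
results = [
    {"title": "A result title here", "snippet": "x" * 120,
     "url": "https://example.com/page"}
    for _ in range(3)
]

long_form = json.dumps(results)
short_form = json.dumps(
    [{"t": r["title"], "s": r["snippet"], "u": r["url"]} for r in results]
)

saving = 1 - len(short_form) / len(long_form)
print(f"{saving:.1%} fewer characters from short keys alone")
```

The saving lands in the 5-10% range claimed above; it shrinks as snippets grow, since key overhead is fixed per result.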
When to Keep Full Results
SEO analysis agents need full SERP data including People Also Ask, featured snippets, and AI Overviews. Research agents benefit from longer snippets when synthesizing information across sources. The compact pattern works best for coding agents (verifying API docs), fact-checking agents (confirming a specific claim), and monitoring agents (checking if a page exists).
Integration with Agent Frameworks
In LangChain, wrap the compact search as a custom tool. In CrewAI, use it as a crew tool with the description explaining the truncation. In LangGraph, add it as a node tool. The agent does not need to know the results are compressed; it adapts naturally to shorter context.
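As a framework-agnostic sketch, the compact search can be exposed through an OpenAI-style function-calling schema; the description tells the model that fields are truncated and keys are abbreviated. The schema below is illustrative, not a specific framework's API:

```python
# Tool schema for compact_search; the description explains the short keys
# and truncation so the model can interpret the compressed results.
compact_search_tool = {
    "type": "function",
    "function": {
        "name": "compact_search",
        "description": (
            "Google search returning the top 3 results as JSON with short "
            "keys: t (title, max 80 chars), s (snippet, truncated to 120 "
            "chars), u (url)."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}
print(compact_search_tool["function"]["name"])
```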