How to Reduce Agent Search Token Count

Compress web search results before passing them to LLM agents. Cut token usage by 60-80% while preserving the information agents need to answer.

LLM agents that call web search tools often consume excessive tokens because raw search responses carry metadata, SERP features, and other fields the agent never uses, alongside the titles, snippets, and URLs it does need. Passing full search responses into an agent's context window wastes tokens and money. This tutorial shows how to compress search results by extracting only the fields the agent needs, truncating snippets, deduplicating content, and formatting results as compact text. You will build a search compression layer that aims to reduce token count by 60-80% while keeping information density high.

Prerequisites

  • Python 3.8+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • An LLM agent that uses search tools

Walkthrough

Step 1: Fetch raw search results

Query the Scavio API and measure the size of the full response. The code counts characters, which serve here as a simple proxy for token count.

Python
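import os, requests, json

# Read the API key from the environment rather than hard-coding it.
API_KEY = os.environ["SCAVIO_API_KEY"]
resp = requests.post("https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": API_KEY},
    json={"platform": "google", "query": "best CRM for startups 2026"},
    timeout=30)
resp.raise_for_status()  # stop early on an HTTP error
raw = resp.json()
# Character count of the serialized response, used as a proxy for tokens.
raw_size = len(json.dumps(raw))
print(f"Raw response: {raw_size} chars")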
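
Step 1 measures characters, which only approximate tokens. The sketch below is a rough estimator that assumes about four characters per token, a common rule of thumb for English text that varies by tokenizer and content. The `estimate_tokens` helper and the sample payload are hypothetical; for exact counts, use the tokenizer that matches your model.

```python
import json
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text; varies by tokenizer

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Return a rough token estimate for a string (not an exact count)."""
    return math.ceil(len(text) / chars_per_token)

# Hypothetical stand-in for a raw search response.
sample_raw = {
    "organic_results": [
        {"title": "Example", "snippet": "An example snippet.", "link": "https://example.com"}
    ]
}
raw_tokens = estimate_tokens(json.dumps(sample_raw))
print(f"Estimated raw tokens: {raw_tokens}")
```

Applying the same estimator to the raw and compressed output keeps the comparison consistent, even though the absolute numbers are approximate.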

Step 2: Extract essential fields only

Strip the response down to only the fields an agent needs (title, snippet, and URL), and truncate long titles and snippets.

Python
def compress_results(data, max_results=5):
    """Keep the top results and only their title, snippet, and URL."""
    results = []
    for r in data.get("organic_results", [])[:max_results]:
        results.append({
            # `or ""` guards against missing or null fields.
            "title": (r.get("title") or "")[:80],
            "snippet": (r.get("snippet") or "")[:200],
            "url": r.get("link") or "",
        })
    return results

compressed = compress_results(raw)
comp_size = len(json.dumps(compressed))
print(f"Compressed: {comp_size} chars ({100 - round(comp_size/raw_size*100)}% reduction)")
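
Slicing with a fixed character limit can cut a snippet mid-word. As an alternative sketch, the standard library's `textwrap.shorten` collapses whitespace and truncates at a word boundary. The `shorten_field` helper and the sample text are hypothetical and are not part of the tutorial's pipeline.

```python
import textwrap

def shorten_field(text, width):
    """Collapse whitespace and truncate at a word boundary, ending with '...'."""
    return textwrap.shorten(text or "", width=width, placeholder="...")

# Hypothetical snippet with irregular whitespace.
sample = "Compare   pricing, features,\nand integrations across popular CRM tools for early-stage teams."
short = shorten_field(sample, 40)
print(short)
print(len(short))
```

Collapsing whitespace also removes stray newlines and double spaces, which would otherwise add characters without adding information.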

Step 3: Format as compact text for agent context

Convert structured results to a minimal text format that uses fewer tokens than JSON.

Python
def format_for_agent(results):
    lines = []
    for i, r in enumerate(results, 1):
        lines.append(f"[{i}] {r['title']}")
        lines.append(f"    {r['snippet']}")
        lines.append(f"    {r['url']}")
    return "\n".join(lines)

agent_text = format_for_agent(compressed)
print(f"Agent text: {len(agent_text)} chars")
print(agent_text[:500])
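
To check that the text layout really is smaller than JSON for the same results, the sketch below compares both encodings on hypothetical sample data. It repeats the Step 3 formatter so that it runs on its own; the sample results are placeholders, not real search output.

```python
import json

# Hypothetical compressed results in the shape produced by Step 2.
sample = [
    {"title": "Example CRM guide", "snippet": "A short example snippet.", "url": "https://example.com/a"},
    {"title": "Another CRM review", "snippet": "A second example snippet.", "url": "https://example.org/b"},
]

def format_for_agent(results):
    """Same numbered text layout as Step 3."""
    lines = []
    for i, r in enumerate(results, 1):
        lines.append(f"[{i}] {r['title']}")
        lines.append(f"    {r['snippet']}")
        lines.append(f"    {r['url']}")
    return "\n".join(lines)

json_size = len(json.dumps(sample))
text_size = len(format_for_agent(sample))
print(f"JSON: {json_size} chars, text: {text_size} chars")
print(f"Saved: {json_size - text_size} chars")
```

The saving comes from dropping the repeated key names, quotes, and braces that JSON adds to every result.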

Step 4: Deduplicate overlapping results

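Remove results from a domain that has already appeared. This is a simple proxy for near-duplicate detection: pages from the same site often overlap, and dropping them frees agent context for other sources.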

Python
from urllib.parse import urlparse

def deduplicate(results):
    """Keep only the first result from each domain."""
    seen_domains = set()
    unique = []
    for r in results:
        domain = urlparse(r["url"]).netloc
        if domain not in seen_domains:
            seen_domains.add(domain)
            unique.append(r)
    return unique

deduped = deduplicate(compressed)
print(f"After dedup: {len(deduped)} results (was {len(compressed)})")
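
The Step 4 function deduplicates by domain, so it misses near-identical snippets published on different sites. The sketch below is one possible complement: it compares snippet text with `difflib.SequenceMatcher` from the standard library. The `drop_near_duplicates` name, the 0.9 threshold, and the sample data are assumptions for illustration; tune the threshold against your own results.

```python
from difflib import SequenceMatcher

def drop_near_duplicates(results, threshold=0.9):
    """Drop results whose snippet closely matches an already-kept snippet."""
    kept = []
    for r in results:
        snippet = r["snippet"].lower()
        is_dup = any(
            SequenceMatcher(None, snippet, k["snippet"].lower()).ratio() >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(r)
    return kept

# Hypothetical results: A and B differ only in the final character.
sample = [
    {"title": "A", "snippet": "The best CRM tools for startups in 2026.", "url": "https://example.com/a"},
    {"title": "B", "snippet": "The best CRM tools for startups in 2026!", "url": "https://example.org/b"},
    {"title": "C", "snippet": "How to migrate customer data between systems.", "url": "https://example.net/c"},
]
unique = drop_near_duplicates(sample)
print([r["title"] for r in unique])
```

The pairwise comparison is quadratic in the number of results, which is acceptable for the handful of results an agent typically receives.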

Step 5: Build the compression wrapper

Combine all compression steps into a single function that replaces the raw search call in your agent.

Python
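def agent_search(query, max_results=5):
    """Search, then compress, deduplicate, and format the results for an agent."""
    resp = requests.post("https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": "google", "query": query},
        timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    compressed = compress_results(resp.json(), max_results)
    # Dedup runs after truncation, so fewer than max_results may be returned.
    deduped = deduplicate(compressed)
    return format_for_agent(deduped)

result = agent_search("best CRM for startups 2026")
print(f"Final token-efficient output: {len(result)} chars")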
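
If the agent has a fixed allowance for search context, the formatted output can also be capped. The sketch below keeps whole results in rank order until a character budget is reached. The `fit_to_budget` helper, the budget value, and the sample data are hypothetical; the entry layout mirrors the Step 3 format.

```python
def fit_to_budget(results, max_chars=600):
    """Keep whole results, in rank order, while the text stays within max_chars."""
    entries, used = [], 0
    for i, r in enumerate(results, 1):
        entry = f"[{i}] {r['title']}\n    {r['snippet']}\n    {r['url']}"
        cost = len(entry) + (1 if entries else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break  # stop at the first result that would exceed the budget
        entries.append(entry)
        used += cost
    return "\n".join(entries)

# Hypothetical results with fixed-length snippets.
sample = [
    {"title": f"T{n}", "snippet": "s" * 50, "url": f"https://example.com/{n}"}
    for n in range(1, 4)
]
capped = fit_to_budget(sample, max_chars=200)
print(capped)
print(len(capped))
```

Dropping whole results rather than cutting the text mid-entry keeps every remaining result complete, with its URL intact.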

Python Example

Python
import os, requests, json
API_KEY = os.environ["SCAVIO_API_KEY"]
resp = requests.post("https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": API_KEY},
    json={"platform": "google", "query": "best CRM for startups 2026"})
results = resp.json().get("organic_results", [])[:5]
for r in results:
    print(f"{r.get('title', '')[:80]}\n  {r.get('snippet', '')[:150]}")

JavaScript Example

JavaScript
const resp = await fetch("https://api.scavio.dev/api/v1/search", {
  method: "POST",
  headers: {"x-api-key": process.env.SCAVIO_API_KEY, "Content-Type": "application/json"},
  body: JSON.stringify({platform: "google", query: "best CRM for startups 2026"})
});
const data = await resp.json();
// Print a truncated title and snippet for the top five results.
(data.organic_results || []).slice(0, 5).forEach(r =>
  console.log((r.title || "").slice(0, 80), "\n ", (r.snippet || "").slice(0, 150))
);

Expected Output

A compressed plain-text representation of the search results that uses roughly 60-80% fewer tokens than the raw JSON response while preserving the title, snippet, and URL of each result.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.8+, the requests library, a Scavio API key from scavio.dev, and an LLM agent that uses search tools. A Scavio API key gives you 250 free credits per month.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does this work with my agent framework?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
