AI Agent Context Handoff Problem
AI agents waste tokens re-explaining context between steps. When a research agent searches, analyzes, and then searches again, it often re-describes previous findings in the prompt, consuming context window space. Structured search results (JSON with typed fields) take fewer tokens than prose summaries of the same findings, leaving more of the window for actual reasoning.
The problem: context bloat between steps
A multi-step research task for an agent typically works like this: Step 1 searches for "best CRM tools." Step 2 analyzes the results. Step 3 searches for pricing details. But Step 3's prompt also includes all of Step 1's results and Step 2's analysis as context. By Step 5, accumulated search results, not reasoning, dominate the context window. Rough token costs of the two result formats:
- Tavily AI-summarized results: ~200-500 tokens per search (paragraphs of prose)
- Structured JSON results: ~80-150 tokens per search (compact key-value pairs)
- Over 50 searches: a difference of roughly 6,000-17,500 tokens of context window usage
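The savings grow linearly with the number of searches. A quick check of the 50-search range, pairing the low ends and the high ends of the two per-search estimates quoted above (the function name is illustrative):

```python
def context_savings(n_searches: int,
                    prose_tokens: tuple[int, int] = (200, 500),
                    json_tokens: tuple[int, int] = (80, 150)) -> tuple[int, int]:
    """Token savings over n searches, pairing low-with-low and high-with-high."""
    low = (prose_tokens[0] - json_tokens[0]) * n_searches
    high = (prose_tokens[1] - json_tokens[1]) * n_searches
    return low, high


print(context_savings(50))  # (6000, 17500)
```

Because each later step's prompt re-includes earlier results, these savings recur on every model call that carries them, not just once.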
Structured results: smaller context footprint
```python
import json
import os

import requests

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def compact_search(query: str) -> list[dict]:
    """Return minimal structured results that compress well in context."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=H,
        json={"query": query, "platform": "google"},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    results = resp.json().get("organic_results", [])[:5]
    # Compact format: short keys, only the fields the agent needs, truncated text
    return [{
        "t": (r.get("title") or "")[:60],
        "u": r.get("link", ""),
        "s": (r.get("snippet") or "")[:100],
    } for r in results]

# Compact results use ~50% fewer tokens than full results
results = compact_search("best CRM for startups 2026")
# indent=2 is for reading; in a prompt, separators=(",", ":") drops the whitespace
print(json.dumps(results, indent=2))
```

The re-search vs pass-through tradeoff
When an agent needs information from a previous step, it has two options: pass the previous results forward in context (costs input tokens but saves an API call) or re-search for the information (costs an API call but saves tokens). The better choice depends on your cost structure.
```python
def should_research_or_reuse(
    previous_results: list,
    tokens_per_result: int = 100,
    cost_per_search: float = 0.005,
    cost_per_1k_tokens: float = 0.003,  # typical input token cost
) -> str:
    """Decide whether to re-search or pass previous results in context."""
    # Cost of carrying the previous results forward as input tokens
    context_cost = len(previous_results) * tokens_per_result * cost_per_1k_tokens / 1000
    search_cost = cost_per_search
    if search_cost < context_cost:
        return "re-search (cheaper than carrying context forward)"
    return "reuse (carrying context is cheaper than re-searching)"

# 10 previous results at 100 tokens each, $0.003/1K input tokens
# Context cost: 10 * 100 * 0.003 / 1000 = $0.003
# Search cost: $0.005
# Verdict: reuse the context (cheaper)
print(should_research_or_reuse([{}] * 10))

# 50 previous results
# Context cost: 50 * 100 * 0.003 / 1000 = $0.015
# Search cost: $0.005
# Verdict: re-search (cheaper)
print(should_research_or_reuse([{}] * 50))
```

Design patterns for context-efficient agents
- Summarize-and-discard: after analyzing search results, keep only the summary, drop raw results
- Reference-by-ID: store results externally, pass only IDs in context, retrieve when needed
- Structured over prose: JSON results take fewer tokens than AI-generated prose summaries
- Budget-aware handoff: calculate whether re-searching or carrying context is cheaper
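Summarize-and-discard and reference-by-ID combine naturally: the raw results go into an external store, and the context keeps only a short summary tagged with an ID. A minimal sketch, assuming an in-memory dict as the store; `ResultStore` and `summarize_and_store` are hypothetical names, and a real agent might back the store with a database or files instead:

```python
import hashlib
import json


class ResultStore:
    """Hypothetical in-memory store for raw search results, keyed by short IDs."""

    def __init__(self) -> None:
        self._items: dict[str, list[dict]] = {}

    def put(self, results: list[dict]) -> str:
        # Content-derived ID: storing the same result set twice reuses one key
        payload = json.dumps(results, sort_keys=True).encode("utf-8")
        ref = "r" + hashlib.sha256(payload).hexdigest()[:8]
        self._items[ref] = results
        return ref

    def get(self, ref: str) -> list[dict]:
        # Fetch raw results only when a later step actually needs them
        return self._items[ref]


def summarize_and_store(store: ResultStore, results: list[dict],
                        summary: str) -> tuple[str, str]:
    """Park raw results in the store; return (ref, line to keep in context)."""
    ref = store.put(results)
    return ref, f"[{ref}] {summary}"


store = ResultStore()
raw = [
    {"t": "CRM A review", "u": "https://example.com/a", "s": "Pros and cons of CRM A"},
    {"t": "CRM B pricing", "u": "https://example.com/b", "s": "CRM B plans and prices"},
]
# The summary is hand-written here; in an agent it would come from the analysis step
ref, context_line = summarize_and_store(store, raw, "2 results: CRM A review, CRM B pricing")
print(context_line)            # only this line goes into the next prompt
print(store.get(ref)[1]["u"])  # https://example.com/b
```

Only the one-line summary travels between steps; a later step that needs the details calls `store.get(ref)` and pays for those tokens only then.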
The practical impact
For a 10-step research agent making 5 searches per step, the context management strategy can mean the difference between fitting in a 128K context window and hitting the limit at step 7. Structured results buy you 2-3 more steps before context overflow. That extra headroom often decides whether the agent completes the task or returns partial results.
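How many steps fit depends on how fast the carried context grows. A minimal sketch under the accumulation model from the first section, where every step adds its search results and analysis and nothing is dropped; the function name and the `analysis_tokens_per_step` and `reserved_tokens` parameters are assumptions for illustration, and the inputs should be token counts measured from your own agent:

```python
def steps_before_overflow(window_tokens: int,
                          searches_per_step: int,
                          tokens_per_search: int,
                          analysis_tokens_per_step: int = 0,
                          reserved_tokens: int = 0) -> int:
    """Number of steps whose accumulated context still fits in the window.

    Assumes linear growth: each step appends its search results and its
    analysis, and nothing is discarded. reserved_tokens stands in for the
    system prompt, tool definitions, and room for the model's output.
    """
    growth = searches_per_step * tokens_per_search + analysis_tokens_per_step
    if growth <= 0:
        raise ValueError("context must grow by a positive amount per step")
    available = window_tokens - reserved_tokens
    # Largest k with k * growth <= available; never negative
    return max(available // growth, 0)
```

Calling it once with the per-search token count of prose results and once with that of structured results shows how much headroom the format change buys for your workload.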