
Structured Search vs Raw HTML for LLM Context

Raw HTML breaks token limits. Over-stripped snippets lose meaning. Structured JSON sits in the middle and wins for agents.


An r/n8n thread laid out the LLM-pipeline data quality problem cleanly: "Most search APIs either return raw HTML which breaks token limits, or strip out too much context and lose meaning." Both failures kill LLM pipelines in different ways. Structured JSON sits in the middle and is the right answer for most agent workflows.

Why raw HTML breaks token limits

A single Google SERP page can run 200K+ tokens of HTML when you include the full markup. Even after a basic strip, the rendered text is 10K-30K tokens — 5x the context the LLM actually needs to answer the user's question. The agent either truncates and loses signal, or it doesn't and pays for the bloat.
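A quick way to feel that gap is the common rule of thumb of roughly 4 characters per token for English text. The character counts below are illustrative assumptions, not measurements, and real tokenizer counts vary:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and markup.
    # Real tokenizers (BPE-based) will differ, especially on dense HTML.
    return len(text) // 4

raw_html_chars = 900_000   # assumed size of a heavy SERP page with full markup
stripped_chars = 80_000    # assumed size of the rendered text after a basic strip

print(approx_tokens("x" * raw_html_chars))  # ~225,000 tokens
print(approx_tokens("x" * stripped_chars))  # ~20,000 tokens
```

Either way, most of those tokens never contribute to the answer.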

Why over-stripped snippets lose meaning

Some APIs aggressively summarize. The LLM sees "React is a JavaScript library for building user interfaces" instead of the specific snippet that answers the user's actual question. The agent has the context window space, but the retrieval gave up too much detail.

Structured JSON is the middle ground

Each result carries title, snippet, link, position, plus optional structured fields (AI Overview citations, knowledge graph, related questions). That's roughly 100 tokens per result, so 10 results is about 1,000 tokens — it fits in any context window with room to spare. The LLM then picks which 1-2 results to read fully via the extract endpoint.

Python
import os, requests

API_KEY = os.environ['SCAVIO_API_KEY']
HEADERS = {'x-api-key': API_KEY}

def llm_friendly_search(query, k=5):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=HEADERS,
                         json={'query': query},
                         timeout=30)
    resp.raise_for_status()
    # Trim to the fields the LLM cares about: ~100 tokens per result.
    return [{
        'title': item['title'],
        'snippet': item['snippet'],
        'url': item['link'],
    } for item in resp.json().get('organic_results', [])[:k]]

The fan-out + fetch pattern

Search returns 10 trimmed snippets. The LLM picks the top 1-2 to read fully. The extract endpoint returns markdown for those pages. Total context: ~10K tokens — generous for the model, cheap on input cost, and exactly the data shape the LLM needs.
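The pattern can be sketched as below. The exact request shape of the extract endpoint (`/extract` taking a `url` field) is an assumption here — check the API docs — and `pick_urls` stands in for the LLM call that chooses which results to read fully:

```python
import requests

BASE = 'https://api.scavio.dev/api/v1'

def trim_results(payload, k=10):
    # Keep only the fields the LLM needs from a search response.
    return [{'title': x['title'], 'snippet': x['snippet'], 'url': x['link']}
            for x in payload.get('organic_results', [])[:k]]

def fan_out_and_fetch(question, pick_urls, api_key):
    headers = {'x-api-key': api_key}
    # 1. Fan out: one search call, ~1K tokens of trimmed snippets.
    resp = requests.post(f'{BASE}/search', headers=headers,
                         json={'query': question}, timeout=30)
    resp.raise_for_status()
    snippets = trim_results(resp.json())
    # 2. The LLM picks the top 1-2 results to read fully.
    chosen = pick_urls(snippets)[:2]
    # 3. Fetch: full markdown for only the chosen pages (~10K tokens total).
    pages = []
    for url in chosen:
        # Assumed request/response shape for the extract endpoint.
        r = requests.post(f'{BASE}/extract', headers=headers,
                          json={'url': url}, timeout=60)
        r.raise_for_status()
        pages.append(r.json()['markdown'])
    return snippets, pages
```

The key design choice: breadth (snippets) is cheap and always fetched; depth (full pages) is expensive and fetched only where the model decides it matters.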

Why this beats both extremes

Raw HTML is 10x the cost for 1.2x the value. Over-stripped snippets save tokens but force the LLM to guess. Structured JSON captures the breadth of search results at predictable token cost and only fans out for depth where the model decides depth matters.

Token math at production scale

Agent answering 1,000 questions/day. Raw HTML approach: 30K tokens of context per question = 30M input tokens/day = $90/day on Claude Sonnet 4.6 at current input pricing. Structured JSON approach with selective fan-out: 8K tokens average context = 8M input tokens/day = $24/day. The difference at scale is real — $66/day or ~$2K/mo.
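That arithmetic as a sketch — the $3 per million input tokens rate is what the $90/day figure implies, so verify it against current pricing:

```python
PRICE_PER_M_INPUT = 3.00  # $/1M input tokens, implied by the figures above

def daily_cost(tokens_per_question, questions_per_day=1_000):
    total_tokens = tokens_per_question * questions_per_day
    return total_tokens / 1_000_000 * PRICE_PER_M_INPUT

raw_html = daily_cost(30_000)    # 30M tokens/day -> $90.0
structured = daily_cost(8_000)   # 8M tokens/day  -> $24.0
monthly_savings = (raw_html - structured) * 30   # -> $1,980/mo
```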

What to look for in a search API for LLM pipelines

Typed JSON output (not HTML, not over-summarized). Separate extract endpoint for full reads on demand. Multi-platform shape if the agent needs Reddit or YouTube context. MCP server if the agent runs in Claude Desktop, Cursor, or opencode. Lower than $0.005/query at the entry tier.

Why Scavio fits this pattern

Search returns typed JSON across SERP, Reddit, YouTube, Amazon, and Walmart at $0.0043/query (Project tier). Extract returns markdown at the same per-query cost. Both endpoints share one x-api-key. MCP server at mcp.scavio.dev/mcp works in every major agent IDE. The free tier (500 credits/mo) is enough to validate the pattern end to end before billing.