Solution

Cut Context Window Size with Extraction MCP

The Problem

Agents that fetch full web pages to answer questions waste 80-90% of context tokens on irrelevant content: cookie banners, navigation menus, related article links, ad scripts, and footer boilerplate. A 12,000-token page might contain 1,200 tokens of useful information. With 128K context windows filling up across a multi-turn session, the agent either truncates important earlier context or hits the limit and loses coherence.

The Scavio Solution

Use Scavio's search API as an extraction layer via MCP. Instead of fetching and parsing entire pages, query for the specific information needed. The API returns structured fields (title, snippet, AI Overview) that contain the signal without the noise. For multi-source research, five structured results use fewer tokens than one raw page fetch.

Before

Before using extraction MCP, a research agent fetched three full pages per question, consuming 36,000 tokens of context per turn. At that rate, web-page boilerplate alone would overflow the 128K context window before the fourth turn finished, and the agent started losing track of the original research question.

After

After switching to structured search via MCP, each question consumes 600 tokens of search context instead of 36,000. After four turns the context window is under 5% used by search data. The agent maintains coherence through 20-turn research sessions.
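The before/after numbers can be sanity-checked with simple arithmetic. The figures below are the illustrative ones from this page, not measurements:

```python
WINDOW = 128_000                 # context window size in tokens

raw_per_turn = 3 * 12_000        # three full page fetches per question
structured_per_turn = 600        # structured search context per question

def turns_until_full(per_turn: int, window: int = WINDOW) -> int:
    """How many turns of web context fit before the window overflows."""
    return window // per_turn

print(turns_until_full(raw_per_turn))         # only 3 turns of raw fetches fit
print(turns_until_full(structured_per_turn))  # 213 turns of structured search fit
```

Even before counting the system prompt and conversation history, raw page fetches exhaust the window within a handful of turns, while structured results leave it essentially free.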

Who It Is For

Agent developers hitting context window limits in multi-turn sessions who need to reduce the token cost of web-sourced context.

Key Benefits

  • Reduce search context from 36,000 to 600 tokens per turn
  • Maintain agent coherence in long multi-turn sessions
  • Structured fields eliminate boilerplate extraction logic
  • MCP integration requires no code changes to the agent
  • Works with any MCP-compatible framework (Claude, LangChain, etc.)
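For MCP-compatible clients, wiring this in is typically a config entry rather than agent code. The fragment below is a hypothetical example in the Claude Desktop config style; the server key, package name, and command are illustrative placeholders, not Scavio's published package names:

```json
{
  "mcpServers": {
    "scavio-search": {
      "command": "npx",
      "args": ["-y", "scavio-mcp-server"],
      "env": { "SCAVIO_API_KEY": "your-key-here" }
    }
  }
}
```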

Python Example

import os
import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract_facts(query: str, max_results: int = 5) -> str:
    """Return only the signal, skip the noise. ~100 tokens per result."""
    resp = requests.post(
        'https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query, 'ai_overview': True},
        timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    r = resp.json()
    parts = []
    aio = r.get('ai_overview')
    if aio:
        parts.append(f'AI Overview: {aio.get("text", "")[:300]}')
    for o in r.get('organic', [])[:max_results]:
        parts.append(f'{o.get("title")}: {o.get("snippet")} [{o.get("link")}]')
    context = '\n'.join(parts)
    print(f'Extraction: ~{len(context) // 4} tokens vs ~12,000 for a full page')
    return context

print(extract_facts('scavio search api pricing 2026'))
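The extracted context can then be dropped into the agent's prompt. A sketch of one way to do that; the prompt template and the example source string are ours, not part of the API:

```python
def build_prompt(question: str, context: str) -> str:
    """Wrap extracted search context in a grounded-answer prompt."""
    return (
        'Answer using only the sources below. Cite links in brackets.\n\n'
        f'Sources:\n{context}\n\n'
        f'Question: {question}'
    )

# Hand-written context string standing in for a live extract_facts() call:
ctx = 'Example pricing page: Free tier includes 500 credits/month [https://example.com]'
print(build_prompt('What does the free tier include?', ctx))
```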

JavaScript Example

const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };

async function extractFacts(query, maxResults = 5) {
  const res = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H,
    body: JSON.stringify({ platform: 'google', query, ai_overview: true })
  });
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  const r = await res.json();
  const parts = [];
  if (r.ai_overview) parts.push('AI Overview: ' + (r.ai_overview.text || '').slice(0, 300));
  for (const o of (r.organic || []).slice(0, maxResults)) {
    parts.push(`${o.title}: ${o.snippet} [${o.link}]`);
  }
  return parts.join('\n');
}

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI Overviews

Reddit

Community posts & threaded comments from any subreddit

YouTube

Video search with transcripts and metadata
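The examples above only query Google, but the same request shape extends to the other platforms by swapping the `platform` value. A small payload-building sketch; the assumption (based on the platform list above, not on published API docs) is that `'reddit'` and `'youtube'` are accepted values:

```python
def build_search_payload(platform: str, query: str, **options) -> dict:
    """Build the JSON body for POST /api/v1/search (shape taken from the examples above)."""
    allowed = {'google', 'reddit', 'youtube'}  # platforms listed on this page (assumed values)
    if platform not in allowed:
        raise ValueError(f'unsupported platform: {platform}')
    return {'platform': platform, 'query': query, **options}

# Google with AI Overview, exactly as in the examples above:
print(build_search_payload('google', 'scavio pricing', ai_overview=True))
# Reddit and YouTube reuse the same shape with a different platform value:
print(build_search_payload('reddit', 'best search api for agents'))
```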

Frequently Asked Questions

Is there a free tier to test this with?

Yes. Scavio's free tier includes 500 credits per month with no credit card required. That is enough to validate this solution in your workflow.
