How long does this build curated search for ai agents tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.8+. requests library installed. Scavio API key from scavio.dev. Basic understanding of token counting (tiktoken optional). A Scavio API key gives you 50 free credits on signup.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.

Build Curated Search for AI Agents (2026)

Building curated search for AI agents means wrapping a search API with filtering, re-ranking, and compression so the agent receives only high-signal results instead of raw SERP noise. Raw search results waste tokens on ads, irrelevant domains, and verbose snippets that dilute agent reasoning. This tutorial builds a Python middleware that sits between your agent and the Scavio search API, applying domain allowlists, relevance scoring, and snippet truncation to cut token usage by 70% while improving answer quality.

Prerequisites

Python 3.8+
requests library installed
Scavio API key from scavio.dev
Basic understanding of token counting (tiktoken optional)

Walkthrough

Step 1: Define domain filters and quality rules

Create allowlists and blocklists for domains, plus rules for filtering low-quality results like forums with no answers or paywalled content.

Python

BLOCKED_DOMAINS = [
    'pinterest.com', 'quora.com', 'facebook.com',
    'instagram.com', 'tiktok.com', 'twitter.com',
]

TRUSTED_DOMAINS = [
    'docs.python.org', 'developer.mozilla.org', 'github.com',
    'stackoverflow.com', 'arxiv.org', 'huggingface.co',
]

def domain_score(url):
    domain = url.split('/')[2] if url else ''
    if any(b in domain for b in BLOCKED_DOMAINS):
        return -10
    if any(t in domain for t in TRUSTED_DOMAINS):
        return 5
    return 0

Step 2: Build the search wrapper with compression

Create a function that calls Scavio, filters results, scores relevance, and compresses snippets to a token budget.

Python

import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def curated_search(query, max_results=5, max_snippet_chars=200):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    results = data.get('organic_results', [])
    scored = []
    for r in results:
        ds = domain_score(r.get('link', ''))
        if ds < 0:
            continue
        snippet = (r.get('snippet') or '')[:max_snippet_chars]
        scored.append({
            'title': r.get('title', ''),
            'url': r.get('link', ''),
            'snippet': snippet,
            'score': ds + r.get('position', 10) * -0.5
        })
    scored.sort(key=lambda x: x['score'], reverse=True)
    return scored[:max_results]

Step 3: Add relevance re-ranking

Score results by keyword overlap with the original query to push the most relevant results to the top.

Python

def relevance_score(query, result):
    query_terms = set(query.lower().split())
    text = f"{result['title']} {result['snippet']}".lower()
    matches = sum(1 for t in query_terms if t in text)
    return matches / max(len(query_terms), 1)

def curated_search_v2(query, max_results=5, max_snippet_chars=200):
    raw = curated_search(query, max_results=15, max_snippet_chars=max_snippet_chars)
    for r in raw:
        r['score'] += relevance_score(query, r) * 3
    raw.sort(key=lambda x: x['score'], reverse=True)
    return raw[:max_results]

Step 4: Measure token savings

Compare token counts between raw and curated results to verify the compression ratio.

Python

def estimate_tokens(text):
    return len(text.split()) * 1.3  # rough estimate

def compare_token_usage(query):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    raw_text = str(data.get('organic_results', []))
    raw_tokens = estimate_tokens(raw_text)

    curated = curated_search_v2(query)
    curated_text = str(curated)
    curated_tokens = estimate_tokens(curated_text)

    savings = (1 - curated_tokens / raw_tokens) * 100
    print(f'Raw: ~{int(raw_tokens)} tokens')
    print(f'Curated: ~{int(curated_tokens)} tokens')
    print(f'Savings: {savings:.0f}%')

compare_token_usage('how to deploy FastAPI to production')

Python Example

Python

import os, requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}
BLOCKED = ['pinterest.com', 'quora.com', 'facebook.com']
TRUSTED = ['docs.python.org', 'developer.mozilla.org', 'github.com', 'stackoverflow.com']

def curated_search(query, max_results=5, max_chars=200):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    results = []
    query_terms = set(query.lower().split())
    for r in data.get('organic_results', []):
        domain = r.get('link', '').split('/')[2] if r.get('link') else ''
        if any(b in domain for b in BLOCKED):
            continue
        snippet = (r.get('snippet') or '')[:max_chars]
        text = f"{r.get('title', '')} {snippet}".lower()
        relevance = sum(1 for t in query_terms if t in text) / max(len(query_terms), 1)
        trust = 5 if any(t in domain for t in TRUSTED) else 0
        results.append({
            'title': r.get('title', ''),
            'url': r.get('link', ''),
            'snippet': snippet,
            'score': trust + relevance * 3
        })
    results.sort(key=lambda x: x['score'], reverse=True)
    return results[:max_results]

for r in curated_search('FastAPI deployment best practices 2026'):
    print(f"[{r['score']:.1f}] {r['title']}")
    print(f"  {r['url']}")

JavaScript Example

JavaScript

const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
const BLOCKED = ['pinterest.com', 'quora.com', 'facebook.com'];
const TRUSTED = ['docs.python.org', 'developer.mozilla.org', 'github.com', 'stackoverflow.com'];

async function curatedSearch(query, maxResults = 5, maxChars = 200) {
  const data = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H,
    body: JSON.stringify({query, country_code: 'us'})
  }).then(r => r.json());
  const queryTerms = new Set(query.toLowerCase().split(' '));
  const results = (data.organic_results || []).map(r => {
    const domain = (r.link || '').split('/')[2] || '';
    if (BLOCKED.some(b => domain.includes(b))) return null;
    const snippet = (r.snippet || '').slice(0, maxChars);
    const text = \`\${r.title || ''} \${snippet}\`.toLowerCase();
    const relevance = [...queryTerms].filter(t => text.includes(t)).length / queryTerms.size;
    const trust = TRUSTED.some(t => domain.includes(t)) ? 5 : 0;
    return {title: r.title, url: r.link, snippet, score: trust + relevance * 3};
  }).filter(Boolean).sort((a, b) => b.score - a.score).slice(0, maxResults);
  return results;
}

curatedSearch('FastAPI deployment best practices 2026').then(results => {
  results.forEach(r => console.log(\`[\${r.score.toFixed(1)}] \${r.title}\`));
});

Expected Output

JSON

[8.0] FastAPI Deployment Guide - Official Docs
  https://docs.python.org/fastapi-deploy
[5.5] Deploy FastAPI on AWS Lambda in 2026
  https://github.com/example/fastapi-lambda
[3.0] Production FastAPI Checklist
  https://example.com/fastapi-prod

Prerequisites

Python 3.8+
requests library installed
Scavio API key from scavio.dev
Basic understanding of token counting (tiktoken optional)

Walkthrough

Step 1: Define domain filters and quality rules

Create allowlists and blocklists for domains, plus rules for filtering low-quality results like forums with no answers or paywalled content.

Python

BLOCKED_DOMAINS = [
    'pinterest.com', 'quora.com', 'facebook.com',
    'instagram.com', 'tiktok.com', 'twitter.com',
]

TRUSTED_DOMAINS = [
    'docs.python.org', 'developer.mozilla.org', 'github.com',
    'stackoverflow.com', 'arxiv.org', 'huggingface.co',
]

def domain_score(url):
    domain = url.split('/')[2] if url else ''
    if any(b in domain for b in BLOCKED_DOMAINS):
        return -10
    if any(t in domain for t in TRUSTED_DOMAINS):
        return 5
    return 0

Step 2: Build the search wrapper with compression

Create a function that calls Scavio, filters results, scores relevance, and compresses snippets to a token budget.

Python

import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def curated_search(query, max_results=5, max_snippet_chars=200):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    results = data.get('organic_results', [])
    scored = []
    for r in results:
        ds = domain_score(r.get('link', ''))
        if ds < 0:
            continue
        snippet = (r.get('snippet') or '')[:max_snippet_chars]
        scored.append({
            'title': r.get('title', ''),
            'url': r.get('link', ''),
            'snippet': snippet,
            'score': ds + r.get('position', 10) * -0.5
        })
    scored.sort(key=lambda x: x['score'], reverse=True)
    return scored[:max_results]

Step 3: Add relevance re-ranking

Score results by keyword overlap with the original query to push the most relevant results to the top.

Python

def relevance_score(query, result):
    query_terms = set(query.lower().split())
    text = f"{result['title']} {result['snippet']}".lower()
    matches = sum(1 for t in query_terms if t in text)
    return matches / max(len(query_terms), 1)

def curated_search_v2(query, max_results=5, max_snippet_chars=200):
    raw = curated_search(query, max_results=15, max_snippet_chars=max_snippet_chars)
    for r in raw:
        r['score'] += relevance_score(query, r) * 3
    raw.sort(key=lambda x: x['score'], reverse=True)
    return raw[:max_results]

Step 4: Measure token savings

Compare token counts between raw and curated results to verify the compression ratio.

Python

def estimate_tokens(text):
    return len(text.split()) * 1.3  # rough estimate

def compare_token_usage(query):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    raw_text = str(data.get('organic_results', []))
    raw_tokens = estimate_tokens(raw_text)

    curated = curated_search_v2(query)
    curated_text = str(curated)
    curated_tokens = estimate_tokens(curated_text)

    savings = (1 - curated_tokens / raw_tokens) * 100
    print(f'Raw: ~{int(raw_tokens)} tokens')
    print(f'Curated: ~{int(curated_tokens)} tokens')
    print(f'Savings: {savings:.0f}%')

compare_token_usage('how to deploy FastAPI to production')

Python Example

Python

import os, requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}
BLOCKED = ['pinterest.com', 'quora.com', 'facebook.com']
TRUSTED = ['docs.python.org', 'developer.mozilla.org', 'github.com', 'stackoverflow.com']

def curated_search(query, max_results=5, max_chars=200):
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=H, json={'query': query, 'country_code': 'us'}).json()
    results = []
    query_terms = set(query.lower().split())
    for r in data.get('organic_results', []):
        domain = r.get('link', '').split('/')[2] if r.get('link') else ''
        if any(b in domain for b in BLOCKED):
            continue
        snippet = (r.get('snippet') or '')[:max_chars]
        text = f"{r.get('title', '')} {snippet}".lower()
        relevance = sum(1 for t in query_terms if t in text) / max(len(query_terms), 1)
        trust = 5 if any(t in domain for t in TRUSTED) else 0
        results.append({
            'title': r.get('title', ''),
            'url': r.get('link', ''),
            'snippet': snippet,
            'score': trust + relevance * 3
        })
    results.sort(key=lambda x: x['score'], reverse=True)
    return results[:max_results]

for r in curated_search('FastAPI deployment best practices 2026'):
    print(f"[{r['score']:.1f}] {r['title']}")
    print(f"  {r['url']}")

JavaScript Example

JavaScript

const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
const BLOCKED = ['pinterest.com', 'quora.com', 'facebook.com'];
const TRUSTED = ['docs.python.org', 'developer.mozilla.org', 'github.com', 'stackoverflow.com'];

async function curatedSearch(query, maxResults = 5, maxChars = 200) {
  const data = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H,
    body: JSON.stringify({query, country_code: 'us'})
  }).then(r => r.json());
  const queryTerms = new Set(query.toLowerCase().split(' '));
  const results = (data.organic_results || []).map(r => {
    const domain = (r.link || '').split('/')[2] || '';
    if (BLOCKED.some(b => domain.includes(b))) return null;
    const snippet = (r.snippet || '').slice(0, maxChars);
    const text = \`\${r.title || ''} \${snippet}\`.toLowerCase();
    const relevance = [...queryTerms].filter(t => text.includes(t)).length / queryTerms.size;
    const trust = TRUSTED.some(t => domain.includes(t)) ? 5 : 0;
    return {title: r.title, url: r.link, snippet, score: trust + relevance * 3};
  }).filter(Boolean).sort((a, b) => b.score - a.score).slice(0, maxResults);
  return results;
}

curatedSearch('FastAPI deployment best practices 2026').then(results => {
  results.forEach(r => console.log(\`[\${r.score.toFixed(1)}] \${r.title}\`));
});

Expected Output

JSON

[8.0] FastAPI Deployment Guide - Official Docs
  https://docs.python.org/fastapi-deploy
[5.5] Deploy FastAPI on AWS Lambda in 2026
  https://github.com/example/fastapi-lambda
[3.0] Production FastAPI Checklist
  https://example.com/fastapi-prod

How to Build Curated Search for AI Agents

Prerequisites

Walkthrough

Step 1: Define domain filters and quality rules

Step 2: Build the search wrapper with compression

Step 3: Add relevance re-ranking

Step 4: Measure token savings

Python Example

JavaScript Example

Expected Output

Related Tutorials

Frequently Asked Questions

How long does this build curated search for ai agents tutorial take?

What do I need before starting?

Can I run this tutorial with the free tier?

What frameworks does this work with?

Related Resources

Best Search APIs for Agentic Stacks in 2026

Best Curated Search Tool for AI Agents in 2026

Agent Context Management

MCP Search Gateway for Multi-Agent Systems

Route Agent Searches to the Cheapest Provider Automatically

Search API Cost per Context Window

Start Building

How to Build Curated Search for AI Agents

Prerequisites

Walkthrough

Step 1: Define domain filters and quality rules

Step 2: Build the search wrapper with compression

Step 3: Add relevance re-ranking

Step 4: Measure token savings

Python Example

JavaScript Example

Expected Output

Related Tutorials

Frequently Asked Questions

How long does this build curated search for ai agents tutorial take?

What do I need before starting?

Can I run this tutorial with the free tier?

What frameworks does this work with?

Related Resources

Best Search APIs for Agentic Stacks in 2026

Best Curated Search Tool for AI Agents in 2026

Agent Context Management

MCP Search Gateway for Multi-Agent Systems

Route Agent Searches to the Cheapest Provider Automatically

Search API Cost per Context Window

Start Building