How long does this tutorial on building an AI content grounding pipeline take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.10+ with the requests library installed, a Scavio API key from scavio.dev (signup includes 50 free credits), and an OpenAI API key (or a key for any other LLM API) for claim extraction.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
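As an illustration of the "any HTTP client" point, here is a minimal sketch that builds the same search request with Python's standard library instead of requests. The endpoint, header name, and body fields are the ones used later in this tutorial; the helper name build_search_request is hypothetical.

```python
import json
import urllib.request

def build_search_request(query, api_key, country_code='us'):
    """Build (but do not send) a POST request for the search endpoint."""
    body = json.dumps({'query': query, 'country_code': country_code}).encode('utf-8')
    return urllib.request.Request(
        'https://api.scavio.dev/api/v1/search',
        data=body,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST',
    )
```

Passing the returned object to urllib.request.urlopen(req, timeout=10) would send it; the response shape is assumed to match what the tutorial's code reads (an organic_results list).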

Ground LLM Responses with Search Data (2026)

LLMs generate fluent text but frequently hallucinate statistics, dates, product details, and claims. Content grounding solves this by running the LLM's assertions through a verification loop: extract factual claims from the generated text, search for each claim via a real-time search API, and flag or replace any claim that contradicts the search evidence. This tutorial builds a grounding pipeline that takes raw LLM output, extracts checkable claims, verifies each one against Scavio search results, and produces a grounded version with citation URLs. The pipeline catches hallucinated numbers, outdated information, and fabricated sources before they reach production.

Prerequisites

Python 3.10+ installed
requests library installed
A Scavio API key from scavio.dev
An OpenAI API key (or any LLM API for claim extraction)

Walkthrough

Step 1: Extract factual claims from LLM output

Parse the generated text to identify statements that contain verifiable facts: numbers, dates, product names, company claims. Use a second LLM call to extract these as a list.

Python

import os, requests, json

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
OPENAI_KEY = os.environ['OPENAI_API_KEY']
SEARCH_ENDPOINT = 'https://api.scavio.dev/api/v1/search'
SEARCH_HEADERS = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def extract_claims(text):
    # JSON mode returns an object, so ask for the array under a "claims" key.
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OPENAI_KEY}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0,
            'messages': [{'role': 'system', 'content': 'Extract all factual claims from the text. Return a JSON object with a single key "claims" whose value is an array of strings, each a single verifiable claim.'},
                {'role': 'user', 'content': text}],
            'response_format': {'type': 'json_object'}},
        timeout=60)
    resp.raise_for_status()  # fail loudly on auth or rate-limit errors
    return json.loads(resp.json()['choices'][0]['message']['content']).get('claims', [])
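The extraction step depends on the model returning JSON in a particular shape. A minimal sketch of a more defensive parser follows, assuming the reply may be either an object with a "claims" key or a bare array; parse_claims is a hypothetical helper, not part of the tutorial's code.

```python
import json

def parse_claims(content):
    """Return a list of claim strings from the model's raw JSON reply.

    Hypothetical helper: accepts either {"claims": [...]} or a bare [...].
    Invalid JSON or unexpected types yield an empty list.
    """
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return []
    if isinstance(data, dict):
        data = data.get('claims', [])
    if not isinstance(data, list):
        return []
    # Keep only non-empty strings and trim surrounding whitespace.
    return [c.strip() for c in data if isinstance(c, str) and c.strip()]
```

It could stand in for the final json.loads(...).get('claims', []) expression in extract_claims when the model's output is not fully trusted.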

Step 2: Verify each claim against search results

For each extracted claim, run a Scavio search query and check whether the top results support it. The simple keyword check below marks each claim SUPPORTED or UNVERIFIED; it does not detect evidence that contradicts a claim.

Python

def verify_claim(claim):
    resp = requests.post(SEARCH_ENDPOINT, headers=SEARCH_HEADERS,
        json={'query': claim, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])[:5]
    snippets = [r.get('snippet', '') for r in results if r.get('snippet')]
    # Skip results without a link instead of raising KeyError.
    sources = [r['link'] for r in results[:3] if r.get('link')]
    evidence = ' '.join(snippets).lower()
    claim_lower = claim.lower()
    # Crude check: one claim word longer than 4 characters found in the snippets counts as support.
    supported = any(word in evidence for word in claim_lower.split() if len(word) > 4)
    return {
        'claim': claim,
        'status': 'SUPPORTED' if supported else 'UNVERIFIED',
        'sources': sources,
        'evidence_preview': snippets[0][:200] if snippets else '',
    }
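The keyword check in verify_claim is permissive: one matching word is enough to mark a claim SUPPORTED. The sketch below shows one possible stricter heuristic, assuming that term overlap plus exact number matching is an acceptable proxy for support. The stopword list, the 0.6 threshold, and the function names are illustrative assumptions, not part of the tutorial or the Scavio API.

```python
import re

# Small illustrative stopword list (an assumption, tune for your content).
STOPWORDS = {'the', 'a', 'an', 'is', 'are', 'was', 'were', 'in', 'on', 'of',
             'to', 'and', 'or', 'by', 'for', 'with', 'has', 'have', 'will',
             'than', 'over', 'per', 'most'}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens, keeping '5.2' together."""
    return re.findall(r'[a-z0-9]+(?:\.[0-9]+)?', text.lower())

def support_score(claim, evidence):
    """Fraction of the claim's content tokens that also appear in the evidence."""
    terms = {t for t in tokenize(claim) if t not in STOPWORDS}
    if not terms:
        return 0.0
    found = set(tokenize(evidence))
    return len(terms & found) / len(terms)

def is_supported(claim, evidence, threshold=0.6):
    """Require high term overlap and every number in the claim to appear."""
    numbers = {t for t in tokenize(claim) if t[0].isdigit()}
    found = set(tokenize(evidence))
    return numbers <= found and support_score(claim, evidence) >= threshold
```

With this version, a snippet that merely mentions FastAPI would no longer support a claim about "10 million requests per second", because the number 10 and most of the other terms are missing from the evidence. It is still a lexical heuristic and cannot tell agreement from contradiction.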

Step 3: Build the grounded output with citations

Annotate unverified claims in the original text so they can be reviewed, and append source URLs as citations for supported claims.

Python

def ground_content(raw_text):
    claims = extract_claims(raw_text)
    print(f'Extracted {len(claims)} claims to verify')
    verifications = []
    for claim in claims:
        result = verify_claim(claim)
        verifications.append(result)
        print(f"  [{result['status']}] {claim[:60]}")
    grounded = raw_text
    citations = []
    for v in verifications:
        if v['status'] == 'SUPPORTED' and v['sources']:
            citations.append(f"- {v['claim'][:80]}: {v['sources'][0]}")
        elif v['status'] == 'UNVERIFIED':
            # str.replace only tags claims that appear verbatim in the text.
            grounded = grounded.replace(v['claim'],
                f"{v['claim']} [UNVERIFIED - needs manual review]")
    if citations:
        grounded += '\n\nSources:\n' + '\n'.join(citations)
    cost = len(claims) * 0.005
    print(f'Verification cost: ${cost:.3f} ({len(claims)} searches)')
    return grounded, verifications
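Because ground_content tags claims with str.replace, a claim that the extractor paraphrased will not be found in the original text and goes unmarked. A minimal sketch of one way to handle that case follows, assuming verification dicts shaped like the return value of verify_claim; annotate is a hypothetical helper.

```python
def annotate(raw_text, verifications):
    """Tag unverified claims inline when possible, otherwise list them at the end."""
    tag = ' [UNVERIFIED - needs manual review]'
    grounded = raw_text
    unmatched = []
    for v in verifications:
        if v['status'] != 'UNVERIFIED':
            continue
        claim = v['claim'].rstrip('.')
        if claim and claim in grounded:
            # Tag only the first occurrence to avoid stacking markers.
            grounded = grounded.replace(claim, claim + tag, 1)
        else:
            unmatched.append(v['claim'])
    if unmatched:
        grounded += '\n\nUnverified claims (not found verbatim):\n'
        grounded += '\n'.join(f'- {c}' for c in unmatched)
    return grounded
```

Claims that cannot be located inline are still surfaced to the reviewer instead of being silently dropped.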

Python Example

Python

import os, requests

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def search(query):
    return requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}).json()

def verify_claims(claims):
    results = []
    for claim in claims:
        data = search(claim)
        snippets = [r.get('snippet', '') for r in data.get('organic_results', [])[:5]]
        sources = [r['link'] for r in data.get('organic_results', [])[:3] if r.get('link')]
        evidence = ' '.join(snippets).lower()
        supported = any(w in evidence for w in claim.lower().split() if len(w) > 4)
        results.append({'claim': claim, 'ok': supported, 'sources': sources})
    return results

def ground(text, claims):
    verified = verify_claims(claims)
    for v in verified:
        tag = 'OK' if v['ok'] else 'UNVERIFIED'
        print(f'[{tag}] {v["claim"][:60]}')
    bad = [v for v in verified if not v['ok']]
    print(f'{len(verified) - len(bad)}/{len(verified)} claims verified')
    print(f'Cost: ${len(claims) * 0.005:.3f}')

claims = ['Python is the most popular programming language in 2026',
    'FastAPI processes 10 million requests per second']
ground('sample text', claims)
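Since every claim triggers one search, duplicate or near-duplicate claims add cost without adding information. The following is a small sketch of a caching wrapper, assuming that claims differing only in case or whitespace can share a result; make_cached_search is a hypothetical helper.

```python
def make_cached_search(search_fn):
    """Wrap a search function so identical (normalized) queries hit it once."""
    cache = {}

    def cached(query):
        key = ' '.join(query.lower().split())  # case- and whitespace-insensitive
        if key not in cache:
            cache[key] = search_fn(query)
        return cache[key]

    cached.cache = cache  # exposed so callers can inspect or clear it
    return cached
```

In the example above, rebinding search = make_cached_search(search) before calling ground would route verify_claims through the cache.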

JavaScript Example

JavaScript

const SCAVIO_KEY = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json' };

async function search(query) {
  return fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify({ query, country_code: 'us' })
  }).then(r => r.json());
}

async function verifyClaims(claims) {
  const results = [];
  for (const claim of claims) {
    const data = await search(claim);
    const snippets = (data.organic_results || []).slice(0, 5)
      .map(r => r.snippet || '').join(' ').toLowerCase();
    const sources = (data.organic_results || []).slice(0, 3).map(r => r.link);
    const supported = claim.toLowerCase().split(' ')
      .filter(w => w.length > 4).some(w => snippets.includes(w));
    results.push({ claim, ok: supported, sources });
  }
  return results;
}

async function ground(claims) {
  const results = await verifyClaims(claims);
  results.forEach(v => console.log(`[${v.ok ? 'OK' : 'UNVERIFIED'}] ${v.claim.slice(0, 60)}`));
  const verified = results.filter(v => v.ok).length;
  console.log(`${verified}/${results.length} claims verified`);
  console.log(`Cost: $${(claims.length * 0.005).toFixed(3)}`);
}

ground(['Python is the most popular language in 2026']).catch(console.error);

Expected Output

Text

Extracted 5 claims to verify
  [SUPPORTED] Python is the most popular programming language in 2026
  [UNVERIFIED] FastAPI processes 10 million requests per second
  [SUPPORTED] Django 5.2 was released in April 2026
  [SUPPORTED] OpenAI has over 200 million weekly active users
  [UNVERIFIED] Rust will replace Python by 2028

3/5 claims verified
Verification cost: $0.025 (5 searches)
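The last two lines of the output are a summary over the per-claim results. A minimal sketch of how they could be computed from verification dicts shaped like the return value of verify_claim, using the $0.005 per-search figure from the tutorial's code; summarize is a hypothetical helper.

```python
COST_PER_SEARCH = 0.005  # per-search cost used in the tutorial's code

def summarize(verifications):
    """Return (summary line, cost line) for a list of verification dicts."""
    total = len(verifications)
    supported = sum(1 for v in verifications if v['status'] == 'SUPPORTED')
    cost = total * COST_PER_SEARCH
    return (f'{supported}/{total} claims verified',
            f'Verification cost: ${cost:.3f} ({total} searches)')
```

For three SUPPORTED and two UNVERIFIED results this yields the two summary lines shown above.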

Related Resources

Best Search APIs for Open-Source LLM Grounding in 2026

Local LLM Search Grounding via API

Best Search APIs for Pipeline Integration in 2026

LLM Grounding

Search API Provider Landscape (2026)

Ground LLM Responses with Real-Time Search Data
