How to Build an AI Content Grounding Pipeline

Build a pipeline that grounds LLM-generated content with verified search data. Reduce hallucinations by cross-referencing claims against live SERP results.

LLMs generate fluent text but frequently hallucinate statistics, dates, product details, and other claims. Content grounding addresses this by running the LLM's assertions through a verification loop: extract factual claims from the generated text, search for each claim via a real-time search API, and flag any claim that the search evidence does not support. This tutorial builds a grounding pipeline that takes raw LLM output, extracts checkable claims, checks each one against Scavio search results, and produces a grounded version with citation URLs. The goal is to catch hallucinated numbers, outdated information, and fabricated sources before they reach production.

Prerequisites

  • Python 3.10+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • An OpenAI API key (or any LLM API for claim extraction)

Walkthrough

Step 1: Extract factual claims from LLM output

Parse the generated text to identify statements that contain verifiable facts: numbers, dates, product names, company claims. Use a second LLM call to extract these as a list.

Python
import os, requests, json

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
OPENAI_KEY = os.environ['OPENAI_API_KEY']
SEARCH_ENDPOINT = 'https://api.scavio.dev/api/v1/search'
SEARCH_HEADERS = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def extract_claims(text):
    # JSON mode returns an object, so the prompt asks for a "claims" key
    # that matches the .get('claims') lookup at the end of this function.
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OPENAI_KEY}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0,
            'messages': [{'role': 'system', 'content': 'Extract all factual claims from the text. Return a JSON object with a single key "claims" whose value is an array of strings, each a single verifiable claim.'},
                {'role': 'user', 'content': text}],
            'response_format': {'type': 'json_object'}},
        timeout=60)
    resp.raise_for_status()  # fail loudly on auth or rate-limit errors
    return json.loads(resp.json()['choices'][0]['message']['content']).get('claims', [])
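Models do not always return exactly the shape the prompt asks for, so it can help to parse the reply defensively. The following is a minimal sketch, not part of the Scavio or OpenAI APIs: parse_claims is a hypothetical helper that accepts either an object with a "claims" key or a bare array, and drops empty, non-string, and duplicate entries.

```python
import json

def parse_claims(content):
    """Return a list of claim strings from the model's JSON reply (hypothetical helper)."""
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return []
    # Accept either {"claims": [...]} or a bare [...] array.
    if isinstance(data, dict):
        data = data.get('claims', [])
    if not isinstance(data, list):
        return []
    # Keep trimmed, non-empty strings and drop duplicates, preserving order.
    seen, claims = set(), []
    for item in data:
        if isinstance(item, str):
            claim = item.strip()
            if claim and claim not in seen:
                seen.add(claim)
                claims.append(claim)
    return claims
```

Passing the message content through a helper like this means a malformed reply yields an empty claim list instead of an exception partway through a batch.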

Step 2: Verify each claim against search results

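For each extracted claim, run a Scavio search query and check whether the top results support it. The check below is a keyword heuristic: it marks a claim SUPPORTED when the result snippets share a keyword with the claim and UNVERIFIED otherwise. It does not detect contradictions.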

Python
def verify_claim(claim):
    resp = requests.post(SEARCH_ENDPOINT, headers=SEARCH_HEADERS,
        json={'query': claim, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    results = resp.json().get('organic_results', [])[:5]
    snippets = [r.get('snippet', '') for r in results if r.get('snippet')]
    # Skip results without a link rather than raising KeyError.
    sources = [r['link'] for r in results[:3] if r.get('link')]
    evidence = ' '.join(snippets).lower()
    claim_lower = claim.lower()
    # Loose heuristic: one shared word longer than 4 characters counts as support.
    supported = any(word in evidence for word in claim_lower.split() if len(word) > 4)
    return {
        'claim': claim,
        'status': 'SUPPORTED' if supported else 'UNVERIFIED',
        'sources': sources,
        'evidence_preview': snippets[0][:200] if snippets else '',
    }
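The keyword check in verify_claim accepts a claim as soon as any one word longer than four characters appears in the snippets, so a claim that merely mentions a well-known name will usually pass. A stricter variant might require most of the claim's tokens, and every number in it, to appear in the evidence. The sketch below is one possible approach under assumptions of our own: the stopword list, the 0.6 threshold, and the helper names tokens, support_score, and is_supported are illustrative choices, not part of the original pipeline.

```python
import re

# Small illustrative stopword list; extend it for real use.
STOPWORDS = {'the', 'a', 'an', 'of', 'in', 'on', 'is', 'are', 'was', 'were',
             'and', 'or', 'to', 'for', 'by', 'with', 'has', 'have', 'than'}

def tokens(text):
    """Lowercase word and number tokens (decimals kept whole), minus stopwords."""
    return [t for t in re.findall(r'[a-z0-9]+(?:\.[0-9]+)?', text.lower())
            if t not in STOPWORDS]

def support_score(claim, evidence):
    """Fraction of the claim's distinct tokens that also appear in the evidence."""
    claim_tokens = set(tokens(claim))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & set(tokens(evidence))) / len(claim_tokens)

def is_supported(claim, evidence, threshold=0.6):
    """Require every number in the claim to appear, plus enough token overlap."""
    evidence_tokens = set(tokens(evidence))
    numbers = [t for t in tokens(claim) if t[0].isdigit()]
    if any(n not in evidence_tokens for n in numbers):
        return False
    return support_score(claim, evidence) >= threshold

# Made-up snippet for illustration: the claim's "10" never appears in it.
snippet = 'FastAPI is a modern, fast web framework for building APIs with Python.'
print(is_supported('FastAPI processes 10 million requests per second', snippet))
```

Token overlap is still a lexical signal and cannot tell agreement from contradiction; for that, a second LLM call that compares the claim against the snippets would be the natural next step.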

Step 3: Build the grounded output with citations

Annotate unverified claims in the original text with a review marker, and append source URLs as citations for supported claims.

Python
def ground_content(raw_text):
    claims = extract_claims(raw_text)
    print(f'Extracted {len(claims)} claims to verify')
    verifications = []
    for claim in claims:
        result = verify_claim(claim)
        verifications.append(result)
        print(f"  [{result['status']}] {claim[:60]}")
    grounded = raw_text
    citations = []
    for v in verifications:
        if v['status'] == 'SUPPORTED' and v['sources']:
            citations.append(f"- {v['claim'][:80]}: {v['sources'][0]}")
        elif v['status'] == 'UNVERIFIED':
            # str.replace is a no-op when the extracted claim is not verbatim in the text.
            grounded = grounded.replace(v['claim'],
                f"{v['claim']} [UNVERIFIED - needs manual review]")
    if citations:
        grounded += '\n\nSources:\n' + '\n'.join(citations)
    cost = len(claims) * 0.005
    print(f'Verification cost: ${cost:.3f} ({len(claims)} searches)')
    return grounded, verifications
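Because the extraction step may paraphrase, an extracted claim is not guaranteed to appear verbatim in the raw text, in which case the inline replacement in ground_content silently does nothing. One way to avoid losing those flags is to fall back to a review list at the end of the document. The sketch below assumes the verification dictionaries produced by verify_claim; the annotate function, the "Needs manual review" footer, and the numbered citation format are hypothetical additions.

```python
def annotate(raw_text, verifications):
    """Tag unverified claims inline when possible; list the rest in a footer."""
    grounded = raw_text
    citations, review = [], []
    for v in verifications:
        if v['status'] == 'SUPPORTED' and v.get('sources'):
            # Number citations in the order they are added.
            citations.append(f"[{len(citations) + 1}] {v['claim'][:80]}: {v['sources'][0]}")
        elif v['status'] == 'UNVERIFIED':
            if v['claim'] in grounded:
                # Tag only the first verbatim occurrence.
                grounded = grounded.replace(v['claim'], f"{v['claim']} [UNVERIFIED]", 1)
            else:
                # Paraphrased claim: keep it visible in a review footer.
                review.append(f"- {v['claim']}")
    if review:
        grounded += '\n\nNeeds manual review:\n' + '\n'.join(review)
    if citations:
        grounded += '\n\nSources:\n' + '\n'.join(citations)
    return grounded
```

With this shape, every unverified claim ends up somewhere in the output, either next to the sentence it came from or in the footer.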

Python Example

Python
import os, requests, json

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def search(query):
    return requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}).json()

def verify_claims(claims):
    results = []
    for claim in claims:
        data = search(claim)
        snippets = [r.get('snippet', '') for r in data.get('organic_results', [])[:5]]
        # Skip results without a link rather than raising KeyError.
        sources = [r['link'] for r in data.get('organic_results', [])[:3] if r.get('link')]
        evidence = ' '.join(snippets).lower()
        supported = any(w in evidence for w in claim.lower().split() if len(w) > 4)
        results.append({'claim': claim, 'ok': supported, 'sources': sources})
    return results

def ground(text, claims):
    verified = verify_claims(claims)
    for v in verified:
        tag = 'OK' if v['ok'] else 'UNVERIFIED'
        print(f'[{tag}] {v["claim"][:60]}')
    bad = [v for v in verified if not v['ok']]
    print(f'{len(verified) - len(bad)}/{len(verified)} claims verified')
    print(f'Cost: ${len(claims) * 0.005:.3f}')

claims = ['Python is the most popular programming language in 2026',
    'FastAPI processes 10 million requests per second']
ground('sample text', claims)
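Since each claim triggers one paid search, repeated or near-identical claims cost money for no new evidence. A small in-memory cache keyed on the normalized query is one way to avoid that. This is a sketch under our own assumptions: make_cached_search is a hypothetical wrapper, and it takes the search function as an argument so it can be exercised without calling the API.

```python
def make_cached_search(search_fn):
    """Wrap a search function so each normalized query is sent only once."""
    cache = {}

    def cached(query):
        # Normalize case and whitespace so trivial variants share one entry.
        key = ' '.join(query.lower().split())
        if key not in cache:
            cache[key] = search_fn(query)
        return cache[key]

    cached.cache = cache  # exposed for inspection, e.g. len(cached.cache)
    return cached

# Example with a stand-in search function that records its calls.
sent = []
def fake_search(query):
    sent.append(query)
    return {'organic_results': []}

cached_search = make_cached_search(fake_search)
cached_search('Python is popular')
cached_search('  python IS popular ')
print(len(sent))  # 1
```

In the example above, wrapping would look like search = make_cached_search(search). The cache lives only for the lifetime of the process, and cached results go stale as live search results change, so it suits single batch runs better than long-running services.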

JavaScript Example

JavaScript
const SCAVIO_KEY = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json' };

async function search(query) {
  return fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify({ query, country_code: 'us' })
  }).then(r => r.json());
}

async function verifyClaims(claims) {
  const results = [];
  for (const claim of claims) {
    const data = await search(claim);
    const snippets = (data.organic_results || []).slice(0, 5)
      .map(r => r.snippet || '').join(' ').toLowerCase();
    const sources = (data.organic_results || []).slice(0, 3).map(r => r.link);
    const supported = claim.toLowerCase().split(' ')
      .filter(w => w.length > 4).some(w => snippets.includes(w));
    results.push({ claim, ok: supported, sources });
  }
  return results;
}

async function ground(claims) {
  const results = await verifyClaims(claims);
  results.forEach(v => console.log(`[${v.ok ? 'OK' : 'UNVERIFIED'}] ${v.claim.slice(0, 60)}`));
  const verified = results.filter(v => v.ok).length;
  console.log(`${verified}/${results.length} claims verified`);
  console.log(`Cost: $${(claims.length * 0.005).toFixed(3)}`);
}

ground(['Python is the most popular language in 2026']).catch(console.error);

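Expected Output

Illustrative console output for a five-claim run. Actual statuses depend on the live search results returned for each claim.

Text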
Extracted 5 claims to verify
  [SUPPORTED] Python is the most popular programming language in 2026
  [UNVERIFIED] FastAPI processes 10 million requests per second
  [SUPPORTED] Django 5.2 was released in April 2026
  [SUPPORTED] OpenAI has over 200 million weekly active users
  [UNVERIFIED] Rust will replace Python by 2028

3/5 claims verified
Verification cost: $0.025 (5 searches)
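The summary lines in the output above are simple arithmetic over the verification results: supported claims divided by total claims, and total claims multiplied by a per-search price. A sketch of that calculation follows, assuming the verification dictionaries from verify_claim. The summarize helper is hypothetical, and the $0.005 figure is the per-search value hard-coded in this tutorial's code, not a quoted rate.

```python
PRICE_PER_SEARCH = 0.005  # USD; taken from the tutorial's code, treat as an assumption

def summarize(verifications, price=PRICE_PER_SEARCH):
    """Count supported claims and estimate the search cost for one run."""
    total = len(verifications)
    supported = sum(1 for v in verifications if v['status'] == 'SUPPORTED')
    return {
        'total': total,
        'supported': supported,
        'unverified': total - supported,
        'supported_ratio': supported / total if total else 0.0,
        'cost_usd': round(total * price, 3),  # one search per claim
    }
```

Returning a dictionary instead of printing makes it easier to log the ratio per document or to fail a publishing step when too many claims are unverified.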

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.10+, the requests library, a Scavio API key from scavio.dev, and an OpenAI API key (or any LLM API for claim extraction).

Can I complete it on the free tier?

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
