LLMs generate fluent text but frequently hallucinate statistics, dates, product details, and other factual claims. Content grounding addresses this by running the LLM's assertions through a verification loop: extract factual claims from the generated text, search for each claim via a real-time search API, and flag or replace any claim that the search evidence contradicts or fails to support. This tutorial builds a grounding pipeline that takes raw LLM output, extracts checkable claims, verifies each one against Scavio search results, and produces a grounded version with citation URLs. The pipeline is meant to catch hallucinated numbers, outdated information, and fabricated sources before they reach production.
Prerequisites
- Python 3.10+ installed
- requests library installed
- A Scavio API key from scavio.dev
- An OpenAI API key (or any LLM API for claim extraction)
Walkthrough
Step 1: Extract factual claims from LLM output
Parse the generated text to identify statements that contain verifiable facts: numbers, dates, product names, company claims. Use a second LLM call to extract these as a list.
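Whatever model performs the extraction, its reply still has to be parsed before the claims can be verified. The following is a minimal sketch of a defensive parser, assuming the reply is either a JSON object with a "claims" key or a bare JSON array. The name parse_claims is a hypothetical helper for illustration, not part of any SDK.

```python
import json


def parse_claims(raw_reply):
    """Return a list of non-empty claim strings from a model reply.

    Hypothetical helper: accepts {"claims": [...]} or a bare JSON array.
    Anything else, including invalid JSON, yields an empty list.
    """
    try:
        data = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return []
    if isinstance(data, dict):
        data = data.get("claims", [])
    if not isinstance(data, list):
        return []
    # Keep only non-empty strings, stripped of surrounding whitespace.
    return [c.strip() for c in data if isinstance(c, str) and c.strip()]


print(parse_claims('{"claims": ["Django 5.2 was released in April 2026", ""]}'))
print(parse_claims("not json"))
```

Returning an empty list on malformed input lets the rest of the pipeline run without special cases; a stricter variant could raise instead so that extraction failures are not silently ignored.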
import os, requests, json

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
OPENAI_KEY = os.environ['OPENAI_API_KEY']
SEARCH_ENDPOINT = 'https://api.scavio.dev/api/v1/search'
SEARCH_HEADERS = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def extract_claims(text):
    # JSON mode returns an object, so the prompt asks for a "claims" key
    # to match the .get('claims') lookup below.
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OPENAI_KEY}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0,
              'messages': [{'role': 'system', 'content': 'Extract all factual claims from the text. Return a JSON object with a "claims" key holding an array of strings, each a single verifiable claim.'},
                           {'role': 'user', 'content': text}],
              'response_format': {'type': 'json_object'}},
        timeout=60)
    resp.raise_for_status()  # fail loudly on auth or rate-limit errors
    return json.loads(resp.json()['choices'][0]['message']['content']).get('claims', [])
Step 2: Verify each claim against search results
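For each extracted claim, run a Scavio search query and compare the top results against the claim. The check below is a simple keyword heuristic: it labels a claim SUPPORTED when the result snippets share a significant word with it and UNVERIFIED otherwise. It does not detect outright contradictions.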
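A single shared word is a weak signal, since a claim about FastAPI is "supported" by any snippet that mentions FastAPI. One possible refinement is to score the fraction of the claim's key terms that appear in the evidence. The sketch below is an illustration under stated assumptions: tokenize, key_terms, support_score, and classify are hypothetical helpers, and the 0.6 threshold is an arbitrary starting point rather than a tuned value.

```python
import re


def tokenize(text):
    """Lowercase tokens of letters, digits and inner dots (so '5.2' survives)."""
    return [t.rstrip(".") for t in re.findall(r"[a-z0-9][a-z0-9.]*", text.lower())]


def key_terms(claim):
    """Words longer than 4 characters plus any token containing a digit."""
    return {
        t for t in tokenize(claim)
        if len(t) > 4 or any(ch.isdigit() for ch in t)
    }


def support_score(claim, evidence):
    """Fraction of the claim's key terms that occur as whole tokens in the evidence."""
    terms = key_terms(claim)
    if not terms:
        return 0.0
    return len(terms & set(tokenize(evidence))) / len(terms)


def classify(claim, evidence, threshold=0.6):
    """Assumed threshold: at least 60% of key terms must appear in the evidence."""
    return "SUPPORTED" if support_score(claim, evidence) >= threshold else "UNVERIFIED"


claim = "FastAPI processes 10 million requests per second"
evidence = "FastAPI is a modern web framework for building APIs with Python."
print(classify(claim, evidence))  # only 1 of 6 key terms matches -> UNVERIFIED
print(classify("Django 5.2 was released", "Django 5.2 released with new features."))  # SUPPORTED
```

Term overlap still cannot tell agreement from contradiction (a snippet saying a figure is false shares the same words), so claims that matter should go through a model-based or manual comparison of claim and evidence.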
def verify_claim(claim):
    resp = requests.post(SEARCH_ENDPOINT, headers=SEARCH_HEADERS,
                         json={'query': claim, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])[:5]
    snippets = [r.get('snippet', '') for r in results if r.get('snippet')]
    # .get avoids a KeyError when a result has no 'link' field
    sources = [r.get('link') for r in results[:3] if r.get('link')]
    evidence = ' '.join(snippets).lower()
    claim_lower = claim.lower()
    # Naive heuristic: one claim word longer than 4 characters found in the
    # snippets is enough to count as support.
    supported = any(word in evidence for word in claim_lower.split() if len(word) > 4)
    return {
        'claim': claim,
        'status': 'SUPPORTED' if supported else 'UNVERIFIED',
        'sources': sources,
        'evidence_preview': snippets[0][:200] if snippets else '',
    }
Step 3: Build the grounded output with citations
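Annotate unverified claims in the original text and append source URLs as citations for supported claims. The annotation relies on exact string matching, so a claim that the extractor has paraphrased is left untouched in the text but still appears in the returned verification list.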
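Extracted claims are often paraphrases rather than verbatim spans, so exact matching can miss them. One way around this, sketched here with the standard library's difflib, is to tag the sentence that most closely resembles the claim. The function name annotate_unverified, the 0.6 cutoff, and the naive sentence splitter are all assumptions made for illustration.

```python
import difflib
import re


def annotate_unverified(text, claim, cutoff=0.6,
                        tag=" [UNVERIFIED - needs manual review]"):
    """Append a review tag to the sentence in `text` closest to `claim`.

    Hypothetical helper: returns the text unchanged when no sentence
    reaches the similarity cutoff.
    """
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    match = difflib.get_close_matches(claim, sentences, n=1, cutoff=cutoff)
    if not match:
        return text
    # Tag only the first occurrence of the matched sentence.
    return text.replace(match[0], match[0] + tag, 1)


text = ("FastAPI is popular. It processes 10 million requests per second. "
        "It is written in Python.")
print(annotate_unverified(text, "FastAPI processes 10 million requests per second"))
```

Character-level similarity is cheap but crude. For long documents or heavily reworded claims, asking the extractor to return the source sentence alongside each claim is likely to be more reliable.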
def ground_content(raw_text):
    claims = extract_claims(raw_text)
    print(f'Extracted {len(claims)} claims to verify')
    verifications = []
    for claim in claims:
        result = verify_claim(claim)
        verifications.append(result)
        print(f" [{result['status']}] {claim[:60]}")
    grounded = raw_text
    citations = []
    for v in verifications:
        if v['status'] == 'SUPPORTED' and v['sources']:
            citations.append(f"- {v['claim'][:80]}: {v['sources'][0]}")
        elif v['status'] == 'UNVERIFIED':
            # Only claims that appear verbatim in raw_text get annotated.
            grounded = grounded.replace(v['claim'],
                f"{v['claim']} [UNVERIFIED - needs manual review]")
    if citations:
        grounded += '\n\nSources:\n' + '\n'.join(citations)
    cost = len(claims) * 0.005
    print(f'Verification cost: ${cost:.3f} ({len(claims)} searches)')
    return grounded, verifications
Python Example
import os, requests, json

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def search(query):
    return requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}).json()

def verify_claims(claims):
    results = []
    for claim in claims:
        data = search(claim)
        snippets = [r.get('snippet', '') for r in data.get('organic_results', [])[:5]]
        # .get avoids a KeyError when a result has no 'link' field
        sources = [r.get('link') for r in data.get('organic_results', [])[:3] if r.get('link')]
        evidence = ' '.join(snippets).lower()
        supported = any(w in evidence for w in claim.lower().split() if len(w) > 4)
        results.append({'claim': claim, 'ok': supported, 'sources': sources})
    return results

def ground(text, claims):
    verified = verify_claims(claims)
    for v in verified:
        tag = 'OK' if v['ok'] else 'UNVERIFIED'
        print(f'[{tag}] {v["claim"][:60]}')
    bad = [v for v in verified if not v['ok']]
    print(f'{len(verified) - len(bad)}/{len(verified)} claims verified')
    print(f'Cost: ${len(claims) * 0.005:.3f}')

claims = ['Python is the most popular programming language in 2026',
          'FastAPI processes 10 million requests per second']
ground('sample text', claims)
JavaScript Example
const SCAVIO_KEY = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json' };

async function search(query) {
  return fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify({ query, country_code: 'us' })
  }).then(r => r.json());
}

async function verifyClaims(claims) {
  const results = [];
  for (const claim of claims) {
    const data = await search(claim);
    const snippets = (data.organic_results || []).slice(0, 5)
      .map(r => r.snippet || '').join(' ').toLowerCase();
    const sources = (data.organic_results || []).slice(0, 3).map(r => r.link);
    const supported = claim.toLowerCase().split(' ')
      .filter(w => w.length > 4).some(w => snippets.includes(w));
    results.push({ claim, ok: supported, sources });
  }
  return results;
}

async function ground(claims) {
  const results = await verifyClaims(claims);
  results.forEach(v => console.log(`[${v.ok ? 'OK' : 'UNVERIFIED'}] ${v.claim.slice(0, 60)}`));
  const verified = results.filter(v => v.ok).length;
  console.log(`${verified}/${results.length} claims verified`);
  console.log(`Cost: $${(claims.length * 0.005).toFixed(3)}`);
}

ground(['Python is the most popular language in 2026']).catch(console.error);
Expected Output
Extracted 5 claims to verify
[SUPPORTED] Python is the most popular programming language in 2026
[UNVERIFIED] FastAPI processes 10 million requests per second
[SUPPORTED] Django 5.2 was released in April 2026
[SUPPORTED] OpenAI has over 200 million weekly active users
[UNVERIFIED] Rust will replace Python by 2028
3/5 claims verified
Verification cost: $0.025 (5 searches)
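The "3/5 claims verified" summary in the output above is not printed by ground_content as written in Step 3. A small helper along the following lines could derive it from the returned verification list. This is a sketch: summarize is a hypothetical name, the statuses in the sample data mirror the listing above rather than real search results, and the 0.005 per-search figure is the one the tutorial's cost estimate already uses.

```python
COST_PER_SEARCH = 0.005  # per-search figure from the tutorial's cost estimate


def summarize(verifications):
    """Return the 'N/M claims verified' and cost lines for a verification list."""
    total = len(verifications)
    supported = sum(1 for v in verifications if v.get("status") == "SUPPORTED")
    cost = total * COST_PER_SEARCH
    return [
        f"{supported}/{total} claims verified",
        f"Verification cost: ${cost:.3f} ({total} searches)",
    ]


# Illustrative statuses only, copied from the listing above.
sample = [
    {"claim": "Python is the most popular programming language in 2026", "status": "SUPPORTED"},
    {"claim": "FastAPI processes 10 million requests per second", "status": "UNVERIFIED"},
    {"claim": "Django 5.2 was released in April 2026", "status": "SUPPORTED"},
    {"claim": "OpenAI has over 200 million weekly active users", "status": "SUPPORTED"},
    {"claim": "Rust will replace Python by 2028", "status": "UNVERIFIED"},
]
for line in summarize(sample):
    print(line)
```

Calling summarize on the second value returned by ground_content would then print both closing lines in one place.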