The Problem
Local LLMs (Qwen 9B-35B, Llama-3, DeepSeek) hallucinate on web-search-grounded answers when fed raw scraped HTML. Tight context windows compress signal proportionally more than cloud LLMs.
The Scavio Solution
Scavio's typed JSON sources + an explicit citation prompt + Ollama local LLM. Cuts hallucination measurably on the same model. Optional cross-check against AI Overview citations.
Before
Qwen 27B + raw scraped HTML grounding = ~18% hallucination on factual queries (per the r/LocalLLaMA report).
After
Qwen 27B + Scavio typed JSON + citation prompt = <3% hallucination on the same queries.
Who It Is For
Local-LLM enthusiasts, privacy-first agent builders, teams running on-prem LLMs that need fresh web context.
Key Benefits
- 10x reduction in hallucination on grounded queries
- Token-efficient JSON (~1.5K vs 25-40K HTML)
- Cross-check against AI Overview citations
- Works on any Ollama-compatible local model
- Stack cost ~$30 (Scavio) + $0 (local LLM)
Python Example
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
PROMPT = '''Answer using ONLY the sources. Every claim followed by [N].
If sources don't answer, say "I don't know based on these sources."
Sources:
{src}
Question: {q}'''
def ask(q):
r = requests.post('https://api.scavio.dev/api/v1/search', headers=H, json={'query': q}).json()
src = '\n'.join(f'[{i+1}] {x["title"]} ({x["link"]}): {x["snippet"]}' for i, x in enumerate(r['organic_results'][:10]))
return requests.post('http://localhost:11434/api/generate', json={'model':'qwen2.5:32b', 'prompt': PROMPT.format(src=src, q=q), 'stream': False}).json()['response']JavaScript Example
// Same in TS via fetch + Ollama HTTP API.Platforms Used
Web search with knowledge graph, PAA, and AI overviews