An r/LocalLLaMA post showed Qwen 9B/27B/35B hallucinating on web-search-grounded answers. The fix: feed the model typed JSON instead of raw HTML, and use an explicit citation prompt. This post walks through the pattern.
Prerequisites
- A local LLM via Ollama / LM Studio / vLLM
- Scavio API key
- Awareness of context-window limits (4K-32K typical for 9B-35B)
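The last prerequisite can be sanity-checked up front. A minimal sketch, assuming the common ~4 characters per token heuristic (exact counts depend on the model's tokenizer; `fits_context` is an illustrative helper, not part of any library):

```python
def fits_context(text: str, ctx_tokens: int = 8192, reserve: int = 1024) -> bool:
    """Rough check that a prompt fits a model's context window.

    Uses the ~4 chars/token heuristic; real counts depend on the tokenizer.
    `reserve` leaves headroom for the model's answer.
    """
    est_tokens = len(text) // 4
    return est_tokens <= ctx_tokens - reserve
```

With 5-10 snippets per query, prompts stay comfortably inside even a 4K window; the check matters mainly if you start stuffing in full page text.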
Walkthrough
Step 1: Pull typed JSON via Scavio
Request 5-10 results per query — a few kilobytes of typed JSON, well under the local LLM's context window.
```python
import os

import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def search(q):
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers=H, json={'query': q}, timeout=30).json()
    # Cap at 10 results to keep the prompt small.
    return r.get('organic_results', [])[:10]
```

Step 2: Format sources as a numbered citation block
Local LLMs respond better to explicit numbering.
```python
def fmt_sources(results):
    # One line per source: [N] Title (URL): snippet.
    return '\n'.join(
        f'[{i+1}] {r["title"]} ({r["link"]}): {r["snippet"]}'
        for i, r in enumerate(results)
    )
```

Step 3: Use a strict citation prompt
Local LLMs ignore softer instructions; be explicit.
```python
PROMPT = '''Answer using ONLY the sources below. Every claim must be followed by [N] where N is the source number.
If the sources do not answer the question, say "I don't know based on the provided sources."

Sources:
{sources}

Question: {question}'''
```

Step 4: Call the local LLM via Ollama
Standard Ollama /api/generate.
```python
import requests

def ask_local(q, results):
    prompt = PROMPT.format(sources=fmt_sources(results), question=q)
    # Non-streaming generate call; large local models can be slow, so
    # allow a generous timeout.
    r = requests.post('http://localhost:11434/api/generate',
                      json={'model': 'qwen2.5:32b', 'prompt': prompt, 'stream': False},
                      timeout=300).json()
    return r['response']
```

Step 5: Cross-check against AI Overview
If the local LLM's answer disagrees with Google's AI Overview citation set, flag it for review.
```python
# Re-run the Scavio search with include_ai_overview: true.
# Compare the local LLM's claims against the AI Overview's citation set.
# Disagreement = potential hallucination; surface it to the user.
```

Python Example
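Putting the steps together — a minimal end-to-end sketch. `grounded_answer` and the injected `search_fn`/`ask_fn` are illustrative names (in practice these would be the `search` and `ask_local` functions above); the citation check is a simple in-range validation, not part of Scavio or Ollama:

```python
import re

def grounded_answer(question, search_fn, ask_fn):
    """Run the pattern: search -> format -> prompt -> local LLM.

    search_fn(question) -> list of result dicts.
    ask_fn(question, results) -> answer string with [N] markers.
    Returns (answer, cited) where cited is the set of [N] numbers found.
    """
    results = search_fn(question)
    if not results:
        # No sources -> abstain, matching the prompt's instruction.
        return "I don't know based on the provided sources.", set()
    answer = ask_fn(question, results)
    cited = {int(m) for m in re.findall(r'\[(\d+)\]', answer)}
    # Any citation outside 1..len(results) is a red flag.
    bad = sorted(cited - set(range(1, len(results) + 1)))
    if bad:
        answer += f"\n[warning: invalid citations {bad}]"
    return answer, cited
```

Because the search and LLM calls are injected, the glue logic can be tested offline with stubs before wiring in the real endpoints.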
```python
# Per query: 1-2 Scavio calls + 1 local LLM call. Cost: ~$0.005 + $0 (local).
```

JavaScript Example
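A sketch of the same formatter and prompt builder in TypeScript. The pure functions mirror the Python versions above; the two `fetch` calls are left as comments since endpoints and payloads are unchanged:

```typescript
type Result = { title: string; link: string; snippet: string };

// Mirror of the Python fmt_sources: one line per source, numbered [N].
function fmtSources(results: Result[]): string {
  return results
    .map((r, i) => `[${i + 1}] ${r.title} (${r.link}): ${r.snippet}`)
    .join("\n");
}

// Same strict citation prompt as the Python PROMPT template.
function buildPrompt(question: string, results: Result[]): string {
  return [
    "Answer using ONLY the sources below. Every claim must be followed by [N] where N is the source number.",
    'If the sources do not answer the question, say "I don\'t know based on the provided sources."',
    "",
    "Sources:",
    fmtSources(results),
    "",
    `Question: ${question}`,
  ].join("\n");
}

// The network half is the same two POSTs as the Python version:
//   fetch("https://api.scavio.dev/api/v1/search", { method: "POST", ... })
//   fetch("http://localhost:11434/api/generate", { method: "POST", ... })
```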
```javascript
// Same in TS via fetch + Ollama.
```

Expected Output
A local LLM that grounds answers in typed JSON sources, cites them with [N] markers, and abstains when the sources don't cover the question. On the same model, the hallucination rate drops measurably.
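To put a rough number on "drops measurably", one simple proxy is citation coverage: what fraction of sentences carry a [N] marker. A minimal sketch — `citation_coverage` is an illustrative helper, and splitting sentences on terminal punctuation is a deliberate simplification:

```python
import re

def citation_coverage(answer: str) -> float:
    """Fraction of sentences containing at least one [N] citation marker."""
    sentences = [s.strip()
                 for s in re.split(r'(?<=[.!?])\s+', answer)
                 if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r'\[\d+\]', s))
    return cited / len(sentences)
```

Low coverage doesn't prove hallucination, but tracking it across prompt variants gives a cheap signal on whether the strict citation prompt is actually being followed.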