Tutorial

How to Add Grounded Web Search to a Local LLM


An r/LocalLLaMA post showed Qwen models in the 9B-35B range hallucinating on web-search-grounded answers. The fix: feed the model typed JSON instead of raw HTML, and pair it with an explicit citation prompt. This tutorial walks through the pattern.

Prerequisites

  • A local LLM via Ollama / LM Studio / vLLM
  • Scavio API key
  • Awareness of context-window limits (4K-32K typical for 9B-35B)
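Context-window budgeting can be sanity-checked with a rough characters-per-token heuristic. A minimal sketch (the 4-characters-per-token ratio and the `fits_context` helper are illustrative assumptions, not part of any API):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic for English text: ~4 characters per token.
    return len(text) // 4

def fits_context(sources_block: str, question: str,
                 window: int = 4096, reserve: int = 1024) -> bool:
    # Leave `reserve` tokens of headroom for the model's reply.
    return approx_tokens(sources_block + question) <= window - reserve
```

For a 4K-window 9B model, keeping the formatted sources under roughly 3,000 tokens leaves room for the prompt scaffolding and the reply.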

Walkthrough

Step 1: Pull typed JSON via Scavio

Pull 5-10 results per query as typed JSON; the formatted sources stay well under a local LLM's context window.

Python
import os
import requests

HEADERS = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def search(q):
    # Typed JSON results (title, link, snippet) -- no HTML to strip.
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers=HEADERS, json={'query': q}, timeout=30)
    r.raise_for_status()
    return r.json().get('organic_results', [])[:10]

Step 2: Format sources as a numbered citation block

Local LLMs respond better to explicit numbering.

Python
def fmt_sources(results):
    return '\n'.join(f'[{i+1}] {r["title"]} ({r["link"]}): {r["snippet"]}' for i, r in enumerate(results))

Step 3: Use a strict citation prompt

Local LLMs often ignore softer instructions; be explicit.

Python
PROMPT = '''Answer using ONLY the sources below. Every claim must be followed by [N] where N is the source number.
If the sources do not answer the question, say "I don't know based on the provided sources."

Sources:
{sources}

Question: {question}'''

Step 4: Call the local LLM via Ollama

Standard Ollama /api/generate.

Python
import requests

def ask_local(q, results):
    prompt = PROMPT.format(sources=fmt_sources(results), question=q)
    # Non-streaming call to Ollama's generate endpoint.
    r = requests.post('http://localhost:11434/api/generate',
                      json={'model': 'qwen2.5:32b', 'prompt': prompt, 'stream': False},
                      timeout=300)
    r.raise_for_status()
    return r.json()['response']

Step 5: Cross-check against AI Overview

If the local LLM's answer disagrees with Google's AI Overview citation set, flag it as a potential hallucination.

Text
# Re-run the Scavio search with include_ai_overview: true.
# Compare the local LLM's claims against the AI Overview's citation set.
# Disagreement = potential hallucination, surface to the user.
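The comparison step above can be sketched as follows. The [N]-marker parsing is plain regex; the AI Overview's link list is assumed to come back from the `include_ai_overview` re-run, and its exact response field name may differ:

```python
import re

def cited_indices(answer: str) -> list[int]:
    # Pull the distinct [N] citation markers out of the model's answer.
    return sorted({int(n) for n in re.findall(r'\[(\d+)\]', answer)})

def flag_disagreement(answer, local_sources, overview_links) -> bool:
    # True when none of the links the local model cited appear in the
    # AI Overview's citation set -- a signal of potential hallucination.
    cited = {local_sources[i - 1]['link'] for i in cited_indices(answer)
             if 1 <= i <= len(local_sources)}
    return not (cited & set(overview_links))
```

Out-of-range markers like [7] against five sources are silently dropped here; in production you would want to surface those too, since they are themselves a hallucination signal.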

Python Example

Python
# Per query: 1-2 Scavio calls + 1 local LLM call. Cost: ~$0.005 + $0 (local).
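The steps above compose into one function. A minimal sketch with the two network calls injected as callables, so the flow itself can be exercised without hitting Scavio or Ollama (`search_fn` and `llm_fn` correspond to the `search` and `ask_local` helpers from Steps 1 and 4; the prompt is the one from Step 3):

```python
PROMPT = '''Answer using ONLY the sources below. Every claim must be followed by [N] where N is the source number.
If the sources do not answer the question, say "I don't know based on the provided sources."

Sources:
{sources}

Question: {question}'''

def grounded_answer(question, search_fn, llm_fn, top_k=5):
    # search_fn: query -> list of {title, link, snippet} dicts.
    # llm_fn: prompt string -> answer string.
    results = search_fn(question)[:top_k]
    if not results:
        return "I don't know based on the provided sources."
    sources = '\n'.join(f'[{i+1}] {r["title"]} ({r["link"]}): {r["snippet"]}'
                        for i, r in enumerate(results))
    return llm_fn(PROMPT.format(sources=sources, question=question))
```

Wiring it up is one line: `grounded_answer("your question", search, ask_local)`.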

JavaScript Example

JavaScript
// Same in TS via fetch + Ollama.

Expected Output

A local LLM that grounds answers in typed JSON sources, cites them with [N] markers, and abstains when the sources don't cover the question. On the same model, the hallucination rate drops measurably.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

A local LLM served via Ollama, LM Studio, or vLLM; a Scavio API key; and awareness of context-window limits (4K-32K is typical for 9B-35B models). A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with frameworks like LangChain?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
