Gemma 4 News Site: Search API Grounding Fixes Hallucinated Links
Running Gemma 4 27B locally for news. The model hallucinates links without search grounding. Adding a search API for link validation and source verification.
Gemma 4 27B runs well on consumer hardware and generates fluent text. But if you point it at a news summarization task, it will fabricate URLs, invent publication dates, and attribute quotes to people who never said them. The model does not know what is on the web right now. Adding a search API for grounding turns a hallucination machine into a usable news tool.
The hallucination problem with local news models
Gemma 4 has a knowledge cutoff. Anything published after that cutoff does not exist in the model's weights. Ask it to summarize today's tech news and it will confidently generate plausible-sounding articles that never existed. It does not know it is wrong — the text just continues in the most probable direction.
This is not a Gemma-specific problem. Every local LLM without retrieval augmentation will do the same thing. The difference is that hosted models like GPT-4o and Claude have built-in search tools. When you run locally, you need to add that yourself.
Architecture: search-then-generate
The pattern is straightforward. Before the model generates any text about current events, you query a search API for real results. Those results become the context window. The model summarizes what it was given instead of inventing from scratch.
import requests

SCAVIO_KEY = "YOUR_API_KEY"

def search_news(topic: str, num_results: int = 5) -> list:
    """Get current news results for a topic."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": SCAVIO_KEY},
        json={
            "query": topic,
            "platform": "google",
            "num_results": num_results
        },
        timeout=10
    )
    resp.raise_for_status()
    return resp.json().get("results", [])
def build_grounded_prompt(topic: str) -> str:
    """Build a prompt grounded in real search results."""
    results = search_news(topic)
    context = ""
    for i, r in enumerate(results, 1):
        context += (
            f"[{i}] {r.get('title', 'N/A')}\n"
            f"URL: {r.get('url', 'N/A')}\n"
            f"Snippet: {r.get('snippet', 'N/A')}\n\n"
        )
    return f"""Summarize the following news results about
"{topic}". Only use information from the provided sources.
Cite each claim with the source number [1], [2], etc.
If the sources do not cover something, say so.

Sources:
{context}
Summary:"""

prompt = build_grounded_prompt("AI regulation EU 2026")
print(prompt)

Connecting to Gemma 4 via Ollama
If you are running Gemma 4 locally through Ollama, the integration is a simple HTTP call. The key constraint: never let the model answer questions about current events without search context in the prompt.
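One cheap way to enforce that constraint is a guard that rejects any prompt lacking search context before it reaches the model. A sketch: `assert_grounded` is a helper name I am introducing here, and the `"Sources:"` marker is simply the label that the grounded prompt template above emits.

```python
def assert_grounded(prompt: str) -> None:
    """Raise if a prompt has no search context attached."""
    # "Sources:" is the label build_grounded_prompt writes into
    # every grounded prompt; an ungrounded prompt will lack it.
    if "Sources:" not in prompt:
        raise ValueError("refusing to generate: no search context in prompt")

# A grounded prompt passes silently:
assert_grounded("Summarize...\n\nSources:\n[1] Example headline\n\nSummary:")
```

Calling this right before the Ollama request makes the "never answer ungrounded" rule a hard failure instead of a convention.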
def generate_with_gemma(prompt: str) -> str:
    """Send grounded prompt to local Gemma 4 via Ollama."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma4:27b",
            "prompt": prompt,
            "stream": False,
            "options": {
                "temperature": 0.3,
                "num_ctx": 8192
            }
        }
    )
    return resp.json().get("response", "")
# Full pipeline: search -> build prompt -> generate
topic = "OpenAI GPT-5 release"
grounded_prompt = build_grounded_prompt(topic)
summary = generate_with_gemma(grounded_prompt)
print(summary)

Link validation: the second search pass
Even with grounded prompts, models sometimes mangle URLs or combine details from multiple sources. A second validation pass checks that any URL the model outputs actually exists in the original search results. If the model invents a URL, strip it.
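The core of that check fits in a few lines: build a set of the URLs the search actually returned, then treat any other URL in the model output as unverified. A standalone sketch with made-up sample data:

```python
# Hypothetical sample data standing in for real search results.
sources = [{"url": "https://example.com/eu-ai-act"}]
valid_urls = {s["url"] for s in sources}

model_output = "New EU rules passed [1] https://example.com/invented-story"

# Replace any URL-shaped word that is not in the whitelist.
cleaned = " ".join(
    word if not word.startswith("http") or word in valid_urls
    else "[source not verified]"
    for word in model_output.split()
)
print(cleaned)  # the invented URL is gone
```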
def validate_citations(summary: str, sources: list) -> str:
    """Remove any URLs from summary not in original sources."""
    valid_urls = {r.get("url", "") for r in sources}
    validated = []
    for line in summary.split("\n"):
        # If a word is a URL that does not appear in the
        # original sources, replace it with [source not verified]
        for word in line.split():
            if word.startswith("http") and word not in valid_urls:
                line = line.replace(word, "[source not verified]")
        validated.append(line)
    return "\n".join(validated)

Cost and performance
Running Gemma 4 27B locally is free after hardware costs. The search grounding adds $0.005/query via Scavio. A news site generating 50 summaries/day spends $0.25/day on search — about $7.50/mo. Compare that to using a hosted model with built-in search: GPT-4o with browsing costs roughly $0.03-0.10 per summary depending on token count, or $45-150/mo for the same volume.
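The arithmetic behind those figures, as a quick sanity check (all prices are the ones quoted above; the hosted-model range is per summary):

```python
# Search grounding cost for the self-hosted setup.
queries_per_day = 50
cost_per_query = 0.005  # dollars per Scavio search
daily = queries_per_day * cost_per_query
monthly = daily * 30
print(f"${daily:.2f}/day, ${monthly:.2f}/month")  # $0.25/day, $7.50/month

# Hosted-model comparison at $0.03-$0.10 per summary.
hosted_low = queries_per_day * 0.03 * 30   # ~$45/month
hosted_high = queries_per_day * 0.10 * 30  # ~$150/month
```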
Honest limitations
This approach is slower than a hosted API. Gemma 4 27B on a 24GB GPU generates at roughly 15-25 tokens/second. The search call adds 200-500ms latency. For a news site that pre-generates content, this is fine. For real-time chat, it may feel sluggish. The model also does not match GPT-4o or Claude in summarization quality — it is good enough for structured news briefs, less reliable for nuanced analysis. Ground the model, constrain the output format, and verify citations. That is the practical path to a local news LLM that does not fabricate.