Local LLM Grounding Pipeline Workflow

Ground a local LLM (Ollama, vLLM) with live search results: query Scavio, inject SERP context into the prompt, get factual answers without fine-tuning.

Overview

Local LLMs hallucinate on current events. This workflow injects live SERP results into the system prompt before each query, grounding the model's response in real data. It works with Ollama, vLLM, llama.cpp, or any OpenAI-compatible local endpoint.

Trigger

Per user query to local LLM

Schedule

Per user query

Workflow Steps

1

Receive user query

User asks a factual question to the local LLM chat interface.

2

Pre-flight search via Scavio

POST to /api/v1/search with the user's query and platform=google. Take the top five results with their snippets.

3

Inject search context into system prompt

Prepend to system message: 'Use the following search results to answer. Cite sources. Results: [...]'

4

Forward augmented prompt to local LLM

Send to Ollama /api/chat or vLLM /v1/chat/completions with the enriched messages array.

5

Return grounded response to user

The local LLM now answers with real data instead of hallucinating.

Python Implementation

Python
import os
import requests

scavio_key = os.environ["SCAVIO_API_KEY"]
user_query = "What is the current price of NVIDIA stock?"

# Step 2: pre-flight search via Scavio
search_resp = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": scavio_key},
    json={"query": user_query, "platform": "google", "limit": 5},
)
search_resp.raise_for_status()
context = "\n".join(
    f"- {r['title']}: {r['snippet']}"
    for r in search_resp.json().get("results", [])
)

# Steps 3-4: inject the SERP context and forward to the local LLM.
# Ollama's /api/chat streams NDJSON by default, so request a single
# JSON object with "stream": False before calling .json().
ollama_resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b",
        "stream": False,
        "messages": [
            {"role": "system",
             "content": f"Answer using these search results. Cite sources.\n{context}"},
            {"role": "user", "content": user_query},
        ],
    },
)
ollama_resp.raise_for_status()
print(ollama_resp.json()["message"]["content"])
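The guide mentions vLLM as an alternative backend, but the implementations target Ollama only. Below is a hedged Python sketch against vLLM's OpenAI-compatible /v1/chat/completions endpoint; the base URL (localhost:8000) and model name (Qwen/Qwen2.5-7B-Instruct) are assumptions, so substitute whatever your vLLM server actually serves. The messages array is identical in shape, only the endpoint and response structure change.

```python
import requests

# Assumed local vLLM server; adjust to your deployment.
VLLM_BASE = "http://localhost:8000/v1"


def build_grounded_messages(context: str, user_query: str) -> list:
    """Assemble the enriched messages array (same shape for Ollama and vLLM)."""
    return [
        {"role": "system",
         "content": f"Answer using these search results. Cite sources.\n{context}"},
        {"role": "user", "content": user_query},
    ]


def ask_vllm(context: str, user_query: str,
             model: str = "Qwen/Qwen2.5-7B-Instruct") -> str:
    # vLLM speaks the OpenAI chat-completions protocol, so the answer
    # lives under choices[0].message.content rather than Ollama's
    # top-level "message" key.
    resp = requests.post(
        f"{VLLM_BASE}/chat/completions",
        json={"model": model,
              "messages": build_grounded_messages(context, user_query)},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Because the protocol is OpenAI-compatible, the same code also points at any other OpenAI-style local endpoint by changing VLLM_BASE.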

JavaScript Implementation

JavaScript
const query = "What is the current price of NVIDIA stock?";
const searchResp = await fetch("https://api.scavio.dev/api/v1/search", {
  method: "POST",
  headers: { "x-api-key": process.env.SCAVIO_API_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({ query, platform: "google", limit: 5 })
});
const context = ((await searchResp.json()).results ?? []).map(r => `- ${r.title}: ${r.snippet}`).join("\n");

const llmResp = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:7b",
    stream: false, // Ollama streams NDJSON by default; request one JSON object
    messages: [
      { role: "system", content: `Answer using these search results. Cite sources.\n${context}` },
      { role: "user", content: query }
    ]
  })
});
console.log((await llmResp.json()).message.content);
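Both implementations abort the whole chat turn if the search call fails. A minimal fallback sketch in Python, assuming you would rather serve an ungrounded answer than an error when Scavio is unreachable: return an empty context on any search failure and let the LLM answer unaided.

```python
import requests


def get_search_context(api_key: str, query: str, timeout: float = 10.0) -> str:
    """Fetch SERP context from Scavio; return '' on any failure so the
    chat still works (ungrounded) instead of raising."""
    try:
        resp = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": api_key},
            json={"query": query, "platform": "google", "limit": 5},
            timeout=timeout,
        )
        resp.raise_for_status()
        results = resp.json().get("results", [])
        return "\n".join(f"- {r['title']}: {r['snippet']}" for r in results)
    except (requests.RequestException, KeyError, ValueError):
        # Network error, bad status, malformed JSON, or missing fields:
        # degrade gracefully to an empty context.
        return ""
```

If the returned context is empty, you can drop the "Cite sources" instruction from the system prompt so the model does not invent citations.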

Platforms Used

Google

Web search with knowledge graph, PAA, and AI overviews

Frequently Asked Questions

How does this workflow ground a local LLM?

Local LLMs hallucinate on current events. This workflow injects live SERP results into the system prompt before each query, grounding the model's response in real data. It works with Ollama, vLLM, llama.cpp, or any OpenAI-compatible local endpoint.

When does this workflow run?

It runs per user query to the local LLM: every question triggers a fresh pre-flight search.

Which Scavio platforms does it use?

This workflow uses the Google platform. Each platform is called via the same unified API endpoint.

Can I try it for free?

Yes. Scavio's free tier includes 500 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.