Ground a voice agent in real-time facts by inserting a search API call between the user's spoken query and the response generation step. Voice agents without grounding confidently state outdated or fabricated information because they have no mechanism to verify facts against current sources. Adding a search step takes the transcribed user query, retrieves the top results from Google or other platforms, and injects them as context into the response prompt. This tutorial builds the grounding layer for both VAPI-style and custom voice pipelines using the Scavio API.
Prerequisites
- A voice agent platform (VAPI, Bland, Retell, or custom)
- Python 3.8+ or Node.js 18+ installed
- A Scavio API key from scavio.dev
- Basic understanding of voice agent architecture (STT -> LLM -> TTS)
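The STT -> LLM -> TTS flow referenced above can be pictured as a plain function pipeline. This is an illustrative sketch only: the `stt`, `llm`, and `tts` callables are hypothetical stand-ins for whatever your platform provides, not a real API.

```python
# Minimal sketch of an ungrounded voice agent turn. The stt, llm, and tts
# callables are placeholders for your platform's actual stages.
def handle_turn(audio_bytes, stt, llm, tts):
    transcript = stt(audio_bytes)   # speech-to-text: audio -> text
    reply = llm(transcript)        # language model: text -> response text
    return tts(reply)              # text-to-speech: text -> audio
```

The grounding layer built in this tutorial slots in between the `stt` and `llm` stages.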
Walkthrough
Step 1: Build the grounding search function
Create a fast search function optimized for voice latency: a 5-second timeout, only the top 3 results, and output pruned to the essential fields (title and snippet).
```python
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']

def ground_search(query: str, max_results: int = 3) -> str:
    """Fast search optimized for voice agent grounding."""
    try:
        resp = requests.post(
            'https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'platform': 'google', 'query': query},
            timeout=5,
        )
        results = resp.json().get('organic_results', [])[:max_results]
        context = []
        for r in results:
            context.append(f"{r.get('title', '')}: {r.get('snippet', '')}")
        return '\n'.join(context)
    except Exception:
        return ''  # Fail silently to avoid blocking the voice response
```

Step 2: Classify queries that need grounding
Not every voice query needs a web search. Classify which ones benefit from grounding to save latency on simple responses.
```python
GROUND_PATTERNS = [
    'what is', 'how much', 'when did', 'where is', 'who is',
    'latest', 'current', 'today', 'price', 'hours', 'open',
    'weather', 'news', 'score', 'status', 'schedule',
]

def needs_grounding(transcript: str) -> bool:
    text = transcript.lower().strip()
    if '?' in text:
        return True
    return any(p in text for p in GROUND_PATTERNS)

# Examples:
print(needs_grounding('What time does Target close today?'))  # True
print(needs_grounding('Thanks, that sounds good'))            # False
```

Step 3: Inject grounding context into the LLM prompt
Build the response prompt that includes grounding context when available, with instructions to prefer search data over training knowledge.
```python
def build_grounded_prompt(transcript: str, system_prompt: str = '') -> str:
    prompt = system_prompt + '\n\n' if system_prompt else ''
    if needs_grounding(transcript):
        context = ground_search(transcript)
        if context:
            prompt += f'LIVE SEARCH CONTEXT (prefer this over your training data):\n{context}\n\n'
    prompt += f'User said: {transcript}\n'
    prompt += 'Respond naturally and conversationally. Keep it under 3 sentences for voice delivery.'
    return prompt

# Example:
prompt = build_grounded_prompt(
    'What are the current gas prices in Austin?',
    'You are a helpful voice assistant.'
)
print(prompt)
```

Step 4: Integrate with your voice pipeline
Insert the grounding step into your voice agent's processing pipeline, between speech-to-text and LLM generation.
```python
# Voice pipeline integration:
# STT -> grounding_middleware -> LLM -> TTS
def voice_middleware(transcript: str, voice_config: dict) -> dict:
    """Middleware that adds grounding to voice agent responses."""
    prompt = build_grounded_prompt(transcript, voice_config.get('system_prompt', ''))
    grounded = needs_grounding(transcript)
    return {
        'prompt': prompt,
        'grounded': grounded,
        'transcript': transcript,
    }

# VAPI-style webhook handler:
def handle_vapi_webhook(payload: dict) -> dict:
    transcript = payload.get('transcript', '')
    result = voice_middleware(transcript, {'system_prompt': 'You are a helpful assistant.'})
    return {'prompt': result['prompt']}

# Test:
result = voice_middleware('What is the current price of Bitcoin?',
                          {'system_prompt': 'You are a crypto assistant.'})
print(f'Grounded: {result["grounded"]}')
print(result['prompt'][:300])
```

Python Example
```python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def ground(query):
    try:
        data = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
                             json={'platform': 'google', 'query': query}, timeout=5).json()
        return '\n'.join(f"{r.get('title', '')}: {r.get('snippet', '')}"
                         for r in data.get('organic_results', [])[:3])
    except Exception:  # bare except would also swallow KeyboardInterrupt
        return ''

def voice_prompt(transcript):
    context = ground(transcript)
    if context:
        return f'Context:\n{context}\n\nUser: {transcript}\nRespond in 2-3 sentences.'
    return f'User: {transcript}\nRespond in 2-3 sentences.'

print(voice_prompt('What are gas prices in Austin today?'))
```

JavaScript Example
```javascript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};

async function ground(query) {
  try {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: H,
      body: JSON.stringify({platform: 'google', query}),
      signal: AbortSignal.timeout(5000),
    });
    const results = (await r.json()).organic_results || [];
    return results.slice(0, 3).map(r => `${r.title}: ${r.snippet || ''}`).join('\n');
  } catch {
    return '';
  }
}

async function voicePrompt(transcript) {
  const ctx = await ground(transcript);
  return ctx ? `Context:\n${ctx}\n\nUser: ${transcript}` : `User: ${transcript}`;
}

voicePrompt('What are gas prices in Austin?').then(console.log);
```

Expected Output
A grounding layer for voice agents that classifies queries, runs fast search lookups, and injects live context into the LLM prompt before response generation.
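Because the 5-second search timeout is the main latency risk in this design, it can help to measure how long the grounding call takes per turn. This is an illustrative sketch: `timed_ground` is a hypothetical wrapper that works with any search function, shown here with a stub standing in for the real API call.

```python
import time

def timed_ground(query, search_fn):
    """Return the grounding context plus how many milliseconds the lookup took."""
    start = time.perf_counter()
    context = search_fn(query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return context, elapsed_ms

# Usage with a stub in place of the real search call:
ctx, ms = timed_ground('gas prices in austin', lambda q: 'Example: stub snippet')
print(f'grounding took {ms:.1f} ms')
```

Logging these timings per call makes it easy to spot when the search step, rather than the LLM or TTS, is what pushes a voice turn past an acceptable response time.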