Tutorial

How to Ground a Voice Agent with a Search API

Ground your voice agent in real-time facts by adding a search API call before response generation, reducing hallucination in phone and chat voice bots.

Ground a voice agent in real-time facts by inserting a search API call between the user's spoken query and the response generation step. Voice agents without grounding confidently state outdated or fabricated information because they have no mechanism to verify facts against current sources. The search step takes the transcribed user query, retrieves the top results from Google or another platform, and injects them as context into the response prompt. This tutorial builds the grounding layer for both VAPI-style and custom voice pipelines using the Scavio API.

Prerequisites

  • A voice agent platform (VAPI, Bland, Retell, or custom)
  • Python 3.8+ or Node.js 18+ installed
  • A Scavio API key from scavio.dev
  • Basic understanding of voice agent architecture (STT -> LLM -> TTS)

Walkthrough

Step 1: Build the grounding search function

Create a fast search function optimized for voice latency: a 5-second timeout, only the top 3 results, and each result pruned to title and snippet.

Python
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']

def ground_search(query: str, max_results: int = 3) -> str:
    """Fast search optimized for voice agent grounding."""
    try:
        resp = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'platform': 'google', 'query': query}, timeout=5)
        resp.raise_for_status()  # treat HTTP errors like any other failure
        results = resp.json().get('organic_results', [])[:max_results]
        context = []
        for r in results:
            context.append(f"{r.get('title', '')}: {r.get('snippet', '')}")
        return '\n'.join(context)
    except Exception:
        return ''  # Fail silently to avoid blocking the voice response
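
A quick smoke test, assuming SCAVIO_API_KEY is set in your environment (the query below is just an illustration; any time-sensitive question works):

Python
print(ground_search('current weather in Austin'))
# Prints up to three "Title: snippet" lines, or '' on timeout/error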

Step 2: Classify queries that need grounding

Not every voice query needs a web search. Classify which ones benefit from grounding to save latency on simple responses.

Python
GROUND_PATTERNS = [
    'what is', 'how much', 'when did', 'where is', 'who is',
    'latest', 'current', 'today', 'price', 'hours', 'open',
    'weather', 'news', 'score', 'status', 'schedule',
]

def needs_grounding(transcript: str) -> bool:
    """Heuristic: ground explicit questions and time-sensitive keywords."""
    text = transcript.lower().strip()
    if '?' in text:  # explicit questions usually benefit from fresh context
        return True
    return any(p in text for p in GROUND_PATTERNS)

# Examples:
print(needs_grounding('What time does Target close today?'))  # True
print(needs_grounding('Thanks, that sounds good'))  # False

Step 3: Inject grounding context into the LLM prompt

Build the response prompt that includes grounding context when available, with instructions to prefer search data over training knowledge.

Python
def build_grounded_prompt(transcript: str, system_prompt: str = '') -> str:
    prompt = system_prompt + '\n\n' if system_prompt else ''
    if needs_grounding(transcript):
        context = ground_search(transcript)
        if context:
            prompt += f'LIVE SEARCH CONTEXT (prefer this over your training data):\n{context}\n\n'
    prompt += f'User said: {transcript}\n'
    prompt += 'Respond naturally and conversationally. Keep it under 3 sentences for voice delivery.'
    return prompt

# Example:
prompt = build_grounded_prompt(
    'What are the current gas prices in Austin?',
    'You are a helpful voice assistant.'
)
print(prompt)
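
To see where the grounded prompt goes next, here is a minimal sketch that sends it to an OpenAI-style chat endpoint. The openai client and model name are assumptions for illustration, not part of the Scavio API; substitute whatever LLM your agent runs on.

Python
# Sketch only: assumes the official openai package and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

def grounded_reply(transcript: str) -> str:
    prompt = build_grounded_prompt(transcript, 'You are a helpful voice assistant.')
    resp = client.chat.completions.create(
        model='gpt-4o-mini',  # assumed model; use your agent's model
        messages=[{'role': 'user', 'content': prompt}],
    )
    return resp.choices[0].message.content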

Step 4: Integrate with your voice pipeline

Insert the grounding step into your voice agent's processing pipeline, between speech-to-text and LLM generation.

Python
# Voice pipeline integration:
# STT -> grounding_middleware -> LLM -> TTS

def voice_middleware(transcript: str, voice_config: dict) -> dict:
    """Middleware that adds grounding to voice agent responses."""
    prompt = build_grounded_prompt(transcript, voice_config.get('system_prompt', ''))
    grounded = needs_grounding(transcript)
    return {
        'prompt': prompt,
        'grounded': grounded,
        'transcript': transcript,
    }

# VAPI-style webhook handler (simplified payload; adapt field names
# to your platform's actual webhook schema):
def handle_vapi_webhook(payload: dict) -> dict:
    transcript = payload.get('transcript', '')
    result = voice_middleware(transcript, {'system_prompt': 'You are a helpful assistant.'})
    return {'prompt': result['prompt']}

# Test:
result = voice_middleware('What is the current price of Bitcoin?', {'system_prompt': 'You are a crypto assistant.'})
print(f'Grounded: {result["grounded"]}')
print(result['prompt'][:300])
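
For a fully custom pipeline, a complete turn might look like the sketch below. transcribe_audio, call_llm, and synthesize_speech are hypothetical placeholders for your STT, LLM, and TTS providers; only voice_middleware comes from this tutorial.

Python
# Hypothetical end-to-end turn: STT -> grounding middleware -> LLM -> TTS.
def handle_turn(audio_bytes: bytes, voice_config: dict) -> bytes:
    transcript = transcribe_audio(audio_bytes)         # your STT provider
    step = voice_middleware(transcript, voice_config)  # grounding from this step
    reply_text = call_llm(step['prompt'])              # your LLM (see Step 3)
    return synthesize_speech(reply_text)               # your TTS provider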

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def ground(query):
    try:
        data = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
            json={'platform': 'google', 'query': query}, timeout=5).json()
        return '\n'.join(
            f"{r.get('title', '')}: {r.get('snippet', '')}"
            for r in data.get('organic_results', [])[:3]
        )
    except Exception:
        return ''

def voice_prompt(transcript):
    base = f'User: {transcript}\nRespond in 2-3 sentences.'
    context = ground(transcript)
    return f'Context:\n{context}\n\n{base}' if context else base

print(voice_prompt('What are gas prices in Austin today?'))

JavaScript Example

JavaScript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function ground(query) {
  try {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query}),
      signal: AbortSignal.timeout(5000)
    });
    const results = (await r.json()).organic_results || [];
    return results.slice(0, 3).map(res => `${res.title || ''}: ${res.snippet || ''}`).join('\n');
  } catch { return ''; }
}
async function voicePrompt(transcript) {
  const ctx = await ground(transcript);
  return ctx ? `Context:\n${ctx}\n\nUser: ${transcript}` : `User: ${transcript}`;
}
voicePrompt('What are gas prices in Austin?').then(console.log);

Expected Output

A grounding layer for voice agents that classifies queries, runs fast search lookups, and injects live context into the LLM prompt before response generation.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

A voice agent platform (VAPI, Bland, Retell, or custom), Python 3.8+ or Node.js 18+, a Scavio API key from scavio.dev, and a basic understanding of voice agent architecture (STT -> LLM -> TTS).

Is there a free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with LangChain or other frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Start Building

Get a Scavio API key at scavio.dev and drop the grounding middleware into your voice pipeline; the free tier's 500 monthly credits are plenty for prototyping.