Tutorial

How to Ground a Voice Agent with a Search API

Ground your voice agent in real-time facts by adding a search API call before response generation, reducing hallucination in phone and chat voice bots.

Ground a voice agent in real-time facts by inserting a search API call between the user's spoken query and the response generation step. Voice agents without grounding confidently state outdated or fabricated information because they have no mechanism to verify facts against current sources. The search step takes the transcribed user query, retrieves the top results from Google or another platform, and injects them as context into the response prompt. This tutorial builds the grounding layer for both VAPI-style and custom voice pipelines using the Scavio API.

Prerequisites

  • A voice agent platform (VAPI, Bland, Retell, or custom)
  • Python 3.8+ or Node.js 18+ installed
  • A Scavio API key from scavio.dev
  • Basic understanding of voice agent architecture (STT -> LLM -> TTS)

Walkthrough

Step 1: Build the grounding search function

Create a fast search function optimized for voice latency: a 5-second timeout, only the top 3 results, and each result pruned to title and snippet.

Python
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']

def ground_search(query: str, max_results: int = 3) -> str:
    """Fast search optimized for voice agent grounding."""
    try:
        resp = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'platform': 'google', 'query': query}, timeout=5)
        resp.raise_for_status()  # treat HTTP errors like any other failure
        results = resp.json().get('organic_results', [])[:max_results]
        context = []
        for r in results:
            context.append(f"{r.get('title', '')}: {r.get('snippet', '')}")
        return '\n'.join(context)
    except Exception:
        return ''  # Fail silently to avoid blocking the voice response
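
A quick smoke test, assuming SCAVIO_API_KEY is set in your environment (the query below is just an illustration; any time-sensitive question works):

Python
print(ground_search('current weather in Austin'))
# Prints up to three "Title: snippet" lines, or '' on timeout/error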

Step 2: Classify queries that need grounding

Not every voice query needs a web search. Classify which ones benefit from grounding to save latency on simple responses.

Python
GROUND_PATTERNS = [
    'what is', 'how much', 'when did', 'where is', 'who is',
    'latest', 'current', 'today', 'price', 'hours', 'open',
    'weather', 'news', 'score', 'status', 'schedule',
]

def needs_grounding(transcript: str) -> bool:
    """Heuristic: ground explicit questions and time-sensitive keywords."""
    text = transcript.lower().strip()
    if '?' in text:  # explicit questions usually benefit from fresh context
        return True
    return any(p in text for p in GROUND_PATTERNS)

# Examples:
print(needs_grounding('What time does Target close today?'))  # True
print(needs_grounding('Thanks, that sounds good'))  # False

Step 3: Inject grounding context into the LLM prompt

Build the response prompt that includes grounding context when available, with instructions to prefer search data over training knowledge.

Python
def build_grounded_prompt(transcript: str, system_prompt: str = '') -> str:
    prompt = system_prompt + '\n\n' if system_prompt else ''
    if needs_grounding(transcript):
        context = ground_search(transcript)
        if context:
            prompt += f'LIVE SEARCH CONTEXT (prefer this over your training data):\n{context}\n\n'
    prompt += f'User said: {transcript}\n'
    prompt += 'Respond naturally and conversationally. Keep it under 3 sentences for voice delivery.'
    return prompt

# Example:
prompt = build_grounded_prompt(
    'What are the current gas prices in Austin?',
    'You are a helpful voice assistant.'
)
print(prompt)
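
To see where the grounded prompt goes next, here is a minimal sketch that sends it to an OpenAI-style chat endpoint. The openai client and model name are assumptions for illustration, not part of the Scavio API; substitute whatever LLM your agent runs on.

Python
# Sketch only: assumes the official openai package and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

def grounded_reply(transcript: str) -> str:
    prompt = build_grounded_prompt(transcript, 'You are a helpful voice assistant.')
    resp = client.chat.completions.create(
        model='gpt-4o-mini',  # assumed model; use your agent's model
        messages=[{'role': 'user', 'content': prompt}],
    )
    return resp.choices[0].message.content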

Step 4: Integrate with your voice pipeline

Insert the grounding step into your voice agent's processing pipeline, between speech-to-text and LLM generation.

Python
# Voice pipeline integration:
# STT -> grounding_middleware -> LLM -> TTS

def voice_middleware(transcript: str, voice_config: dict) -> dict:
    """Middleware that adds grounding to voice agent responses."""
    prompt = build_grounded_prompt(transcript, voice_config.get('system_prompt', ''))
    grounded = needs_grounding(transcript)
    return {
        'prompt': prompt,
        'grounded': grounded,
        'transcript': transcript,
    }

# VAPI-style webhook handler (simplified payload; adapt field names
# to your platform's actual webhook schema):
def handle_vapi_webhook(payload: dict) -> dict:
    transcript = payload.get('transcript', '')
    result = voice_middleware(transcript, {'system_prompt': 'You are a helpful assistant.'})
    return {'prompt': result['prompt']}

# Test:
result = voice_middleware('What is the current price of Bitcoin?', {'system_prompt': 'You are a crypto assistant.'})
print(f'Grounded: {result["grounded"]}')
print(result['prompt'][:300])
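
For a fully custom pipeline, a complete turn might look like the sketch below. transcribe_audio, call_llm, and synthesize_speech are hypothetical placeholders for your STT, LLM, and TTS providers; only voice_middleware comes from this tutorial.

Python
# Hypothetical end-to-end turn: STT -> grounding middleware -> LLM -> TTS.
def handle_turn(audio_bytes: bytes, voice_config: dict) -> bytes:
    transcript = transcribe_audio(audio_bytes)         # your STT provider
    step = voice_middleware(transcript, voice_config)  # grounding from this step
    reply_text = call_llm(step['prompt'])              # your LLM (see Step 3)
    return synthesize_speech(reply_text)               # your TTS provider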

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def ground(query):
    try:
        data = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
            json={'platform': 'google', 'query': query}, timeout=5).json()
        return '\n'.join(
            f"{r.get('title', '')}: {r.get('snippet', '')}"
            for r in data.get('organic_results', [])[:3]
        )
    except Exception:
        return ''

def voice_prompt(transcript):
    base = f'User: {transcript}\nRespond in 2-3 sentences.'
    context = ground(transcript)
    return f'Context:\n{context}\n\n{base}' if context else base

print(voice_prompt('What are gas prices in Austin today?'))

JavaScript Example

JavaScript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function ground(query) {
  try {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query}),
      signal: AbortSignal.timeout(5000)
    });
    const results = (await r.json()).organic_results || [];
    return results.slice(0, 3).map(res => `${res.title || ''}: ${res.snippet || ''}`).join('\n');
  } catch { return ''; }
}
async function voicePrompt(transcript) {
  const ctx = await ground(transcript);
  return ctx ? `Context:\n${ctx}\n\nUser: ${transcript}` : `User: ${transcript}`;
}
voicePrompt('What are gas prices in Austin?').then(console.log);

Expected Output

A grounding layer for voice agents that classifies queries, runs fast search lookups, and injects live context into the LLM prompt before response generation.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

A voice agent platform (VAPI, Bland, Retell, or custom), Python 3.8+ or Node.js 18+, a Scavio API key from scavio.dev, and a basic understanding of voice agent architecture (STT -> LLM -> TTS).

Is there a free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with LangChain or other frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Start Building

Get a Scavio API key at scavio.dev and drop the grounding middleware into your voice pipeline; the free tier's 500 monthly credits are plenty for prototyping.