The Problem
Pure vector RAG pipelines retrieve only from a static internal corpus. When users ask about current events, competitor pricing, or any public information not in the corpus, the system either hallucinates or answers "I don't know," even though the answer is publicly available.
The Scavio Solution
Implement hybrid RAG: route each query through both a vector database (for internal docs) and a live search API (for current public data). Merge and re-rank results from both sources, then feed the combined context to the LLM. Use query classification to determine whether a query needs internal retrieval, external search, or both.
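The routing step above can be sketched as a three-way classifier. This is a minimal illustration, not a prescribed implementation: the keyword lists are placeholder heuristics, and a production router might use an LLM call or a trained classifier instead.

```python
def classify_query(query: str) -> str:
    """Route a query to 'internal', 'external', or 'both' retrieval paths.

    Keyword heuristics here are illustrative stand-ins for a real classifier.
    """
    q = query.lower()
    # Signals that the answer depends on current public data.
    external = any(s in q for s in ('current', 'latest', 'price', 'today', 'news'))
    # Signals that the answer lives in internal documentation.
    internal = any(s in q for s in ('policy', 'our', 'internal', 'docs', 'runbook'))
    if external and internal:
        return 'both'
    if external:
        return 'external'
    return 'internal'
```

The routing decision then gates the live search call, so internal-only questions never spend a search API request.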
Before
Before hybrid RAG, a customer support agent had access only to internal documentation. When a customer asked about a competitor's current pricing, the agent either hallucinated an outdated number or said "I don't have that information," even though it was publicly available on the competitor's website.
After
After adding live search, the agent queries both the vector database (for internal docs) and Scavio (for current public data). When asked about competitor pricing, it retrieves current pricing from a live Google search and combines it with internal competitive-analysis docs to produce an accurate, comprehensive response.
Who It Is For
AI engineers building RAG-powered assistants that need to answer questions about both internal documentation and current public information.
Key Benefits
- Answers questions about current events and public data
- Combines proprietary knowledge with live market data
- Reduces hallucination for time-sensitive queries
- Query routing avoids unnecessary search API calls for internal-only questions
- Structured search results integrate cleanly with RAG context windows
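The last point can be made concrete with a small formatting helper: a sketch of one reasonable convention (the `[web: …]` / `[internal]` tags are an assumption for illustration, not a Scavio output format) that turns the merged context list into a labeled block for the LLM prompt.

```python
def format_context(context: list[dict]) -> str:
    """Render merged retrieval results as a labeled text block for the prompt."""
    lines = []
    for item in context:
        if item['source'] == 'web':
            # Carry the URL along so the model can attribute live results.
            lines.append(f"[web: {item.get('url', '')}] {item.get('text', '')}")
        else:
            lines.append(f"[internal] {item.get('text', '')}")
    return '\n'.join(lines)
```

Tagging each chunk with its source also lets the system prompt instruct the model to prefer internal docs when the two disagree.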
Python Example
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
def hybrid_retrieve(query: str, vector_db, k: int = 5) -> list[dict]:
    # Step 1: Classify query
    needs_external = _needs_live_search(query)
    # Step 2: Always retrieve from vector DB
    internal_docs = vector_db.similarity_search(query, k=k)
    context = [{'source': 'internal', 'text': doc.page_content} for doc in internal_docs]
    # Step 3: Optionally add live search results
    if needs_external:
        r = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
                          json={'platform': 'google', 'query': query}, timeout=10).json()
        for result in r.get('organic', [])[:3]:
            context.append({
                'source': 'web', 'title': result.get('title'),
                'text': result.get('snippet'), 'url': result.get('link')
            })
    return context

def _needs_live_search(query: str) -> bool:
    live_signals = ['current', 'latest', 'price', '2026', 'today', 'now', 'competitor']
    return any(s in query.lower() for s in live_signals)
JavaScript Example
async function hybridRetrieve(query, vectorDb, k = 5) {
  const needsExternal = /current|latest|price|2026|today|competitor/i.test(query);
  const internalDocs = await vectorDb.similaritySearch(query, k);
  const context = internalDocs.map(doc => ({ source: 'internal', text: doc.pageContent }));
  if (needsExternal) {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ platform: 'google', query })
    }).then(r => r.json());
    for (const result of (r.organic || []).slice(0, 3)) {
      context.push({ source: 'web', title: result.title, text: result.snippet, url: result.link });
    }
  }
  return context;
}
Platforms Used
Google
Web search with knowledge graph, PAA, and AI overviews
Reddit
Community, posts & threaded comments from any subreddit
YouTube
Video search with transcripts and metadata
Amazon
Product search with prices, ratings, and reviews