Agents in production drift as models are updated, prompts are edited, and tools change. Most monitoring catches latency and cost regressions but misses correctness. This tutorial uses Scavio as the baseline source: for a sample of agent answers, it compares each answer against a fresh web search and flags drift.
Prerequisites
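- Python 3.10+ (with the requests, anthropic, and duckdb packages)
- Scavio API key, exported as SCAVIO_API_KEY
- Anthropic API key, exported as ANTHROPIC_API_KEY (used by the Claude judge in Step 3)
- DuckDB or Postgres (the examples use DuckDB)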
Walkthrough
Step 1: Sample 5% of agent answers
Random sample at the agent boundary.
```python
import random

def sample(answer):
    return random.random() < 0.05  # sample 5%
```

Step 2: Pull baseline web answers
Scavio SERP plus AI Overviews for the user query.
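Step 3 truncates the raw response to 3,000 characters, and raw JSON may spend much of that budget on metadata rather than answer text. One option is to condense the response before judging. A minimal sketch with a hypothetical `condense_baseline` helper, assuming the `ai_overview` and `organic_results` keys used in this tutorial's Python example; the `title` and `snippet` fields on each organic result are assumptions about the response shape.

```python
import json
from typing import Any

def condense_baseline(data: dict[str, Any], limit: int = 3000) -> str:
    """Turn a Scavio response into compact, judge-ready text (hypothetical helper).

    Assumes 'ai_overview' and 'organic_results' keys; the 'title'/'snippet'
    fields on each organic result are assumptions, read with .get() defaults.
    """
    parts: list[str] = []
    overview = data.get('ai_overview')
    if overview:
        # The overview's shape is not documented here, so serialize non-strings as JSON
        text = overview if isinstance(overview, str) else json.dumps(overview)
        parts.append('AI Overview: ' + text)
    for i, result in enumerate(data.get('organic_results', [])[:5], start=1):
        parts.append(f"[{i}] {result.get('title', '')}: {result.get('snippet', '')}")
    return '\n'.join(parts)[:limit]
```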
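```python
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def baseline(query):
    # Google SERP results plus the AI Overview for the query the agent answered
    r = requests.post('https://api.scavio.dev/api/v1/google',
                      headers={'x-api-key': API_KEY},
                      json={'query': query, 'include_ai_overview': True},
                      timeout=30)
    r.raise_for_status()  # fail loudly instead of judging an error payload
    return r.json()
```

Step 3: Score agent vs baseline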
Claude judges drift on factual claims.
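An LLM judge can return different ratings for the same input. If single ratings prove noisy, one option is to call the judge a few times and keep the median. A minimal sketch with a hypothetical `stable_score` helper; `judge` is any zero-argument callable returning an integer 0-10 or None, for example `lambda: score_drift(q, answer, data)`.

```python
import statistics
from typing import Callable, Optional

def stable_score(judge: Callable[[], Optional[int]], runs: int = 3) -> Optional[float]:
    """Call a drift judge `runs` times and return the median of valid 0-10 scores.

    Unparseable (None) or out-of-range ratings are dropped; returns None if no
    valid score remains.
    """
    scores = [s for s in (judge() for _ in range(runs))
              if s is not None and 0 <= s <= 10]
    return statistics.median(scores) if scores else None
```

This multiplies judge cost by `runs`, and the median of an even number of valid scores can be fractional, so round it before writing to an INT column.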
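```python
import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def score_drift(query, agent_answer, baseline_data):
    msg = client.messages.create(
        model='claude-sonnet-4-6', max_tokens=200,
        messages=[{'role': 'user', 'content': f'Q: {query}\nAgent: {agent_answer}\nBaseline: {str(baseline_data)[:3000]}\nRate factual drift 0-10. Reply with a single integer.'}])
    # Parse the first 0-10 integer so Step 4 can store it in an INT column
    m = re.search(r'\b(10|[0-9])\b', msg.content[0].text)
    return int(m.group(1)) if m else None
```

Step 4: Store drift events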
DuckDB row per sampled answer.
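Each sampled answer that the judge manages to score becomes one row. A minimal sketch of the insert, using a hypothetical `record_drift` helper and assuming a DuckDB connection; a connection whose `execute()` takes `?` placeholders and a parameter list should behave the same way.

```python
from datetime import datetime
from typing import Any, Optional

def record_drift(con: Any, query: str, score: Optional[int]) -> bool:
    """Insert one drift row; return False if the judge produced no score.

    `con` is assumed to be a DuckDB connection with the `drift` table created.
    """
    if score is None:
        return False  # unparseable judge output: nothing to store
    con.execute(
        'INSERT INTO drift (ts, query, score) VALUES (?, ?, ?)',
        [datetime.now(), query, int(score)],  # naive local timestamp for the TIMESTAMP column
    )
    return True
```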
```python
import duckdb

db = duckdb.connect('drift.duckdb')
db.execute('CREATE TABLE IF NOT EXISTS drift(ts TIMESTAMP, query TEXT, score INT)')
```

Step 5: Alert on threshold
Pager alert when daily mean drift exceeds 4/10.
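With a 5% sample, a quiet day may yield only a few scored answers, and a single bad rating can push the mean past 4. A minimal sketch of a guard, using a hypothetical `should_alert` helper and an assumed `min_samples` floor of 20 that is not part of the tutorial:

```python
from statistics import mean
from typing import Optional, Sequence

def should_alert(scores: Sequence[Optional[int]], threshold: float = 4.0,
                 min_samples: int = 20) -> bool:
    """Alert only when mean drift exceeds `threshold` over at least `min_samples` scores.

    None entries (unscored answers) are ignored; `min_samples` is an assumed guard.
    """
    valid = [s for s in scores if s is not None]
    if len(valid) < min_samples:
        return False  # too few samples to trust the mean
    return mean(valid) > threshold
```

The scores could come from `[row[0] for row in db.execute('SELECT score FROM drift WHERE ts > current_date - 1').fetchall()]`.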
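```python
def daily_check():
    # Mean drift over rows written since the start of yesterday (current_date - 1)
    avg = db.execute('SELECT AVG(score) FROM drift WHERE ts > current_date - 1').fetchone()[0]
    if avg is not None and avg > 4:
        # send pager alert
        pass
```

Python Example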
```python
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def baseline(q):
    r = requests.post('https://api.scavio.dev/api/v1/google',
                      headers={'x-api-key': API_KEY},
                      json={'query': q, 'include_ai_overview': True}).json()
    # Prefer the AI Overview; fall back to the top five organic results
    return r.get('ai_overview') or r.get('organic_results', [])[:5]

print(baseline('what is mcp protocol'))
```

JavaScript Example
```javascript
const API_KEY = process.env.SCAVIO_API_KEY;
export async function baseline(q) {
  const r = await fetch('https://api.scavio.dev/api/v1/google', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: q, include_ai_overview: true }),
  });
  if (!r.ok) throw new Error(`Scavio request failed: ${r.status}`);
  return r.json();
}
```

Expected Output
Expect a daily drift score per agent, with alerts firing when factual drift exceeds the threshold, so the engineering team catches model and prompt regressions before users report them.
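For one sampled answer, Steps 2 through 4 chain together. A minimal sketch with a hypothetical `check_sampled_answer` function; the three steps are passed in as callables (for example this tutorial's `baseline` and `score_drift`, plus a function that writes the row) so the flow can be exercised without network access.

```python
from typing import Any, Callable, Optional

def check_sampled_answer(
    query: str,
    answer: str,
    fetch_baseline: Callable[[str], Any],
    judge: Callable[[str, str, Any], Optional[int]],
    store: Callable[[str, int], Any],
) -> Optional[int]:
    """Run one sampled answer through baseline -> judge -> store (hypothetical glue).

    Returns the drift score, or None when the judge output could not be parsed.
    """
    data = fetch_baseline(query)        # Step 2: fresh web baseline
    score = judge(query, answer, data)  # Step 3: 0-10 factual drift
    if score is not None:
        store(query, score)             # Step 4: one row per scored answer
    return score
```

Running this off the request path, for example in a background worker, keeps the baseline search and judge call out of the agent's user-facing latency.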