Tutorial

How to Monitor AI Agents in Production with Web Data

Detect agent regressions in production using a live web-search baseline. Catch hallucinations and stale answers before users complain.

Agents in production drift: model updates, prompt edits, tool changes. Most monitoring catches latency and cost but misses correctness. This tutorial wires Scavio as the baseline source: every N agent answers, compare against a fresh web search and flag drift.

Prerequisites

  • Python 3.10+
  • Scavio API key
  • DuckDB or Postgres

Walkthrough

Step 1: Sample 5% of agent answers

Random sample at the agent boundary.

Python
import random
def sample(answer):
    return random.random() < 0.05  # sample 5%
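If you want the sampling decision to be reproducible across retries and log replays, one option (not required by the tutorial) is to hash the query instead of calling random.random(). This is a sketch; the helper name should_sample is our own:

Python
```python
import hashlib

def should_sample(query: str, rate: float = 0.05) -> bool:
    # Hash the query so the decision is deterministic: the same query
    # is always (or never) sampled, making drift scores comparable
    # across deploys and replays.
    digest = hashlib.sha256(query.encode('utf-8')).digest()
    bucket = int.from_bytes(digest[:8], 'big') / 2**64  # uniform in [0, 1)
    return bucket < rate
```

With a uniform hash, roughly 5% of distinct queries are selected, just like the random version.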

Step 2: Pull baseline web answers

Scavio SERP plus AI Overviews for the user query.

Python
import requests, os
API_KEY = os.environ['SCAVIO_API_KEY']

def baseline(query):
    r = requests.post('https://api.scavio.dev/api/v1/google',
        headers={'x-api-key': API_KEY},
        json={'query': query, 'include_ai_overview': True},
        timeout=30)
    r.raise_for_status()  # surface auth/quota errors instead of parsing an error body
    return r.json()
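Production calls to the baseline endpoint can fail transiently (timeouts, rate limits). A small generic retry wrapper with exponential backoff keeps Step 2 resilient; this is a sketch, and with_retries is a hypothetical helper name:

Python
```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Retry a flaky zero-argument call with exponential backoff (1s, 2s, 4s ...).
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** i)

# usage: with_retries(lambda: baseline('what is mcp protocol'))
```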

Step 3: Score agent vs baseline

Claude judges drift on factual claims.

Python
import anthropic
client = anthropic.Anthropic()

def score_drift(query, agent_answer, baseline_data):
    msg = client.messages.create(
        model='claude-sonnet-4-6', max_tokens=200,
        messages=[{'role':'user','content':f'Q: {query}\nAgent: {agent_answer}\nBaseline: {str(baseline_data)[:3000]}\nRate factual drift 0-10:'}])
    return msg.content[0].text

Step 4: Store drift events

DuckDB row per sampled answer.

Python
import duckdb
db = duckdb.connect('drift.duckdb')
db.execute('CREATE TABLE IF NOT EXISTS drift(ts TIMESTAMP, query TEXT, score INT)')

def record(query, score):
    # One row per sampled answer.
    db.execute('INSERT INTO drift VALUES (current_timestamp, ?, ?)', [query, score])
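A minimal glue function tying the sampling, baseline, scoring, and storage steps together might look like the sketch below. The baseline_fn, score_fn, and record_fn parameters are injected stand-ins for the steps above, so the control flow can be exercised without network or database calls; monitor is our own name:

Python
```python
import random

def monitor(query, agent_answer, *, baseline_fn, score_fn, record_fn,
            rate=0.05, rng=random.random):
    # Called on every agent answer; only ~`rate` of them trigger a baseline check.
    if rng() >= rate:
        return None
    data = baseline_fn(query)               # Step 2: fresh web baseline
    score = score_fn(query, agent_answer, data)  # Step 3: judge drift
    record_fn(query, score)                 # Step 4: persist the event
    return score
```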

Step 5: Alert on threshold

Pager alert when daily mean drift exceeds 4/10.

Python
def daily_check():
    avg = db.execute('SELECT AVG(score) FROM drift WHERE ts > current_date - 1').fetchone()[0]
    if avg and avg > 4:
        # send pager alert
        pass
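What "send pager alert" looks like depends on your on-call tooling. As one sketch, assuming a generic JSON webhook (the URL below is hypothetical), the alert can be posted with the standard library; the post parameter is injectable so the payload can be tested without a network call:

Python
```python
import json, urllib.request

ALERT_WEBHOOK = 'https://hooks.example.com/drift'  # hypothetical endpoint

def send_alert(avg_score: float, post=None):
    # Build the alert payload; `post` overrides the HTTP call for testing.
    payload = {'text': f'Agent drift alert: daily mean {avg_score:.1f}/10 exceeds 4'}
    body = json.dumps(payload).encode('utf-8')
    if post is None:
        req = urllib.request.Request(ALERT_WEBHOOK, data=body,
                                     headers={'Content-Type': 'application/json'})
        return urllib.request.urlopen(req)
    return post(body)
```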

Python Example

Python
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']

def baseline(q):
    r = requests.post('https://api.scavio.dev/api/v1/google',
        headers={'x-api-key': API_KEY},
        json={'query': q, 'include_ai_overview': True}).json()
    return r.get('ai_overview') or r.get('organic_results',[])[:5]

print(baseline('what is mcp protocol'))

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
export async function baseline(q) {
  const r = await fetch('https://api.scavio.dev/api/v1/google', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: q, include_ai_overview: true }),
  });
  return r.json();
}

Expected Output

Daily drift score per agent; alerts fire when factual drift exceeds the threshold, so the engineering team catches model and prompt regressions before user reports arrive.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.10+, a Scavio API key, and DuckDB or Postgres. A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with agent frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
