Scrape Success Rate Tracker

Benchmark your scraping stack weekly across a fixed set of target sites using Scavio as the canonical baseline.

Overview

Runs a weekly success-rate benchmark against N representative target sites. Each site gets a Scavio request and a request via your own scraper; both attempts get logged with status, block indicators, and latency. Produces a weekly report showing your scraper's success rate vs Scavio's baseline so you know when to outsource a target.

Trigger

Cron schedule (weekly on Sunday at 2 AM UTC)

Schedule

Weekly on Sundays at 2 AM UTC
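If the job is driven by a plain crontab, the schedule above maps to the standard five-field expression below. The interpreter and script paths are placeholders; substitute your own.

```shell
# m h dom mon dow  command
# 02:00 on Sundays (dow 0 = Sunday); ensure the cron daemon runs in UTC
0 2 * * 0 /usr/bin/python3 /opt/benchmarks/scrape_success_tracker.py
```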

Workflow Steps

1. Load target site list: 10 to 100 representative sites your team scrapes regularly.

2. Scavio baseline request: Call Scavio with site:domain for each target and record success/fail.

3. Own-scraper test request: Run the same query via your internal scraper and record the outcome.

4. Compute delta: Per site, compute (scavio_success - own_success) and flag positive outliers, i.e. sites where your scraper fails but Scavio succeeds.

5. Persist to warehouse: Write a weekly row to BigQuery with {site, scavio_ok, own_ok, latency_ms, date}.

6. Email weekly report: Send a CSV summary to the engineering lead every Sunday night.
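Steps 4 through 6 can be sketched with the standard library alone. The probe results below are hypothetical sample data, and the delta is taken as scavio minus own, so a positive value flags a site your own scraper struggles with; actually writing to BigQuery and sending the email are left out.

```python
import csv
import io
from datetime import date

# Hypothetical results from one weekly run: {site: (scavio_ok, own_ok, latency_ms)}
results = {
    "cnn.com": (True, True, 420),
    "walmart.com": (True, False, 910),
    "reddit.com": (True, False, 1300),
}


def weekly_rows(results, run_date):
    """Steps 4-5: compute the per-site delta and shape rows for the warehouse."""
    rows = []
    for site, (scavio_ok, own_ok, latency_ms) in sorted(results.items()):
        delta = int(scavio_ok) - int(own_ok)  # positive: Scavio succeeded where you failed
        rows.append({
            "site": site,
            "scavio_ok": scavio_ok,
            "own_ok": own_ok,
            "latency_ms": latency_ms,
            "date": run_date.isoformat(),
            "flagged": delta > 0,  # candidate for outsourcing to Scavio
        })
    return rows


def csv_report(rows):
    """Step 6: render the weekly rows as a CSV string for the report email."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = weekly_rows(results, date(2025, 1, 5))
print(csv_report(rows))
```

In a real run, `results` would be populated from the probe functions in the implementations below, and the CSV string would go into the email body or an attachment.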

Python Implementation

Python
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": API_KEY}
SITES = ["cnn.com", "walmart.com", "reddit.com"]


def scavio_probe(site):
    """True when a Scavio site: query returns at least one organic result."""
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=HEADERS,
        json={"query": f"site:{site}"},
        timeout=15,
    )
    return r.ok and len(r.json().get("organic_results", [])) > 0


def own_probe(site):
    """True when a direct GET to the site succeeds within 10 seconds."""
    try:
        return requests.get(f"https://{site}", timeout=10).ok
    except requests.RequestException:  # DNS failure, timeout, TLS error, etc.
        return False


for s in SITES:
    print(s, scavio_probe(s), own_probe(s))

JavaScript Implementation

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
const HEADERS = { "x-api-key": API_KEY, "content-type": "application/json" };
const SITES = ["cnn.com", "walmart.com", "reddit.com"];

// True when a Scavio site: query returns at least one organic result.
async function scavioProbe(site) {
  const r = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ query: "site:" + site }),
    signal: AbortSignal.timeout(15000),
  });
  if (!r.ok) return false;
  return ((await r.json()).organic_results || []).length > 0;
}

// True when a direct GET to the site succeeds within 10 seconds.
async function ownProbe(site) {
  try {
    const r = await fetch("https://" + site, { signal: AbortSignal.timeout(10000) });
    return r.ok;
  } catch {
    return false; // network error, timeout, or TLS failure
  }
}

for (const s of SITES) console.log(s, await scavioProbe(s), await ownProbe(s));

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI overviews

Frequently Asked Questions

What does this workflow do?

Runs a weekly success-rate benchmark against N representative target sites. Each site gets a Scavio request and a request via your own scraper; both attempts get logged with status, block indicators, and latency. Produces a weekly report showing your scraper's success rate vs Scavio's baseline so you know when to outsource a target.

When does this workflow run?

This workflow runs on a cron schedule: weekly on Sundays at 2 AM UTC.

Which platforms does it use?

This workflow uses the following Scavio platform: Google. Each platform is called via the same unified API endpoint.

Can I try it for free?

Yes. Scavio's free tier includes 500 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.
