Scrape Success Rate Tracker

Benchmark your scraping stack weekly across a fixed set of target sites using Scavio as the canonical baseline.

Overview

Runs a weekly success-rate benchmark against N representative target sites. Each site gets a Scavio request and a request via your own scraper; both attempts get logged with status, block indicators, and latency. Produces a weekly report showing your scraper's success rate vs Scavio's baseline so you know when to outsource a target.

Trigger

Cron schedule (weekly on Sunday at 2 AM UTC)

Schedule

Weekly on Sundays at 2 AM UTC
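If the job is driven by a plain crontab, the schedule above maps to the standard five-field expression below. The interpreter and script paths are placeholders; substitute your own.

```shell
# m h dom mon dow  command
# 02:00 on Sundays (dow 0 = Sunday); ensure the cron daemon runs in UTC
0 2 * * 0 /usr/bin/python3 /opt/benchmarks/scrape_success_tracker.py
```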

Workflow Steps

1. Load target site list: 10 to 100 representative sites your team scrapes regularly.

2. Scavio baseline request: Call Scavio with site:domain for each target and record success/fail.

3. Own-scraper test request: Run the same query via your internal scraper and record the outcome.

4. Compute delta: Per site, compute (scavio_success - own_success) and flag positive outliers, i.e. sites where your scraper fails but Scavio succeeds.

5. Persist to warehouse: Write a weekly row to BigQuery with {site, scavio_ok, own_ok, latency_ms, date}.

6. Email weekly report: Send a CSV summary to the engineering lead every Sunday night.
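Steps 4 through 6 can be sketched with the standard library alone. The probe results below are hypothetical sample data, and the delta is taken as scavio minus own, so a positive value flags a site your own scraper struggles with; actually writing to BigQuery and sending the email are left out.

```python
import csv
import io
from datetime import date

# Hypothetical results from one weekly run: {site: (scavio_ok, own_ok, latency_ms)}
results = {
    "cnn.com": (True, True, 420),
    "walmart.com": (True, False, 910),
    "reddit.com": (True, False, 1300),
}


def weekly_rows(results, run_date):
    """Steps 4-5: compute the per-site delta and shape rows for the warehouse."""
    rows = []
    for site, (scavio_ok, own_ok, latency_ms) in sorted(results.items()):
        delta = int(scavio_ok) - int(own_ok)  # positive: Scavio succeeded where you failed
        rows.append({
            "site": site,
            "scavio_ok": scavio_ok,
            "own_ok": own_ok,
            "latency_ms": latency_ms,
            "date": run_date.isoformat(),
            "flagged": delta > 0,  # candidate for outsourcing to Scavio
        })
    return rows


def csv_report(rows):
    """Step 6: render the weekly rows as a CSV string for the report email."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = weekly_rows(results, date(2025, 1, 5))
print(csv_report(rows))
```

In a real run, `results` would be populated from the probe functions in the implementations below, and the CSV string would go into the email body or an attachment.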

Python Implementation

Python
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": API_KEY}
SITES = ["cnn.com", "walmart.com", "reddit.com"]


def scavio_probe(site):
    """True when a Scavio site: query returns at least one organic result."""
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=HEADERS,
        json={"query": f"site:{site}"},
        timeout=15,
    )
    return r.ok and len(r.json().get("organic_results", [])) > 0


def own_probe(site):
    """True when a direct GET to the site succeeds within 10 seconds."""
    try:
        return requests.get(f"https://{site}", timeout=10).ok
    except requests.RequestException:  # DNS failure, timeout, TLS error, etc.
        return False


for s in SITES:
    print(s, scavio_probe(s), own_probe(s))

JavaScript Implementation

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
const HEADERS = { "x-api-key": API_KEY, "content-type": "application/json" };
const SITES = ["cnn.com", "walmart.com", "reddit.com"];

// True when a Scavio site: query returns at least one organic result.
async function scavioProbe(site) {
  const r = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ query: "site:" + site }),
    signal: AbortSignal.timeout(15000),
  });
  if (!r.ok) return false;
  return ((await r.json()).organic_results || []).length > 0;
}

// True when a direct GET to the site succeeds within 10 seconds.
async function ownProbe(site) {
  try {
    const r = await fetch("https://" + site, { signal: AbortSignal.timeout(10000) });
    return r.ok;
  } catch {
    return false; // network error, timeout, or TLS failure
  }
}

for (const s of SITES) console.log(s, await scavioProbe(s), await ownProbe(s));

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI overviews

Frequently Asked Questions

What does this workflow do?

Runs a weekly success-rate benchmark against N representative target sites. Each site gets a Scavio request and a request via your own scraper; both attempts get logged with status, block indicators, and latency. Produces a weekly report showing your scraper's success rate vs Scavio's baseline so you know when to outsource a target.

When does this workflow run?

This workflow runs on a cron schedule: weekly on Sundays at 2 AM UTC.

Which platforms does it use?

This workflow uses the following Scavio platform: Google. Each platform is called via the same unified API endpoint.

Can I try it for free?

Yes. Scavio's free tier includes 500 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.
