Overview
This workflow runs a small canary scraper fleet against your most critical target sites every 15 minutes. When the canary requests sent through Scavio start returning 403 responses or challenge pages, it emits a warning 30 to 90 minutes before your production scrapers start failing. That early warning gives your team time to rotate tactics before the business impact hits.
Trigger
Cron schedule (every 15 minutes)
Schedule
Every 15 minutes
Workflow Steps
Maintain canary target list
Critical sites your production scrapers depend on (e.g., 10 to 50 targets).
Scavio test query per target
Run a cheap Scavio request against each target and inspect the response status and body.
Detect Cloudflare (CF) fingerprints
Flag the presence of cf-ray, challenge-platform, or cf-chl-bypass tokens in the response.
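As a rough sketch of this step, the helper below scans both the response headers and the body for the three tokens named above and reports which ones matched. The function name `find_cf_fingerprints` and the sample header and body values are hypothetical, chosen only for illustration.

```python
# Tokens named in the step above.
CF_TOKENS = ("cf-ray", "challenge-platform", "cf-chl-bypass")

def find_cf_fingerprints(headers, body):
    """Return the tokens found in the header names/values or in the body."""
    header_text = " ".join(f"{k}: {v}" for k, v in headers.items()).lower()
    haystack = header_text + "\n" + body.lower()
    return [tok for tok in CF_TOKENS if tok in haystack]

# Made-up sample response, not captured from a real site.
sample_headers = {"CF-RAY": "0000000000000000-XXX", "Content-Type": "text/html"}
sample_body = '<script src="/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page/v1"></script>'
hits = find_cf_fingerprints(sample_headers, sample_body)
print(hits)  # ['cf-ray', 'challenge-platform']
```

Returning the matched tokens rather than a single boolean lets you weight them differently later. Cloudflare attaches a `cf-ray` header to ordinary proxied responses as well, so you may want to treat it as a weaker signal than the challenge markers.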
Score trend
Track probe results over a rolling 1-hour window and alert if more than 30% of canaries show Cloudflare challenges.
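One way to sketch the rolling window is a deque of timestamped probe results that evicts anything older than an hour. The class name `RollingBlockRate` and the target names are hypothetical. The sketch also assumes one particular reading of the threshold: a canary counts as challenged if any of its probes in the window was challenged.

```python
from collections import deque

WINDOW_SECONDS = 3600  # rolling 1-hour window
THRESHOLD = 0.30       # alert when more than 30% of canaries are challenged

class RollingBlockRate:
    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, domain, blocked), oldest first

    def add(self, timestamp, domain, blocked):
        # Assumes timestamps arrive in non-decreasing order.
        self.samples.append((timestamp, domain, blocked))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def blocked_fraction(self):
        """Fraction of distinct canaries with at least one challenge in the window."""
        seen, blocked = set(), set()
        for _, domain, is_blocked in self.samples:
            seen.add(domain)
            if is_blocked:
                blocked.add(domain)
        return len(blocked) / len(seen) if seen else 0.0

# Four runs, 15 minutes apart, for three hypothetical targets.
window = RollingBlockRate()
for t in (0, 900, 1800, 2700):
    window.add(t, "a.example", blocked=(t >= 1800))
    window.add(t, "b.example", blocked=False)
    window.add(t, "c.example", blocked=False)
rate = window.blocked_fraction()
should_alert = rate > THRESHOLD
```

Here one of three canaries is challenged within the window, so `rate` is one third and `should_alert` is true.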
PagerDuty alert
Fire an alert to the scraping on-call rotation with affected targets.
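A minimal sketch of the alert, assuming you use the PagerDuty Events API v2 with an integration routing key. The `dedup_key`, `source`, and `severity` values and the helper names are illustrative choices, not something the workflow prescribes.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_event(routing_key, blocked_domains, blocked_fraction):
    """Build an Events API v2 trigger event listing the affected targets."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # A fixed dedup key (assumed here) groups repeat alerts into one incident.
        "dedup_key": "cf-block-wave",
        "payload": {
            "summary": f"Cloudflare block wave: {blocked_fraction:.0%} of canaries challenged",
            "source": "scraper-canary",
            "severity": "warning",
            "custom_details": {"affected_targets": sorted(blocked_domains)},
        },
    }

def send_event(event, timeout=10):
    """POST the event to PagerDuty and return the HTTP status code."""
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# Building the event makes no network call; send_event(event) would deliver it.
event = build_event("YOUR_ROUTING_KEY", ["news-site.com", "example.com"], 0.4)
```

Keeping event construction separate from delivery makes the payload easy to inspect and test without paging anyone.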
Log to timeseries DB
Persist every probe to InfluxDB for post-mortem timelines.
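For the logging step, each probe result can be serialized as one line of InfluxDB line protocol before it is written. The sketch below assumes a hypothetical measurement name `canary_probe`, with the domain stored as a tag and the `blocked` and `status` values as fields. How the line is written (client library or HTTP write endpoint) is left to your setup.

```python
def escape_tag(value):
    """Escape the characters line protocol treats specially in tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def to_line_protocol(probe, timestamp_ns, measurement="canary_probe"):
    """Format one probe result dict as a line-protocol record."""
    tags = f"domain={escape_tag(probe['domain'])}"
    blocked = "true" if probe["blocked"] else "false"
    # Integer fields carry an "i" suffix in line protocol.
    fields = f"blocked={blocked},status={int(probe['status'])}i"
    return f"{measurement},{tags} {fields} {timestamp_ns}"

line = to_line_protocol(
    {"domain": "example.com", "blocked": True, "status": 403},
    timestamp_ns=1700000000000000000,
)
print(line)
# canary_probe,domain=example.com blocked=true,status=403i 1700000000000000000
```

Storing the domain as a tag keeps per-target queries cheap when you rebuild a post-mortem timeline.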
Python Implementation
import os
import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
H = {"x-api-key": API_KEY}
TARGETS = ["example.com", "news-site.com"]
CF_TOKENS = ["cf-ray", "challenge-platform", "cf-chl-bypass"]

def probe(domain):
    # Send one cheap Scavio request for the target and inspect the response.
    r = requests.post("https://api.scavio.dev/api/v1/search",
                      headers=H, json={"query": f"site:{domain}"}, timeout=10)
    body = r.text.lower()
    # A probe counts as blocked if any Cloudflare fingerprint appears in the body.
    blocked = any(tok in body for tok in CF_TOKENS)
    return {"domain": domain, "blocked": blocked, "status": r.status_code}

results = [probe(d) for d in TARGETS]
# Fraction of canaries blocked in this run; guard against an empty target list.
blocked_pct = sum(1 for r in results if r["blocked"]) / max(len(results), 1)
if blocked_pct > 0.3:
    print("ALERT: Cloudflare block wave detected", blocked_pct)
JavaScript Implementation
const API_KEY = process.env.SCAVIO_API_KEY;
const H = { "x-api-key": API_KEY, "content-type": "application/json" };
const TARGETS = ["example.com", "news-site.com"];
const CF_TOKENS = ["cf-ray", "challenge-platform", "cf-chl-bypass"];

// Send one cheap Scavio request for the target and inspect the response.
async function probe(domain) {
  const r = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST",
    headers: H,
    body: JSON.stringify({ query: "site:" + domain }),
  });
  const body = (await r.text()).toLowerCase();
  // A probe counts as blocked if any Cloudflare fingerprint appears in the body.
  const blocked = CF_TOKENS.some((t) => body.includes(t));
  return { domain, blocked, status: r.status };
}

// Top-level await requires an ES module (for example, a .mjs file).
const results = await Promise.all(TARGETS.map(probe));
// Fraction of canaries blocked in this run.
const pct = results.filter((r) => r.blocked).length / results.length;
if (pct > 0.3) console.log("ALERT: Cloudflare block wave", pct);
Platforms Used
Web search with knowledge graph, PAA, and AI overviews