Cloudflare Block Early-Warning System

Use Scavio canary requests to detect Cloudflare blocks on your scrapers before a full outage hits.

Overview

This workflow runs a small fleet of canary scrapers against your most critical target sites every 15 minutes. When the Scavio canary requests start hitting 403 responses or challenge pages, it emits a warning 30 to 90 minutes before your production scrapers fall over. That lead time lets your team rotate tactics before the business impact hits.

Trigger

Cron schedule (every 15 minutes)

Workflow Steps

1. Maintain canary target list

List the critical sites your production scrapers depend on (for example, 10 to 50 targets).
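If the list lives outside the code, you can add or retire canaries without a deploy. The sketch below is minimal. It assumes the list arrives as a comma-separated string through a hypothetical CANARY_TARGETS environment variable. It reduces URLs to bare lowercase hosts and drops duplicates. The 10-to-50 check only prints a warning, mirroring the sizing suggested above.

```python
import os
from urllib.parse import urlparse


def load_targets(raw: str) -> list[str]:
    """Parse a comma-separated target list into unique, bare lowercase hosts."""
    seen: set[str] = set()
    targets: list[str] = []
    for item in raw.split(","):
        item = item.strip().lower()
        if not item:
            continue
        # Accept full URLs as well as bare domains: keep only the host part
        host = urlparse(item).hostname if "://" in item else item.split("/")[0]
        if host and host not in seen:
            seen.add(host)
            targets.append(host)
    return targets


if __name__ == "__main__":
    # CANARY_TARGETS is a hypothetical variable name, not part of Scavio's API
    targets = load_targets(os.environ.get("CANARY_TARGETS", "example.com,news-site.com"))
    if not 10 <= len(targets) <= 50:
        print(f"warning: {len(targets)} canary targets; 10 to 50 is the suggested range")
    print(targets)
```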

2. Scavio test query per target

Send one low-cost Scavio request for each target (the implementations below issue a site: search for the domain) and inspect the response status and body.
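Consider 50 targets probed one after another with a 10-second timeout each. In the worst case a single run could take several minutes. A small thread pool keeps each run well inside the 15-minute interval. The sketch below is minimal and assumes a hypothetical fetch(domain) callable that returns (status_code, body_text). In production that callable would wrap the Scavio request from the implementations below. Here it is passed in so the fan-out logic can be tested offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# fetch(domain) -> (status_code, body_text); a hypothetical interface for this sketch
Fetch = Callable[[str], tuple[int, str]]


def probe_all(targets: list[str], fetch: Fetch, max_workers: int = 8) -> list[dict]:
    """Probe every target concurrently; a failing fetch becomes an error record."""

    def one(domain: str) -> dict:
        try:
            status, body = fetch(domain)
            return {"domain": domain, "status": status, "body": body, "error": None}
        except Exception as exc:  # network errors, timeouts, malformed responses
            return {"domain": domain, "status": None, "body": "", "error": repr(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with targets
        return list(pool.map(one, targets))
```

When one probe fails, the run records an error for that target instead of aborting. A single slow or unreachable site therefore cannot hide the results for the rest of the fleet.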

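3. Detect Cloudflare fingerprints

Flag responses that contain Cloudflare markers such as cf-ray, challenge-platform, or cf-chl-bypass. Treat cf-ray with care. Cloudflare attaches a CF-RAY header to every response it proxies, so on its own it shows only that a site is behind Cloudflare, not that the request was challenged.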
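The sketch below is a minimal classifier that keeps the "behind Cloudflare" signal separate from the "challenged" signal. It assumes the probe has access to the raw status, headers, and body of a response. Either of two conditions counts as a challenge: a challenge marker in the body, or a 403/503 status from a site that looks Cloudflare-fronted. A cf-ray or Server: cloudflare header alone only marks the site as Cloudflare-fronted. The marker list and status set are illustrative heuristics, not an official Cloudflare signature.

```python
from typing import Optional

# Illustrative heuristics; tune them per target
CHALLENGE_MARKERS = ("challenge-platform", "cf-chl-bypass")
BLOCK_STATUSES = {403, 503}


def classify(status: Optional[int], headers: dict[str, str], body: str) -> str:
    """Return 'challenged', 'cloudflare', or 'clear' for one probe response."""
    lower_headers = {k.lower(): v for k, v in headers.items()}
    text = body.lower()
    # cf-ray or Server: cloudflare means the site is fronted by Cloudflare
    behind_cf = "cf-ray" in lower_headers or lower_headers.get("server", "").lower() == "cloudflare"
    has_marker = any(m in text for m in CHALLENGE_MARKERS)
    if has_marker or (behind_cf and status in BLOCK_STATUSES):
        return "challenged"
    return "cloudflare" if behind_cf else "clear"
```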

4. Score trend

Keep a rolling 1-hour window of probe results and alert if more than 30% of the canaries in that window show Cloudflare challenges.
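The sketch below is a minimal sliding-window rate built on a deque of timestamped results. It uses the 1-hour window and the strict "more than 30%" threshold from this step. A cron job starts a fresh process every 15 minutes, so in practice the window has to survive between runs. This sketch keeps it in memory for clarity. A real deployment would reload the last hour of probes from storage, for example the timeseries DB from step 6.

```python
import time
from collections import deque
from typing import Optional


class RollingBlockRate:
    """Share of challenged probes over a sliding time window (default 1 hour)."""

    def __init__(self, window_s: float = 3600.0, threshold: float = 0.30):
        self.window_s = window_s
        self.threshold = threshold
        self.events: deque = deque()  # (timestamp, challenged) pairs, oldest first

    def add(self, challenged: bool, ts: Optional[float] = None) -> None:
        self.events.append((time.time() if ts is None else ts, challenged))

    def rate(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        # Drop probes older than the window before computing the share
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()
        if not self.events:
            return 0.0
        return sum(c for _, c in self.events) / len(self.events)

    def should_alert(self, now: Optional[float] = None) -> bool:
        # Strictly greater than the threshold, matching "more than 30%"
        return self.rate(now) > self.threshold
```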

5. PagerDuty alert

Fire an alert to the scraping on-call rotation with the affected targets.
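PagerDuty's Events API v2 accepts a JSON event POSTed to https://events.pagerduty.com/v2/enqueue. The sketch below is minimal. It builds a trigger event that lists the affected targets and sends it with the standard library. The routing key is the integration key of an Events API v2 integration on your on-call service. The dedup_key, source, and severity values are illustrative choices, not requirements of this workflow.

```python
import json
import urllib.request

PD_URL = "https://events.pagerduty.com/v2/enqueue"


def build_pd_event(routing_key: str, blocked_pct: float, targets: list[str]) -> dict:
    """Build a PagerDuty Events API v2 'trigger' event for a block wave."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # A fixed dedup_key groups repeated triggers into one open alert
        "dedup_key": "cloudflare-block-wave",
        "payload": {
            "summary": f"Cloudflare challenges on {blocked_pct:.0%} of canaries",
            "source": "cf-canary",  # illustrative source name
            "severity": "error",  # pick the level that matches your urgency rules
            "custom_details": {"affected_targets": sorted(targets)},
        },
    }


def send_pd_event(event: dict) -> int:
    """POST the event; PagerDuty responds 202 when it accepts the event."""
    req = urllib.request.Request(
        PD_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Building and posting are kept separate so the payload can be checked without network access. Because the dedup_key is fixed, repeated runs during the same wave add to one alert instead of paging again.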

6. Log to timeseries DB

Persist every probe to InfluxDB for post-mortem timelines.
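InfluxDB ingests points in line protocol. Each point is a measurement with comma-separated tags, then a space, comma-separated fields, and an optional timestamp (nanoseconds by default). The sketch below is minimal and turns one probe record into a point. It assumes a hypothetical measurement name, cf_probe, with the domain as a tag and blocked and status as fields.

```python
def _escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs, and spaces in tag values to be escaped
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(probe: dict, ts_ns: int, measurement: str = "cf_probe") -> str:
    """Format one probe record as an InfluxDB line protocol point."""
    fields = [f"blocked={'true' if probe['blocked'] else 'false'}"]
    if probe.get("status") is not None:
        # The trailing 'i' marks an integer field
        fields.append(f"status={int(probe['status'])}i")
    return f"{measurement},domain={_escape_tag(probe['domain'])} {','.join(fields)} {ts_ns}"
```

To write a batch, join these lines with newlines and POST them to the InfluxDB 2.x /api/v2/write endpoint. The request needs org, bucket, and precision query parameters and an Authorization: Token header. time.time_ns() supplies a nanosecond timestamp.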

Python Implementation

Python
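import os
import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
H = {"x-api-key": API_KEY}
TARGETS = ["example.com", "news-site.com"]
# Substrings that suggest a Cloudflare challenge or block page
CF_TOKENS = ["cf-ray", "challenge-platform", "cf-chl-bypass"]

def probe(domain):
    """Send one cheap Scavio query for the domain and flag Cloudflare markers."""
    try:
        r = requests.post("https://api.scavio.dev/api/v1/search",
                          headers=H, json={"query": f"site:{domain}"}, timeout=10)
    except requests.RequestException as exc:
        # Record network errors and timeouts as failed probes instead of crashing the run
        return {"domain": domain, "blocked": False, "status": None, "error": str(exc)}
    body = r.text.lower()
    blocked = any(tok in body for tok in CF_TOKENS)
    return {"domain": domain, "blocked": blocked, "status": r.status_code}

results = [probe(d) for d in TARGETS]
# Share of canaries showing Cloudflare markers (guarded against an empty target list)
blocked_pct = sum(r["blocked"] for r in results) / max(len(results), 1)
if blocked_pct > 0.3:
    print("ALERT: Cloudflare block wave detected", blocked_pct)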

JavaScript Implementation

JavaScript
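// Run as an ES module (e.g. probe.mjs) so top-level await works; Node 18+ provides global fetch
const API_KEY = process.env.SCAVIO_API_KEY;
const H = { "x-api-key": API_KEY, "content-type": "application/json" };
const TARGETS = ["example.com", "news-site.com"];
// Substrings that suggest a Cloudflare challenge or block page
const CF_TOKENS = ["cf-ray", "challenge-platform", "cf-chl-bypass"];

async function probe(domain) {
  try {
    const r = await fetch("https://api.scavio.dev/api/v1/search", {
      method: "POST",
      headers: H,
      body: JSON.stringify({ query: "site:" + domain }),
      signal: AbortSignal.timeout(10_000), // same 10 s timeout as the Python version
    });
    const body = (await r.text()).toLowerCase();
    const blocked = CF_TOKENS.some(t => body.includes(t));
    return { domain, blocked, status: r.status };
  } catch (err) {
    // Count network errors and timeouts as failed probes so Promise.all does not reject
    return { domain, blocked: false, status: null, error: String(err) };
  }
}

const results = await Promise.all(TARGETS.map(probe));
const pct = results.filter(r => r.blocked).length / Math.max(results.length, 1);
if (pct > 0.3) console.log("ALERT: Cloudflare block wave", pct);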

Platforms Used

Google

Web search results with the knowledge graph, People Also Ask (PAA), and AI Overviews

Frequently Asked Questions

This workflow runs on a cron schedule, every 15 minutes.

This workflow uses one Scavio platform: Google. Each platform is called through the same unified API endpoint.

Yes, you can try it for free. Scavio's free tier includes 500 credits per month with no credit card required, which is enough to test and validate this workflow before scaling it.
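To size the budget, count probes: at a 15-minute cadence each target is probed 4 times an hour, or 96 times a day. The sketch below is minimal. It assumes one credit per canary request (check Scavio's pricing for the actual per-request cost) and a 30-day month.

```python
PROBES_PER_DAY = 24 * 60 // 15  # one probe per target every 15 minutes -> 96


def monthly_credits(targets: int, credits_per_probe: float = 1.0, days: int = 30) -> float:
    """Credits a canary fleet would use in a month, given an assumed cost per probe."""
    return targets * PROBES_PER_DAY * days * credits_per_probe


def days_covered(budget: float, targets: int, credits_per_probe: float = 1.0) -> float:
    """How many days a fixed credit budget lasts at the 15-minute cadence."""
    return budget / (targets * PROBES_PER_DAY * credits_per_probe)


print(monthly_credits(2))  # two example targets -> 5760.0
print(round(days_covered(500, 2), 1))  # -> 2.6
```

Under the one-credit assumption, the 500 free credits cover about two and a half days of the two-target example. That is enough to validate the workflow. A continuously running fleet of 10 to 50 targets would need a larger credit allowance or a slower schedule.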
