How long does this tutorial on benchmarking scrapers by success rate across 500 sites take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.10+, a Scavio API key (signup includes 50 free credits), a candidate scraper (your own or a competitor's), and a 500-URL test list.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.

Benchmark Scrapers by Success Rate (2026)

Scraper success rate is the only metric that matters: the percentage of target sites where you get clean, structured data. This tutorial runs a 500-site benchmark that hits a candidate scraper first, then Scavio as ground truth, and reports the win rate, Cloudflare blocks, and empty responses.

Prerequisites

Python 3.10+
A Scavio API key
A candidate scraper (your own or a competitor's)
A 500-URL test list

Walkthrough

Step 1: Build the URL panel

500 URLs across Cloudflare-protected, JS-heavy, and simple static sites.

Python

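import csv

# Load the first column of panel.csv as the URL list, skipping blank rows.
with open('panel.csv', newline='') as f:
    URLS = [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]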
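A panel is only useful if it really covers the three site types named above, so it can help to validate the list before spending requests on it. The sketch below assumes a second `category` column in the CSV, which the original panel file does not necessarily have; `load_panel` and the category labels are hypothetical names used for illustration.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse

def load_panel(lines):
    """Parse 'url,category' rows, dropping blanks, invalid URLs, and duplicates."""
    seen, panel = set(), []
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue
        url = row[0].strip()
        # Assumption: an optional second column names the site type.
        category = row[1].strip() if len(row) > 1 else 'unknown'
        parts = urlparse(url)
        if parts.scheme not in ('http', 'https') or not parts.netloc:
            continue  # skips header rows and malformed entries
        if url in seen:
            continue
        seen.add(url)
        panel.append((url, category))
    return panel

# In-memory stand-in for panel.csv so the sketch runs without a file.
sample = io.StringIO(
    'url,category\n'
    'https://example.com,static\n'
    'https://example.com,static\n'
    'https://example.org,js_heavy\n'
    'not-a-url,static\n'
)
panel = load_panel(sample)
mix = Counter(category for _, category in panel)
print(len(panel), dict(mix))
```

Checking the category mix before the run makes it easier to tell whether a low success rate reflects the scraper or a panel skewed toward one site type.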

Step 2: Define the benchmark loop

Hit the candidate first, then Scavio, and record both outcomes.

Python

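import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def candidate_scrape(url):
    """Fetch the page with the scraper under test; return '' on any request failure."""
    try:
        return requests.get(url, timeout=10).text
    except requests.RequestException:
        return ''

def scavio_scrape(url):
    """Fetch the same page through Scavio; return '' on request or decoding failure."""
    try:
        r = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'query': url, 'platform': 'extract'},
            timeout=60)  # timeout value is a judgment call, not a documented limit
        return r.json().get('html') or ''
    except (requests.RequestException, ValueError):
        return ''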
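Returning an empty string on failure hides why a request failed and how long it took. One possible refinement, sketched here with stub scrapers instead of real network calls, wraps any scrape function and records latency and the exception type. `Outcome` and `timed` are hypothetical helpers, not part of Scavio or of the tutorial's original code.

```python
import time
from dataclasses import dataclass

@dataclass
class Outcome:
    url: str
    html: str
    seconds: float
    error: str = ''

def timed(scrape, url):
    """Run scrape(url), capturing latency and any exception instead of raising."""
    start = time.perf_counter()
    try:
        html = scrape(url) or ''
        error = ''
    except Exception as exc:
        html, error = '', type(exc).__name__
    return Outcome(url, html, time.perf_counter() - start, error)

# Stub scrapers stand in for candidate_scrape / scavio_scrape.
def fake_ok(url):
    return '<html><body>hello</body></html>'

def fake_fail(url):
    raise TimeoutError('simulated timeout')

ok = timed(fake_ok, 'https://example.com')
bad = timed(fake_fail, 'https://example.com')
print(ok.error or 'no error', bad.error)
```

For this wrapper to see the exception, the wrapped scraper would need to let errors propagate rather than swallow them.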

Step 3: Score each outcome

Count a response as a success when it contains clean HTML with real content.

Python

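def is_success(html):
    # Heuristic: more than 500 characters and a <body> tag.
    # A long block or challenge page can also satisfy this check.
    return len(html) > 500 and '<body' in html.lower()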
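A pass/fail check cannot separate Cloudflare blocks from empty responses, which the benchmark is meant to report. The sketch below shows one way to label each response. The marker strings are page titles commonly seen on Cloudflare challenge and block pages; treating them as a block signal is a heuristic assumption, and `classify` and its labels are hypothetical names.

```python
# Assumption: these titles indicate a Cloudflare challenge or block page.
CHALLENGE_MARKERS = (
    '<title>just a moment...</title>',
    '<title>attention required! | cloudflare</title>',
)

def classify(html):
    """Label a response as 'empty', 'blocked', 'success', or 'thin'."""
    text = (html or '').lower()
    if not text.strip():
        return 'empty'
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return 'blocked'
    if len(text) > 500 and '<body' in text:
        return 'success'
    return 'thin'  # some HTML came back, but too little to count

filler = 'x' * 600
page = '<html><body>' + filler + '</body></html>'
challenge = ('<html><head><title>Just a moment...</title></head><body>'
             + filler + '</body></html>')
labels = [classify(''), classify(challenge), classify(page),
          classify('<html><body>hi</body></html>')]
print(labels)
```

Note that the challenge page in this example is longer than 500 characters and contains a `<body>` tag, so a length-and-tag check alone would count it as a success.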

Step 4: Run the benchmark

Collect per-URL pass/fail for each tool.

Python

results = []
for u in URLS:
    cand = is_success(candidate_scrape(u))
    scav = is_success(scavio_scrape(u))
    results.append({'url': u, 'candidate': cand, 'scavio': scav})
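A sequential loop over 500 URLs spends most of its time waiting on the network. If the candidate and the API tolerate parallel requests (an assumption; rate limits are not stated here), a thread pool is one way to shorten the run. The sketch uses stub scrapers so it runs offline; `run_panel` and `workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def run_panel(urls, candidate, reference, score, workers=8):
    """Score both tools for every URL, running URLs in parallel threads."""
    def one(url):
        return {
            'url': url,
            'candidate': score(candidate(url)),
            'scavio': score(reference(url)),
        }
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, urls))  # map preserves input order

# Stubs: the candidate fails on URLs containing 'blocked', the reference never does.
def stub_candidate(url):
    return '' if 'blocked' in url else '<body>content</body>'

def stub_reference(url):
    return '<body>content</body>'

def stub_score(html):
    return len(html) > 5

demo = run_panel(
    ['https://example.com', 'https://blocked.example'],
    stub_candidate, stub_reference, stub_score,
)
print(demo)
```

Keeping `workers` small is the safer default, since hammering the target sites can itself trigger blocks and distort the measured success rate.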

Step 5: Publish the results

Success rate, Cloudflare block rate, and gap.

Python

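def summarize(r):
    """Return each tool's success rate and the gap between them."""
    n = len(r)
    if n == 0:
        return {'candidate_rate': 0.0, 'scavio_rate': 0.0, 'gap': 0.0}
    candidate_rate = sum(1 for x in r if x['candidate']) / n
    scavio_rate = sum(1 for x in r if x['scavio']) / n
    return {
        'candidate_rate': candidate_rate,
        'scavio_rate': scavio_rate,
        'gap': scavio_rate - candidate_rate,
    }
print(summarize(results))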
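A success rate from 500 sites is an estimate, so published numbers are easier to trust with an uncertainty range. The sketch below computes a Wilson score interval, a standard interval for a binomial proportion; it is an addition for illustration, not part of the original benchmark. The counts correspond to the tutorial's illustrative 62% and 94% figures on a 500-URL panel, not to measured results.

```python
import math

def wilson(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

# Illustrative counts only: 310/500 = 62%, 470/500 = 94%.
cand_low, cand_high = wilson(310, 500)
scav_low, scav_high = wilson(470, 500)
print(f'candidate: {cand_low:.3f} to {cand_high:.3f}')
print(f'scavio:    {scav_low:.3f} to {scav_high:.3f}')
```

If the two intervals do not overlap, the gap is unlikely to be an artifact of which 500 sites ended up in the panel.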

Python Example

Python

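import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']
URLS = ['https://example.com', 'https://cloudflare-protected.com']

def scavio_extract(url):
    """Return the HTML Scavio extracts for url, or '' if the request fails."""
    try:
        r = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'query': url, 'platform': 'extract'},
            timeout=60)  # timeout value is a judgment call, not a documented limit
        return r.json().get('html') or ''
    except (requests.RequestException, ValueError):
        return ''

# Count a URL as a win when the response holds more than 500 characters.
wins = sum(1 for u in URLS if len(scavio_extract(u)) > 500)
print(f'Scavio success: {wins}/{len(URLS)}')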

JavaScript Example

JavaScript

// Run as an ES module (for example a .mjs file) so top-level await works.
const API_KEY = process.env.SCAVIO_API_KEY;
const URLS = ['https://example.com', 'https://cloudflare-protected.com'];

async function scavioExtract(url) {
  try {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: url, platform: 'extract' })
    });
    const d = await r.json();
    return d.html || '';
  } catch {
    return ''; // a network or JSON parsing failure counts as a miss
  }
}

// Count a URL as a win when the response holds more than 500 characters.
let wins = 0;
for (const u of URLS) {
  if ((await scavioExtract(u)).length > 500) wins++;
}
console.log(`Scavio success: ${wins}/${URLS.length}`);

Expected Output

Per-tool success rate (e.g., candidate 62%, Scavio 94%), a Cloudflare block rate breakdown, and a per-URL diff. A typical benchmark run takes 20-30 minutes for 500 URLs.
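The per-URL diff can be derived from the same list of result dictionaries built in Step 4. A minimal sketch, using hand-written sample rows rather than real benchmark data (`per_url_diff` and the `.example` URLs are hypothetical):

```python
def per_url_diff(results):
    """Group URLs by which tool succeeded, skipping URLs where both did."""
    return {
        'only_scavio': [r['url'] for r in results
                        if r['scavio'] and not r['candidate']],
        'only_candidate': [r['url'] for r in results
                           if r['candidate'] and not r['scavio']],
        'both_fail': [r['url'] for r in results
                      if not r['candidate'] and not r['scavio']],
    }

# Hypothetical rows in the shape produced by the benchmark loop.
sample_results = [
    {'url': 'https://a.example', 'candidate': True, 'scavio': True},
    {'url': 'https://b.example', 'candidate': False, 'scavio': True},
    {'url': 'https://c.example', 'candidate': True, 'scavio': False},
    {'url': 'https://d.example', 'candidate': False, 'scavio': False},
]
diff = per_url_diff(sample_results)
print(diff)
```

The `both_fail` bucket is worth reviewing by hand, since those URLs may be dead links in the panel rather than failures of either tool.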


Related Resources

Scraping Reliability Benchmark

Best API for Scraping Reliability in 2026

Scrape Success Rate Tracker

Scavio vs Firecrawl

Best Cloudflare-Resilient Search APIs in 2026

Tavily to Scavio Migration for Agent Workflows
