How to Benchmark Scrapers by Success Rate Across 500 Sites

Benchmark your scraper's success rate across 500 real sites with Scavio acting as the ground-truth fallback.

Success rate is the headline metric for a scraper: the percentage of target sites that return clean, structured data. This tutorial runs a 500-site benchmark that sends each URL to a candidate scraper first, then to Scavio as the ground-truth reference, and reports win rate, Cloudflare blocks, and empty responses.

Prerequisites

  • Python 3.10+
  • A Scavio API key
  • A candidate scraper (your own or a competitor's)
  • A 500-URL test list

Walkthrough

Step 1: Build the URL panel

Assemble a panel of 500 URLs spread across Cloudflare-protected, JS-heavy, and simple static sites, stored one URL per row in panel.csv.

Python
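import csv

# Load one URL per row from the first column, skipping blank rows.
with open('panel.csv', newline='') as f:
    URLS = [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]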
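A panel with duplicate or malformed rows skews the final rates, so it can help to clean the list before running anything. The following is a minimal sketch, not part of the original tutorial: `clean_panel` is a hypothetical helper name, and the choice to treat URLs that differ only by a trailing slash as duplicates is an assumption you may want to change.

```python
from urllib.parse import urlsplit

def clean_panel(raw_urls):
    """Return unique http(s) URLs in their original order.

    Drops blank rows, non-http(s) schemes, and rows without a host.
    Assumption: URLs differing only by a trailing slash are duplicates.
    """
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # blank, malformed, or unsupported scheme
        key = (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue  # already in the panel
        seen.add(key)
        cleaned.append(url)
    return cleaned
```

Calling `URLS = clean_panel(URLS)` after loading the CSV keeps the rest of the walkthrough unchanged. If the cleaned list is shorter than 500, the denominators in the final report shrink accordingly.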

Step 2: Define the benchmark loop

Hit the candidate first, then Scavio, and record both outcomes.

Python
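import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def candidate_scrape(url):
    # A plain GET stands in for the scraper under test; swap in your own client.
    try:
        return requests.get(url, timeout=10).text
    except requests.RequestException:
        return ''

def scavio_scrape(url):
    # Endpoint, header, and payload are as given in this tutorial.
    try:
        r = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'query': url, 'platform': 'extract'},
            timeout=60)  # assumed value; prevents a stalled request from hanging the run
        r.raise_for_status()
        return r.json().get('html', '')
    except (requests.RequestException, ValueError):
        # Count network errors, HTTP errors, and non-JSON bodies as failures.
        return ''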

Step 3: Score each outcome

A response counts as a success when it is longer than 500 characters and contains a <body> tag.

Python
def is_success(html):
    return len(html) > 500 and '<body' in html.lower()
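The pass/fail check above cannot tell an empty response from a Cloudflare challenge page, although the introduction promises both in the report. The sketch below shows one way to label each response. It is a heuristic, not Scavio's or Cloudflare's official detection: `classify` is a hypothetical name, and the marker strings are assumptions based on text commonly seen on Cloudflare interstitial and block pages, so expect some false negatives.

```python
# Assumed markers: strings commonly seen on Cloudflare challenge/block pages.
CHALLENGE_MARKERS = (
    "just a moment...",
    "attention required! | cloudflare",
    "_cf_chl_opt",
)

def classify(html):
    """Label a response body as 'empty', 'blocked', 'ok', or 'thin'."""
    text = (html or "").lower()
    if not text.strip():
        return "empty"      # nothing came back (or the request failed)
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return "blocked"    # looks like a Cloudflare interstitial
    if len(text) > 500 and "<body" in text:
        return "ok"         # same threshold as is_success
    return "thin"           # some HTML, but too little to count
```

Storing the label instead of a boolean lets the summary step count `blocked` and `empty` outcomes separately, while `classify(html) == "ok"` is a stricter variant of the original success rule because it also rejects challenge pages that are longer than 500 characters.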

Step 4: Run the benchmark

Collect per-URL pass/fail for each tool.

Python
results = []
for u in URLS:
    cand = is_success(candidate_scrape(u))
    scav = is_success(scavio_scrape(u))
    results.append({'url': u, 'candidate': cand, 'scavio': scav})
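The loop above processes one URL at a time, so total runtime is the sum of every request's latency. Since the work is network-bound, a thread pool is one option for shortening the run. This is a sketch under assumptions: `run_benchmark` and its parameters are hypothetical names, the default of 8 workers is arbitrary, and this tutorial does not state Scavio's rate limits, so check your plan before raising concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(urls, candidate_fn, reference_fn, score_fn, max_workers=8):
    """Score every URL with both tools concurrently.

    candidate_fn / reference_fn take a URL and return HTML (or '').
    score_fn takes HTML and returns True or False.
    Results come back in the same order as `urls`.
    """
    def check(url):
        return {
            "url": url,
            "candidate": score_fn(candidate_fn(url)),
            "scavio": score_fn(reference_fn(url)),
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls overlap in time.
        return list(pool.map(check, urls))
```

With the functions from the earlier steps, the call would be `results = run_benchmark(URLS, candidate_scrape, scavio_scrape, is_success)`. Passing the scrapers in as arguments also makes it easy to dry-run the loop with stub functions before spending API credits.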

Step 5: Publish the results

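Report each tool's success rate and the gap between them. Reporting a Cloudflare block rate additionally requires recording why each request failed.

Python
def summarize(r):
    n = len(r)
    if n == 0:
        # An empty panel has no meaningful rate; avoid dividing by zero.
        return {'candidate_rate': 0.0, 'scavio_rate': 0.0, 'gap': 0.0}
    cand = sum(1 for x in r if x['candidate']) / n
    scav = sum(1 for x in r if x['scavio']) / n
    # gap > 0 means Scavio succeeded on more sites than the candidate.
    return {'candidate_rate': cand, 'scavio_rate': scav, 'gap': scav - cand}

print(summarize(results))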
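A success rate measured on 500 URLs is an estimate, so it is worth publishing an uncertainty range next to it. The sketch below computes a Wilson score interval, a standard interval for a binomial proportion. It is an addition to the tutorial rather than part of it, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return (max(0.0, center - half), min(1.0, center + half))
```

For a hypothetical candidate that succeeds on 310 of 500 URLs (62%), the interval runs from roughly 58% to 66%. If the two tools' intervals overlap heavily, the measured gap may not hold up on a different panel.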

Python Example

Python
import os, requests

API_KEY = os.environ['SCAVIO_API_KEY']
URLS = ['https://example.com', 'https://cloudflare-protected.com']

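def scavio_extract(url):
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': url, 'platform': 'extract'},
        timeout=60)  # assumed value; prevents a stalled request from hanging the run
    if not r.ok:
        return ''  # treat HTTP errors as failures
    return r.json().get('html', '')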

wins = sum(1 for u in URLS if len(scavio_extract(u)) > 500)
print(f'Scavio success: {wins}/{len(URLS)}')

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
const URLS = ['https://example.com', 'https://cloudflare-protected.com'];

async function scavioExtract(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: url, platform: 'extract' })
  });
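  if (!r.ok) return '';  // treat HTTP errors as failures
  const d = await r.json();
  return d.html || '';
}

// Top-level await requires running this file as an ES module.
let wins = 0;
for (const u of URLS) if ((await scavioExtract(u)).length > 500) wins++;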
console.log(`Scavio success: ${wins}/${URLS.length}`);

Expected Output

The run produces a per-tool success rate (e.g., candidate 62%, Scavio 94%), a Cloudflare block rate breakdown, and a per-URL diff. A typical benchmark run takes 20-30 minutes for 500 URLs.
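The per-URL diff mentioned above can be derived from the list of result dictionaries built in Step 4. A minimal sketch, assuming each entry has the `url`, `candidate`, and `scavio` keys used in that step; `per_url_diff` and the bucket names are hypothetical.

```python
def per_url_diff(results):
    """Group URLs by which tool succeeded on them."""
    diff = {"both_ok": [], "scavio_only": [], "candidate_only": [], "both_failed": []}
    for row in results:
        if row["candidate"] and row["scavio"]:
            diff["both_ok"].append(row["url"])
        elif row["scavio"]:
            diff["scavio_only"].append(row["url"])
        elif row["candidate"]:
            diff["candidate_only"].append(row["url"])
        else:
            diff["both_failed"].append(row["url"])
    return diff
```

The `scavio_only` bucket is usually the most useful one to inspect, since it lists the sites where the candidate scraper fell short of the reference. URLs in `both_failed` may simply be dead, so they are worth checking by hand before counting them against either tool.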

Related Tutorials

  • How to Detect Cloudflare Blocking
  • How to Fetch Web Search Data Without Managing Proxies
  • How to Handle Cloudflare Turnstile Challenges

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.10+, a Scavio API key, a candidate scraper (your own or a competitor's), and a 500-URL test list. Signing up for a Scavio API key gives you 50 free credits.

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
