
We Benchmarked 500 Sites Across 4 Scrapers. Here's What Won

Firecrawl vs ScrapingBee vs Bright Data vs Playwright across 500 URLs. Why the 5.6-point success rate gap is not decisive.

7 min read

A thread on r/webscraping this week: someone ran 500 URLs through Firecrawl, ScrapingBee, Bright Data, and a plain Playwright setup and posted the success rates. The results surprised no one who has tried all four: the delta is smaller than the marketing implies, the failure modes are different, and the right choice depends on which failure mode you can live with.

The 500-URL set was a mix of news, e-commerce, JS-heavy SPAs, and Cloudflare-protected pages. Here is the summary of what actually matters when picking a scraping tool in 2026.

Success Rate by Tool

  • Bright Data: 96.8% — best on protected sites, slowest, most expensive
  • ScrapingBee: 94.1% — strong Cloudflare handling, mid-cost
  • Firecrawl: 91.2% — great markdown output, weaker on anti-bot
  • Playwright + rotation: 78.5% — cheapest, most work, breaks often

The 5.6-point gap between Bright Data and Firecrawl is real but not decisive. For most agent workflows, a 91% success rate with clean markdown output beats a 97% success rate with raw HTML you have to post-process.

Failure Mode Matters More Than Success Rate

What gets you into trouble with agents is not average success rate; it is predictability. If your scraper fails in a known way (timeout, 403, captcha page), your agent can retry or skip. If it succeeds in a broken way (returns a page of navigation chrome with no body), the agent writes a summary of nothing and moves on.

  • Bright Data: rarely silent-fails. Returns clean errors you can branch on.
  • ScrapingBee: occasional partial renders on infinite-scroll pages.
  • Firecrawl: sometimes returns truncated markdown on heavy SPAs.
  • Playwright: silent-fails constantly without careful configuration.
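One way to defuse the broken-success case is a cheap validity guard between the scraper and the agent. The sketch below assumes a minimal result shape (`ok` plus `markdown`); the length and link-density thresholds are illustrative, not from the benchmark:

```typescript
interface ScrapeResult {
  ok: boolean;
  markdown: string;
}

// Treat a "successful" response as a failure if the body looks like
// navigation chrome rather than real content. Thresholds are illustrative.
function isUsable(result: ScrapeResult, minChars = 500): boolean {
  if (!result.ok) return false;
  const body = result.markdown.trim();
  if (body.length < minChars) return false; // too short: likely a nav shell
  // A body that is mostly markdown links is probably chrome, not content.
  const words = body.split(/\s+/).length;
  const links = (body.match(/\]\(/g) ?? []).length;
  return links / Math.max(1, words) < 0.5;
}
```

Run every "successful" response through a check like this before handing it to the agent; a `false` becomes an explicit failure you can retry or escalate on, instead of a summary of nothing.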

When to Use What

// Cascade pattern: cheapest first, escalate on failure.
// tryFirecrawl, tryScrapingBee, and tryBrightData are thin wrappers
// around each provider's client, returning { ok, markdown?, html? }.
async function scrape(url: string) {
  // 1. Try Firecrawl for the markdown ergonomics
  const firecrawl = await tryFirecrawl(url);
  if (firecrawl.ok && firecrawl.markdown.length > 500) return firecrawl;

  // 2. Fall through to ScrapingBee for Cloudflare/JS-heavy pages
  const sb = await tryScrapingBee(url, { render_js: true });
  if (sb.ok) return sb;

  // 3. Last resort: Bright Data for hard targets
  return tryBrightData(url);
}

Where SERP APIs Fit

None of the above is a SERP API. If what you actually want is Google/Amazon/YouTube search results (not individual page scrapes), you do not need a scraper at all. Scavio, SerpAPI, Serper, and Oxylabs SERP all return structured JSON for a flat per-call price. Using a scraper to simulate a SERP is 10x more expensive and 10x more fragile.

The pattern that works: SERP API for discovery, scraper for depth. SERP gives you the URL list; scraper pulls the page body.
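A minimal sketch of that split, with `searchSerp` and `scrapePage` as stand-ins for whichever SERP API and scraper you actually wire in:

```typescript
// Discovery via SERP API, depth via scraper.
type SerpFn = (query: string) => Promise<string[]>;
type ScrapeFn = (url: string) => Promise<string | null>;

async function discoverAndScrape(
  query: string,
  searchSerp: SerpFn,   // cheap: one flat-priced structured call
  scrapePage: ScrapeFn, // expensive: only for pages you actually need
  limit = 10,
): Promise<Map<string, string>> {
  const urls = (await searchSerp(query)).slice(0, limit);
  const pages = new Map<string, string>();
  for (const url of urls) {
    const body = await scrapePage(url);
    if (body) pages.set(url, body); // drop pages that came back empty
  }
  return pages;
}
```

Keeping the two functions injected like this also makes the pipeline trivial to swap providers on, which matters when you re-rank tools.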

Cost per 1000 Pages

  • Bright Data: $15-35
  • ScrapingBee: $5-15
  • Firecrawl: $3-8
  • Playwright self-hosted: $1-3 (plus dev time)
  • Scavio SERP (discovery-only): $3
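Raw list price understates the gap, because every failed request still costs money. A fairer comparison is cost per successful page: price divided by expected successes per 1000 requests. The midpoint prices below are illustrative, taken from the ranges above:

```typescript
// Cost per successful page, given a price per 1000 requests and a success rate.
function costPerSuccess(pricePer1000: number, successRate: number): number {
  return pricePer1000 / (successRate * 1000);
}

costPerSuccess(25, 0.968);  // Bright Data midpoint: ~ $0.026 per good page
costPerSuccess(5.5, 0.912); // Firecrawl midpoint:   ~ $0.006 per good page
```

A cheap tier's failures are not free either; each one triggers a retry on a pricier tier, which is why cascade order matters.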

Full benchmark methodology and per-site breakdown is in the scraper cascade benchmark workflow. Run it monthly and re-rank your cascade.