
We Benchmarked 500 Sites Across 4 Scrapers. Here's What Won

Firecrawl vs ScrapingBee vs Bright Data vs Playwright across 500 URLs. Why the 5.6-point success rate gap is not decisive.

7 min read

A thread on r/webscraping this week: someone ran 500 URLs through Firecrawl, ScrapingBee, Bright Data, and a plain Playwright setup and posted the success rates. The results surprised no one who has tried all four: the delta is smaller than the marketing implies, the failure modes are different, and the right choice depends on which failure mode you can live with.

The 500-URL set was a mix of news, e-commerce, JS-heavy SPAs, and Cloudflare-protected pages. Here is the summary of what actually matters when picking a scraping tool in 2026.

Success Rate by Tool

  • Bright Data: 96.8% — best on protected sites, slowest, most expensive
  • ScrapingBee: 94.1% — strong Cloudflare handling, mid-cost
  • Firecrawl: 91.2% — great markdown output, weaker on anti-bot
  • Playwright + rotation: 78.5% — cheapest, most work, breaks often

The 5.6-point gap between Bright Data and Firecrawl is real but not decisive. For most agent workflows, a 91% success rate with clean markdown output beats a 97% success rate with raw HTML you have to post-process.

Failure Mode Matters More Than Success Rate

What gets you into trouble with agents is not average success rate; it is predictability. If your scraper fails in a known way (timeout, 403, captcha page), your agent can retry or skip. If it succeeds in a broken way (returns a page of navigation chrome with no body), the agent writes a summary of nothing and moves on.

  • Bright Data: rarely silent-fails. Returns clean errors you can branch on.
  • ScrapingBee: occasional partial renders on infinite-scroll pages.
  • Firecrawl: sometimes returns truncated markdown on heavy SPAs.
  • Playwright: silent-fails constantly without careful configuration.
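One way to defuse the broken-success case is a cheap validity guard between the scraper and the agent. The sketch below assumes a minimal result shape (`ok` plus `markdown`); the length and link-density thresholds are illustrative, not from the benchmark:

```typescript
interface ScrapeResult {
  ok: boolean;
  markdown: string;
}

// Treat a "successful" response as a failure if the body looks like
// navigation chrome rather than real content. Thresholds are illustrative.
function isUsable(result: ScrapeResult, minChars = 500): boolean {
  if (!result.ok) return false;
  const body = result.markdown.trim();
  if (body.length < minChars) return false; // too short: likely a nav shell
  // A body that is mostly markdown links is probably chrome, not content.
  const words = body.split(/\s+/).length;
  const links = (body.match(/\]\(/g) ?? []).length;
  return links / Math.max(1, words) < 0.5;
}
```

Run every "successful" response through a check like this before handing it to the agent; a `false` becomes an explicit failure you can retry or escalate on, instead of a summary of nothing.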

When to Use What

// Cascade pattern: cheapest first, escalate on failure.
// tryFirecrawl, tryScrapingBee, and tryBrightData are thin wrappers
// around each provider's client, returning { ok, markdown?, html? }.
async function scrape(url: string) {
  // 1. Try Firecrawl for the markdown ergonomics
  const firecrawl = await tryFirecrawl(url);
  if (firecrawl.ok && firecrawl.markdown.length > 500) return firecrawl;

  // 2. Fall through to ScrapingBee for Cloudflare/JS-heavy pages
  const sb = await tryScrapingBee(url, { render_js: true });
  if (sb.ok) return sb;

  // 3. Last resort: Bright Data for hard targets
  return tryBrightData(url);
}

Where SERP APIs Fit

None of the above is a SERP API. If what you actually want is Google/Amazon/YouTube search results (not individual page scrapes), you do not need a scraper at all. Scavio, SerpAPI, Serper, and Oxylabs SERP all return structured JSON for a flat per-call price. Using a scraper to simulate a SERP is 10x more expensive and 10x more fragile.

The pattern that works: SERP API for discovery, scraper for depth. SERP gives you the URL list; scraper pulls the page body.
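A minimal sketch of that split, with `searchSerp` and `scrapePage` as stand-ins for whichever SERP API and scraper you actually wire in:

```typescript
// Discovery via SERP API, depth via scraper.
type SerpFn = (query: string) => Promise<string[]>;
type ScrapeFn = (url: string) => Promise<string | null>;

async function discoverAndScrape(
  query: string,
  searchSerp: SerpFn,   // cheap: one flat-priced structured call
  scrapePage: ScrapeFn, // expensive: only for pages you actually need
  limit = 10,
): Promise<Map<string, string>> {
  const urls = (await searchSerp(query)).slice(0, limit);
  const pages = new Map<string, string>();
  for (const url of urls) {
    const body = await scrapePage(url);
    if (body) pages.set(url, body); // drop pages that came back empty
  }
  return pages;
}
```

Keeping the two functions injected like this also makes the pipeline trivial to swap providers on, which matters when you re-rank tools.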

Cost per 1000 Pages

  • Bright Data: $15-35
  • ScrapingBee: $5-15
  • Firecrawl: $3-8
  • Playwright self-hosted: $1-3 (plus dev time)
  • Scavio SERP (discovery-only): $3
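Raw list price understates the gap, because every failed request still costs money. A fairer comparison is cost per successful page: price divided by expected successes per 1000 requests. The midpoint prices below are illustrative, taken from the ranges above:

```typescript
// Cost per successful page, given a price per 1000 requests and a success rate.
function costPerSuccess(pricePer1000: number, successRate: number): number {
  return pricePer1000 / (successRate * 1000);
}

costPerSuccess(25, 0.968);  // Bright Data midpoint: ~ $0.026 per good page
costPerSuccess(5.5, 0.912); // Firecrawl midpoint:   ~ $0.006 per good page
```

A cheap tier's failures are not free either; each one triggers a retry on a pricier tier, which is why cascade order matters.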

Full benchmark methodology and per-site breakdown is in the scraper cascade benchmark workflow. Run it monthly and re-rank your cascade.