We Benchmarked 500 Sites Across 4 Scrapers. Here's What Won
Firecrawl vs ScrapingBee vs Bright Data vs Playwright across 500 URLs. Why the 5-point success rate gap is not decisive.
A thread on r/webscraping this week: someone ran 500 URLs through Firecrawl, ScrapingBee, Bright Data, and a plain Playwright setup and posted the success rates. The results surprised no one who has tried all four: the delta is smaller than the marketing implies, the failure modes are different, and the right choice depends on which failure mode you can live with.
The 500-URL set was a mix of news, e-commerce, JS-heavy SPAs, and Cloudflare-protected pages. Here is a summary of what actually matters when picking a scraping tool in 2026.
Success Rate by Tool
- Bright Data: 96.8% — best on protected sites, slowest, most expensive
- ScrapingBee: 94.1% — strong Cloudflare handling, mid-cost
- Firecrawl: 91.2% — great markdown output, weaker on anti-bot
- Playwright + rotation: 78.5% — cheapest, most work, breaks often
The 5-point gap between Bright Data and Firecrawl is real but not decisive. For most agent workflows, a 91% success rate with clean markdown output beats a 96% success rate with raw HTML you have to post-process.
Failure Mode Matters More Than Success Rate
What gets you into trouble with agents is not the average success rate; it is predictability. If your scraper fails in a known way (timeout, 403, captcha page), your agent can retry or skip. If it succeeds in a broken way (returns a page of navigation chrome with no body), the agent writes a summary of nothing and moves on.
- Bright Data: rarely silent-fails. Returns clean errors you can branch on.
- ScrapingBee: occasional partial renders on infinite-scroll pages.
- Firecrawl: sometimes returns truncated markdown on heavy SPAs.
- Playwright: silent-fails constantly without careful configuration.
When to Use What
```typescript
// Cascade pattern: cheapest first, escalate on failure
async function scrape(url: string) {
  // 1. Try Firecrawl for the markdown ergonomics; the length check
  //    guards against its truncated-markdown failure mode
  const firecrawl = await tryFirecrawl(url);
  if (firecrawl.ok && firecrawl.markdown.length > 500) return firecrawl;
  // 2. Fall through to ScrapingBee for Cloudflare/JS-heavy pages
  const sb = await tryScrapingBee(url, { render_js: true });
  if (sb.ok) return sb;
  // 3. Last resort: Bright Data for hard targets
  return tryBrightData(url);
}
```
Where SERP APIs Fit
None of the above is a SERP API. If what you actually want is Google/Amazon/YouTube search results (not individual page scrapes), you do not need a scraper at all. Scavio, SerpAPI, Serper, and Oxylabs SERP all return structured JSON for a flat per-call price. Using a scraper to simulate a SERP is 10x more expensive and 10x more fragile.
The pattern that works: SERP API for discovery, scraper for depth. SERP gives you the URL list; scraper pulls the page body.
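That discovery-then-depth pattern can be sketched in a few lines. `serpSearch` and `scrape` are placeholders for whichever SERP API and scraper cascade you actually use; they are passed in as parameters here to keep the sketch self-contained:

```typescript
// Sketch of the discovery-then-depth pattern. The SERP client and the
// scraper cascade are injected as parameters (placeholders, not real APIs).
type SerpHit = { url: string; title: string };
type Page = { ok: boolean; markdown: string };

async function research(
  query: string,
  serpSearch: (q: string) => Promise<SerpHit[]>,
  scrape: (url: string) => Promise<Page>,
  maxPages = 5
) {
  // 1. Discovery: one flat-price SERP call yields the URL list
  const hits = await serpSearch(query);
  // 2. Depth: scrape only the top N hits, tolerating individual failures
  const pages = await Promise.all(
    hits.slice(0, maxPages).map(async (hit) => {
      const page = await scrape(hit.url);
      return page.ok ? { ...hit, markdown: page.markdown } : null;
    })
  );
  return pages.filter((p): p is NonNullable<typeof p> => p !== null);
}
```

The key design choice: failures at the depth step drop out silently instead of aborting the run, because the SERP call already cost its flat price and partial results are still useful to the agent.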
Cost per 1000 Pages
- Bright Data: $15-35
- ScrapingBee: $5-15
- Firecrawl: $3-8
- Playwright self-hosted: $1-3 (plus dev time)
- Scavio SERP (discovery-only): $3
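List price only tells half the story: a cheap tool that fails often costs more per page you actually get. Dividing by the success rates above gives cost per 1000 successful pages. The prices below are the midpoints of the ranges in the list, an assumption for the sake of arithmetic:

```typescript
// Back-of-envelope: cost per 1000 *successful* pages.
// costPer1k values are midpoints of the quoted ranges (an assumption);
// success rates come from the benchmark summary above.
const tools = [
  { name: "Bright Data", costPer1k: 25, successRate: 0.968 },
  { name: "ScrapingBee", costPer1k: 10, successRate: 0.941 },
  { name: "Firecrawl", costPer1k: 5.5, successRate: 0.912 },
  { name: "Playwright", costPer1k: 2, successRate: 0.785 },
];

for (const t of tools) {
  const effective = t.costPer1k / t.successRate;
  console.log(`${t.name}: $${effective.toFixed(2)} per 1000 successful pages`);
}
```

The ranking does not change under this adjustment (the success-rate gaps are too small relative to the price gaps), which is the arithmetic behind the earlier claim that the 5-point gap is not decisive.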
Full benchmark methodology and per-site breakdown is in the scraper cascade benchmark workflow. Run it monthly and re-rank your cascade.