Tutorial

How to Replace Firecrawl for Large-Crawl Jobs

Move large crawl workloads off Firecrawl to Scavio's batch crawl endpoint. Lower cost, same markdown output, higher concurrency.

Firecrawl works fine for 10-page crawls, but teams running 10K+ page weekly refreshes hit rate limits and pricing walls. This tutorial walks through migrating a large-crawl workload to Scavio's /crawl endpoint, which offers higher concurrency and per-page pricing.

Prerequisites

  • An existing Firecrawl workload to migrate
  • A Scavio API key (paid tier recommended for concurrency)
  • Python 3.10+

Walkthrough

Step 1: Inventory your current crawl

Export your Firecrawl seed URLs and frequency.

Python
# From Firecrawl dashboard, export crawl job config:
SEEDS = ['https://docs.site.com']
DEPTH = 3

Step 2: Queue the crawl in Scavio

Scavio returns a job_id for async polling.

Python
import requests, os
API_KEY = os.environ['SCAVIO_API_KEY']

def start_crawl(seed, depth):
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': seed, 'platform': 'crawl', 'depth': depth, 'format': 'markdown'},
        timeout=30)
    r.raise_for_status()  # fail fast on auth or quota errors
    return r.json()['job_id']

Step 3: Poll for completion

Scavio streams pages in as the crawl progresses; poll the job status until it reports done.

Python
def poll(job_id):
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': job_id, 'platform': 'crawl_status'},
        timeout=30)
    r.raise_for_status()
    return r.json()
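
The single-shot poll above can be wrapped in a loop with a timeout. A minimal sketch, assuming the status field takes the values `done` and `failed` (only `done` appears in the responses shown in this tutorial; `failed` is an assumption):

```python
import time

def wait_for(poll_fn, job_id, timeout=3600, interval=10):
    # Poll until the crawl finishes, fails, or the timeout elapses.
    # poll_fn is a callable like poll() above: job_id -> status dict.
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = poll_fn(job_id)
        if status.get('status') == 'done':
            return status['pages']
        if status.get('status') == 'failed':  # assumed failure state
            raise RuntimeError(f'crawl {job_id} failed')
        time.sleep(interval)
    raise TimeoutError(f'crawl {job_id} did not finish within {timeout}s')
```

Call it as `pages = wait_for(poll, job_id)`; passing `poll` in explicitly also makes the loop easy to unit-test with a stub.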

Step 4: Save pages as markdown

Same output format as Firecrawl, so downstream ingestion stays the same.

Python
import os
def save(pages, outdir):
    os.makedirs(outdir, exist_ok=True)
    for i, p in enumerate(pages):
        # utf-8 avoids platform-default encoding surprises on CI runners
        with open(f'{outdir}/page_{i}.md', 'w', encoding='utf-8') as f:
            f.write(p['markdown'])
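
Index-based filenames (`page_0.md`) shift between runs as crawl order changes. If each page object carries its source URL (a `url` field is assumed here, not confirmed by the responses shown above), deriving filenames from it keeps weekly refreshes idempotent:

```python
import os
import re

def slug(url):
    # Collapse a URL into a stable, filesystem-safe name.
    return re.sub(r'[^a-z0-9]+', '-', url.lower()).strip('-')[:100]

def save_stable(pages, outdir):
    # Same output as save(), but each page overwrites its previous
    # version in place on every refresh.
    os.makedirs(outdir, exist_ok=True)
    for p in pages:
        path = os.path.join(outdir, slug(p['url']) + '.md')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(p['markdown'])
```
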

Step 5: Schedule weekly refresh

Cron or GitHub Actions kicks off the weekly crawl.

# .github/workflows/crawl.yml
name: weekly-crawl
on:
  schedule:
    - cron: '0 4 * * 1'  # every Monday at 04:00 UTC
jobs:
  crawl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install requests
      - run: python crawl.py
        env:
          SCAVIO_API_KEY: ${{ secrets.SCAVIO_API_KEY }}
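
The workflow invokes a `crawl.py` entry point. A minimal sketch of that script, assuming the `crawl` and `save` helpers from the earlier steps live alongside it (the per-seed output-directory convention is illustrative):

```python
def outdir_for(seed):
    # Map each seed URL to its own output directory, e.g.
    # 'https://docs.site.com' -> 'crawled/docs.site.com'.
    return 'crawled/' + seed.split('//', 1)[1].rstrip('/')

def main(crawl_fn, save_fn, seeds, depth=3):
    # crawl_fn/save_fn are the crawl() and save() helpers defined earlier,
    # passed in explicitly so the loop is easy to test with stubs.
    for seed in seeds:
        save_fn(crawl_fn(seed, depth), outdir_for(seed))
```

In `crawl.py` proper you would call `main(crawl, save, ['https://docs.site.com'])` under an `if __name__ == '__main__':` guard.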

Python Example

Python
import os, requests, time

API_KEY = os.environ['SCAVIO_API_KEY']

def crawl(seed, depth=2):
    start = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': seed, 'platform': 'crawl', 'depth': depth, 'format': 'markdown'},
        timeout=30)
    start.raise_for_status()
    job = start.json()['job_id']
    while True:
        r = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'query': job, 'platform': 'crawl_status'},
            timeout=30)
        r.raise_for_status()
        s = r.json()
        if s['status'] == 'done':
            return s['pages']
        time.sleep(5)

print(len(crawl('https://docs.example.com', depth=2)))
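
With multiple seed URLs, the `crawl()` calls can run in parallel from a thread pool; a sketch (the worker count is an assumption — match it to your plan's concurrency limit):

```python
from concurrent.futures import ThreadPoolExecutor

def crawl_many(crawl_fn, seeds, depth=2, workers=4):
    # Fan out one crawl per seed; crawl_fn is the crawl() defined above,
    # passed in explicitly. Returns {seed: pages}.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: crawl_fn(s, depth), seeds)
        return dict(zip(seeds, results))
```

Usage: `crawl_many(crawl, ['https://docs.a.com', 'https://docs.b.com'])`.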

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
async function crawl(seed, depth = 2) {
  const start = await (await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: seed, platform: 'crawl', depth, format: 'markdown' })
  })).json();
  const job = start.job_id;
  while (true) {
    const s = await (await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: job, platform: 'crawl_status' })
    })).json();
    if (s.status === 'done') return s.pages;
    await new Promise(r => setTimeout(r, 5000));
  }
}

Expected Output

A weekly 10K-page crawl completes in 20-40 minutes. Markdown output is identical to Firecrawl's, and the per-page cost is 1 credit.

Frequently Asked Questions

How long does this tutorial take?
Most developers complete it in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?
An existing Firecrawl workload to migrate, a Scavio API key (paid tier recommended for concurrency), and Python 3.10+. The API key comes with 500 free credits per month.

Can I use the free tier?
Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with frameworks?
Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
