Firecrawl works fine for 10-page crawls, but teams running weekly refreshes of 10K+ pages hit rate limits and pricing walls. This tutorial migrates a large-crawl workload to Scavio's /crawl endpoint, which offers higher concurrency and per-page pricing.
Prerequisites
- An existing Firecrawl workload to migrate
- A Scavio API key (paid tier recommended for concurrency)
- Python 3.10+
Walkthrough
Step 1: Inventory your current crawl
Export your Firecrawl seed URLs and frequency.
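If the seed list has grown organically, it is worth deduplicating and normalizing it before migrating, so you don't pay per page for redundant crawls. A minimal sketch (the helper name and normalization rules are illustrative, not part of either API):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_seeds(seeds):
    """Lowercase scheme and host, strip fragments and trailing slashes, drop duplicates."""
    seen, out = set(), []
    for url in seeds:
        parts = urlsplit(url.strip())
        norm = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path.rstrip('/'), parts.query, ''))
        if norm not in seen:
            seen.add(norm)
            out.append(norm)
    return out

print(normalize_seeds(['https://Docs.Site.com/', 'https://docs.site.com#intro']))
# → ['https://docs.site.com']  (both seeds normalize to the same URL)
```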
```python
# From the Firecrawl dashboard, export the crawl job config:
SEEDS = ['https://docs.site.com']
DEPTH = 3
```

Step 2: Queue the crawl in Scavio
Scavio returns a job_id for async polling.
```python
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']

def start_crawl(seed, depth):
    # Queue an async crawl; Scavio responds immediately with a job_id.
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers={'x-api-key': API_KEY},
                      json={'query': seed, 'platform': 'crawl', 'depth': depth, 'format': 'markdown'})
    r.raise_for_status()
    return r.json()['job_id']
```

Step 3: Poll for completion
Scavio streams pages as they complete.
```python
def poll(job_id):
    # Check crawl status; the response includes pages completed so far.
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers={'x-api-key': API_KEY},
                      json={'query': job_id, 'platform': 'crawl_status'})
    r.raise_for_status()
    return r.json()
```

Step 4: Save pages as markdown
Same output format as Firecrawl, so downstream ingestion stays the same.
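The index-based page_{i}.md names below change whenever crawl order changes. If each page dict also carries its source URL (an assumption — check the actual response shape), deriving a slug from it keeps filenames stable across weekly refreshes:

```python
import re
from urllib.parse import urlsplit

def slug_for(url):
    """Turn a page URL into a filesystem-safe markdown filename."""
    parts = urlsplit(url)
    raw = f"{parts.netloc}{parts.path}".strip('/')
    return re.sub(r'[^a-z0-9]+', '-', raw.lower()).strip('-') + '.md'

print(slug_for('https://docs.example.com/guides/Quick Start/'))
# → docs-example-com-guides-quick-start.md
```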
```python
import os

def save(pages, outdir):
    os.makedirs(outdir, exist_ok=True)
    for i, p in enumerate(pages):
        with open(f'{outdir}/page_{i}.md', 'w') as f:
            f.write(p['markdown'])
```

Step 5: Schedule weekly refresh
Cron or GitHub Actions kicks off the weekly crawl.
```yaml
# .github/workflows/crawl.yml
on:
  schedule:
    - cron: '0 4 * * 1'  # every Monday at 04:00 UTC
jobs:
  crawl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # check out the repo so crawl.py is present
      - run: python crawl.py
```

Python Example
```python
import os, requests, time

API_KEY = os.environ['SCAVIO_API_KEY']

def crawl(seed, depth=2):
    # Queue the crawl, then poll every 5 seconds until it completes.
    start = requests.post('https://api.scavio.dev/api/v1/search',
                          headers={'x-api-key': API_KEY},
                          json={'query': seed, 'platform': 'crawl', 'depth': depth, 'format': 'markdown'})
    start.raise_for_status()
    job = start.json()['job_id']
    while True:
        s = requests.post('https://api.scavio.dev/api/v1/search',
                          headers={'x-api-key': API_KEY},
                          json={'query': job, 'platform': 'crawl_status'}).json()
        if s['status'] == 'done':
            return s['pages']
        time.sleep(5)

print(len(crawl('https://docs.example.com', depth=2)))
```

JavaScript Example
```javascript
const API_KEY = process.env.SCAVIO_API_KEY;

async function crawl(seed, depth = 2) {
  // Queue the crawl, then poll every 5 seconds until it completes.
  const start = await (await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: seed, platform: 'crawl', depth, format: 'markdown' })
  })).json();
  const job = start.job_id;
  while (true) {
    const s = await (await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: job, platform: 'crawl_status' })
    })).json();
    if (s.status === 'done') return s.pages;
    await new Promise(r => setTimeout(r, 5000));
  }
}
```

Expected Output
A weekly 10K-page crawl completes in 20-40 minutes. The markdown output is identical to Firecrawl's, so downstream ingestion is unchanged. Per-page cost: 1 credit.
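The examples poll on a fixed 5-second interval; at 10K-page scale, exponential backoff with a cap is gentler on your own rate limits. A sketch of the delay schedule as a pure helper (illustrative, not a Scavio requirement):

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6):
    """Yield capped exponential poll delays in seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor

print(list(backoff_delays()))
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In the polling loop, replace the fixed time.sleep(5) with sleeping over these delays, and treat exhausting the schedule as a timeout to retry or report.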