Moving from Firecrawl to Structured JSON Search Results

When to switch from URL crawling with Firecrawl to structured search results -- and how to make the migration.

Firecrawl is designed for crawling and scraping individual URLs -- you give it a web page and it returns the content as markdown or structured data. This works well for extracting data from known URLs, but falls short when you need search results across platforms. If your use case is discovering content rather than extracting from specific pages, a structured search API is a better fit.

Crawling vs. Searching

The fundamental difference is the starting point. Firecrawl starts with a URL you already have. A search API starts with a query and returns URLs you don't have yet. These are complementary but distinct capabilities:

  • Firecrawl -- "Extract the content from this specific page"
  • Search API -- "Find pages about this topic and return structured results"

Many teams start with Firecrawl because they need web data, then realize they are manually finding URLs to feed into the crawler. A search API eliminates that manual step entirely.

What Structured Search JSON Looks Like

Instead of crawling a page and parsing its HTML into markdown, a search API returns pre-structured data for every result:

JSON
{
  "data": {
    "organic": [
      {
        "title": "How to Deploy a Node.js App",
        "link": "https://example.com/deploy-nodejs",
        "snippet": "Step-by-step guide to deploying Node.js applications...",
        "position": 1
      }
    ],
    "peopleAlsoAsk": [
      {
        "question": "What is the best way to deploy Node.js?",
        "answer": "The most common approaches include..."
      }
    ]
  }
}

Every field is typed and consistently named. There is no need to write extraction rules or handle varying page layouts -- the API has already done that work.
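Because the field names are consistent, a response can be mapped onto typed records with very little code. The following is a minimal sketch that assumes the response shape shown above; `OrganicResult` and `parse_organic` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrganicResult:
    title: str
    link: str
    snippet: str
    position: int


def parse_organic(payload: dict) -> list[OrganicResult]:
    """Convert the "organic" array of a search response into typed records."""
    items = payload.get("data", {}).get("organic", [])
    return [
        OrganicResult(
            title=item["title"],
            link=item["link"],
            snippet=item.get("snippet", ""),  # tolerate a missing snippet
            position=int(item["position"]),
        )
        for item in items
    ]


# Sample payload copied from the JSON example above.
sample = {
    "data": {
        "organic": [
            {
                "title": "How to Deploy a Node.js App",
                "link": "https://example.com/deploy-nodejs",
                "snippet": "Step-by-step guide to deploying Node.js applications...",
                "position": 1,
            }
        ]
    }
}

parsed = parse_organic(sample)
print(parsed[0].title, parsed[0].position)
```

An empty or partial payload yields an empty list rather than an exception, which keeps downstream code simple when a query returns nothing.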

Replacing Firecrawl Workflows

A typical Firecrawl workflow for research looks like this: search Google manually, copy URLs, feed each URL into Firecrawl, parse the returned markdown, and aggregate results. With a search API, you skip the first three steps:

Python
# Firecrawl workflow (multiple steps)
# 1. Manually find URLs or use another API
# 2. Crawl each URL
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="...")
page = app.scrape_url("https://example.com/article")
content = page["markdown"]  # page content as markdown

# Search API workflow (one step)
import requests

response = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={"platform": "google", "query": "deploy nodejs production"},
    timeout=30,  # avoid hanging indefinitely on a slow response
)
response.raise_for_status()  # surface HTTP errors before parsing the body
results = response.json()["data"]["organic"]

The search API returns multiple results with snippets in a single call. For many use cases -- research agents, content aggregation, competitive analysis -- the snippets contain enough information without needing to crawl the full page.
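When snippets are sufficient, they can be turned into a compact context block for a research agent or summary without any crawling. This sketch assumes `results` has the shape of the `organic` array shown earlier; `snippets_to_context` is a hypothetical helper and the sample data is illustrative.

```python
def snippets_to_context(results: list[dict], limit: int = 5) -> str:
    """Format the top-ranked results as a numbered, plain-text context block."""
    ranked = sorted(results, key=lambda item: item["position"])[:limit]
    return "\n".join(
        f"[{item['position']}] {item['title']}: {item['snippet']} ({item['link']})"
        for item in ranked
    )


# Sample data in the shape of the "organic" array; values are illustrative.
results = [
    {
        "title": "Node.js in Production",
        "link": "https://example.com/nodejs-production",
        "snippet": "Process managers, logging, and health checks...",
        "position": 2,
    },
    {
        "title": "How to Deploy a Node.js App",
        "link": "https://example.com/deploy-nodejs",
        "snippet": "Step-by-step guide to deploying Node.js applications...",
        "position": 1,
    },
]

context = snippets_to_context(results, limit=2)
print(context)
```

Sorting by `position` keeps the output stable even if results are merged from several queries before formatting.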

Multi-Platform Advantage

Firecrawl works with any URL but returns raw page content. It does not understand product schemas, video metadata, or review structures. A search API returns platform-native data:

  • Amazon -- product title, price, rating, review count, ASIN
  • YouTube -- video title, views, duration, channel, transcript
  • Walmart -- product name, price, availability, delivery options
  • Reddit -- post title, score, comment count, subreddit

Crawling an Amazon product page with Firecrawl gives you markdown. Searching Amazon with an API gives you a JSON object with typed price, rating, and reviewCount fields -- ready for your application without any parsing.
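With typed fields, filtering and ranking products becomes ordinary data handling. The sketch below uses the `price`, `rating`, and `reviewCount` field names mentioned above; the `asin` and `title` keys, the numeric types, and the sample values are assumptions for illustration, so check the actual response schema before relying on them.

```python
def shortlist(products: list[dict], min_rating: float = 4.0, min_reviews: int = 100) -> list[dict]:
    """Keep well-reviewed products and order them from cheapest to most expensive."""
    kept = [
        p for p in products
        if p["rating"] >= min_rating and p["reviewCount"] >= min_reviews
    ]
    return sorted(kept, key=lambda p: p["price"])


# Made-up product records; the exact response shape is an assumption.
products = [
    {"asin": "B000000001", "title": "USB-C Cable 2m", "price": 12.99, "rating": 4.6, "reviewCount": 2310},
    {"asin": "B000000002", "title": "USB-C Cable 1m", "price": 8.49, "rating": 4.4, "reviewCount": 540},
    {"asin": "B000000003", "title": "USB-C Cable 3m", "price": 6.99, "rating": 3.7, "reviewCount": 95},
]

picks = shortlist(products)
print([p["asin"] for p in picks])
```

Doing the same from crawled markdown would require locating and parsing each value out of free text first.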

When You Still Need a Crawler

Search APIs do not replace crawlers for every use case. If you need the full text content of a specific web page -- for RAG ingestion, document summarization, or archiving -- a crawler like Firecrawl is the right tool. The best architecture often combines both:

  • Use a search API to discover relevant pages
  • Use the returned links to decide which pages deserve full crawling
  • Crawl only the high-value pages for deep content extraction

This hybrid approach can reduce crawling costs by 80-90% compared to crawling everything (for example, when only one or two of every ten discovered pages are crawled), while still getting full content when you need it.
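The discover-then-crawl pattern can be sketched as two small functions. This is one possible selection policy under stated assumptions, not a prescribed one: `select_for_crawl` and `crawl_selected` are hypothetical helpers, the "top-ranked, not on a blocklist" rule stands in for whatever notion of high value fits your use case, and the `crawl` argument is a stand-in for a real crawler call such as Firecrawl's.

```python
from typing import Callable
from urllib.parse import urlparse


def select_for_crawl(
    results: list[dict],
    max_pages: int = 2,
    blocked_domains: frozenset = frozenset(),
) -> list[str]:
    """Pick the top-ranked links, skipping any host on the blocklist."""
    links: list[str] = []
    for item in sorted(results, key=lambda r: r["position"]):
        if len(links) >= max_pages:
            break
        host = urlparse(item["link"]).hostname or ""
        if host in blocked_domains:
            continue
        links.append(item["link"])
    return links


def crawl_selected(links: list[str], crawl: Callable[[str], str]) -> dict[str, str]:
    """Run the supplied crawl function on each selected link."""
    return {link: crawl(link) for link in links}


# Ten illustrative search results; position 1 is from a blocked host.
results = [{"link": "https://ads.example.net/promo", "position": 1}] + [
    {"link": f"https://example.com/page-{i}", "position": i} for i in range(2, 11)
]


def fake_crawl(url: str) -> str:
    """Stand-in for a real crawler call; returns placeholder markdown."""
    return f"# content of {url}"


selected = select_for_crawl(results, max_pages=2, blocked_domains=frozenset({"ads.example.net"}))
pages = crawl_selected(selected, fake_crawl)
print(f"crawled {len(pages)} of {len(results)} discovered pages")
```

Passing the crawler in as a function keeps the selection logic testable offline and makes it easy to swap crawlers later. How much crawling this saves depends entirely on `max_pages` relative to the number of results discovered.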

Getting Started

Replace Firecrawl calls that are really doing search-and-extract with direct search API calls. Keep Firecrawl for targeted page extraction where you genuinely need full page content. The result is a faster, cheaper pipeline that returns structured data by default.