Workflow

Browser Automation Fallback Pipeline

Try structured API first, fall back to browser automation only if needed. Reduce Selenium and Playwright usage with API-first data access.

Overview

This pipeline implements an API-first approach to data collection: try the Scavio structured API for every query first, and fall back to browser automation (Selenium, Playwright) only for the rare cases where the structured data is insufficient. Most teams find that 90%+ of their scraping jobs can be replaced by API calls, which dramatically reduces infrastructure costs and maintenance. The fallback path ensures complete coverage while keeping browser automation usage to a minimum.

Trigger

On-demand or scheduled per data collection batch

Schedule

On-demand per data collection batch

Workflow Steps

1

Load data collection tasks

Read the batch of URLs or queries that need data collection from the task queue.

2

Attempt API-first collection

For each task, try to fulfill it via the Scavio structured API (Google, Amazon, etc.).

3

Evaluate API result completeness

Check if the API result contains all required fields. Mark tasks as complete or needing fallback.

4

Run browser fallback for remaining tasks

For tasks the API could not fully satisfy, fall back to browser automation as a last resort.

5

Log API vs browser ratio

Record how many tasks used API vs browser to track migration progress over time.
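
Steps 3 and 4 hinge on deciding which tasks the API has fully satisfied. The following is a minimal sketch of that decision in isolation, not the pipeline's actual code: the helper names missing_fields and partition, and the sample records, are hypothetical. It assumes an API record is a flat dict and treats a field as missing when it is absent or empty, so a numeric value of 0 still counts as present.

```python
def missing_fields(record: dict, required: list[str]) -> list[str]:
    """Return the required fields that are absent or empty in an API record."""
    return [f for f in required if record.get(f) in (None, "", [], {})]


def partition(tasks: list[dict], api_records: dict[str, dict]) -> tuple[list[str], dict[str, list[str]]]:
    """Split task ids into API-complete ones and ones that need browser fallback."""
    complete, needs_fallback = [], {}
    for task in tasks:
        record = api_records.get(task["id"], {})
        missing = missing_fields(record, task["required_fields"])
        if missing:
            # Keep the missing field names so the fallback knows what to collect
            needs_fallback[task["id"]] = missing
        else:
            complete.append(task["id"])
    return complete, needs_fallback


# Hypothetical sample data for illustration only
tasks = [
    {"id": "t1", "required_fields": ["title", "link"]},
    {"id": "t2", "required_fields": ["title", "price", "rating"]},
]
api_records = {
    "t1": {"title": "Example", "link": "https://example.com"},
    "t2": {"title": "Mouse", "price": "", "rating": 4.5},
}
complete, needs_fallback = partition(tasks, api_records)
print(complete)        # ['t1']
print(needs_fallback)  # {'t2': ['price']}
```

Recording which fields were missing, rather than a plain pass/fail flag, lets the browser step target only the gaps and makes the fallback log easier to audit.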

Python Implementation

Python
import requests
import json
from pathlib import Path
from datetime import datetime, timezone

API_KEY = "your_scavio_api_key"

TASKS = [
    {"id": "t1", "query": "best CRM software 2026", "platform": "google", "required_fields": ["title", "link", "snippet"]},
    {"id": "t2", "query": "wireless mouse", "platform": "amazon", "required_fields": ["title", "price", "rating"]},
    {"id": "t3", "query": "python tutorial 2026", "platform": "google", "required_fields": ["title", "link"]},
]

def try_api(query: str, platform: str, required: list[str]) -> dict | None:
    """Return the required fields from the top API result, or None if the task needs fallback."""
    try:
        res = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": API_KEY},
            json={"platform": platform, "query": query},
            timeout=15,
        )
        res.raise_for_status()
        results = res.json().get("organic", [])
    except (requests.RequestException, ValueError):
        # Network errors, HTTP errors, and malformed JSON send the task to the fallback
        # instead of aborting the whole batch
        return None
    if not results:
        return None

    top = results[0]
    # Every required field must be present and non-empty in the top result
    if all(top.get(f) for f in required):
        return {"method": "api", "data": {f: top[f] for f in required}}
    return None

def browser_fallback(query: str) -> dict:
    """Placeholder for Selenium/Playwright fallback."""
    return {"method": "browser", "data": {"note": "browser automation would run here"}}

def run():
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    results = []
    api_count = 0
    browser_count = 0

    for task in TASKS:
        api_result = try_api(task["query"], task["platform"], task["required_fields"])
        if api_result:
            results.append({"task_id": task["id"], **api_result})
            api_count += 1
        else:
            fallback = browser_fallback(task["query"])
            results.append({"task_id": task["id"], **fallback})
            browser_count += 1

    total = api_count + browser_count
    api_pct = round(api_count / total * 100) if total else 0

    output = {"date": date, "total_tasks": total, "api_fulfilled": api_count, "browser_fallback": browser_count, "api_percentage": api_pct, "results": results}
    Path(f"collection_batch_{date}.json").write_text(json.dumps(output, indent=2))

    print(f"Batch {date}: {api_count}/{total} via API ({api_pct}%), {browser_count} fallback to browser")

if __name__ == "__main__":
    run()
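
The batch files written above can also be used to track the API vs browser ratio over time, as step 5 describes. The following is a rough sketch under the assumption that every collection_batch_<date>.json file in a directory has the "date" and "api_percentage" keys produced by run(); the function name api_share_by_date is hypothetical.

```python
import json
from pathlib import Path


def api_share_by_date(directory: str = ".") -> list[tuple[str, int]]:
    """Collect (date, api_percentage) pairs from saved batch files, oldest first."""
    history = []
    for path in Path(directory).glob("collection_batch_*.json"):
        batch = json.loads(path.read_text())
        history.append((batch["date"], batch["api_percentage"]))
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return sorted(history)


if __name__ == "__main__":
    for date, pct in api_share_by_date():
        print(f"{date}: {pct}% of tasks served by the API")
```

A rising percentage across dates suggests that more of the workload is being served by the API and less by browser automation.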

JavaScript Implementation

JavaScript
const API_KEY = "your_scavio_api_key";
const TASKS = [
  { id: "t1", query: "best CRM software 2026", platform: "google", required: ["title", "link"] },
  { id: "t2", query: "wireless mouse", platform: "amazon", required: ["title", "price"] },
];

async function tryApi(query, platform, required) {
  try {
    const res = await fetch("https://api.scavio.dev/api/v1/search", {
      method: "POST",
      headers: { "x-api-key": API_KEY, "content-type": "application/json" },
      body: JSON.stringify({ platform, query }),
    });
    if (!res.ok) return null;
    // Only the top organic result is checked for the required fields
    const top = ((await res.json()).organic ?? [])[0];
    if (!top || !required.every((f) => top[f])) return null;
    return { method: "api", data: Object.fromEntries(required.map((f) => [f, top[f]])) };
  } catch {
    // Network failures and malformed JSON also mean the task needs fallback
    return null;
  }
}

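// Top-level await: run this file as an ES module (for example, a .mjs file)
let apiCount = 0;
for (const task of TASKS) {
  const result = await tryApi(task.query, task.platform, task.required);
  // A null result marks the task for browser fallback
  if (result) apiCount++;
}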
console.log(`${apiCount}/${TASKS.length} tasks fulfilled via API (${Math.round(apiCount / TASKS.length * 100)}%)`);

Platforms Used

Google

Web search with knowledge graph, PAA, and AI overviews

Frequently Asked Questions


This workflow runs on demand or on a schedule, once per data collection batch.

This workflow uses the following Scavio platforms: google. Each platform is called via the same unified API endpoint.
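
To illustrate what a single unified endpoint means in practice, here is a small sketch that builds the request for two platforms. The URL, header name, and payload keys are taken from the implementation on this page; the helper build_request is hypothetical, and the "amazon" platform value is assumed from the sample tasks rather than from the platform list.

```python
SEARCH_URL = "https://api.scavio.dev/api/v1/search"  # endpoint used in the implementation on this page


def build_request(platform: str, query: str, api_key: str) -> dict:
    """Build keyword arguments for requests.post; only the platform value differs per call."""
    return {
        "url": SEARCH_URL,
        "headers": {"x-api-key": api_key},
        "json": {"platform": platform, "query": query},
        "timeout": 15,
    }


google_call = build_request("google", "best CRM software 2026", "your_scavio_api_key")
amazon_call = build_request("amazon", "wireless mouse", "your_scavio_api_key")
print(google_call["url"] == amazon_call["url"])  # True
```

Each dict can be passed straight to the HTTP client, for example requests.post(**google_call).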

Yes. Scavio's free tier includes 250 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.
