How is this workflow triggered?

This workflow uses a on-demand or scheduled per data collection batch. On-demand per data collection batch.

Which Scavio platforms does this workflow use?

This workflow uses the following Scavio platforms: google. Each platform is called via the same unified API endpoint.

Can I run this workflow on the free tier?

Yes. Scavio's free tier includes 250 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.

Browser Automation Fallback Pipeline

Overview

This pipeline implements an API-first approach to data collection: try Scavio structured API for every query first, and only fall back to browser automation (Selenium, Playwright) for the rare cases where structured data is insufficient. Most teams find that 90%+ of their scraping jobs can be replaced by API calls, dramatically reducing infrastructure costs and maintenance. The fallback path ensures complete coverage while minimizing browser automation usage.

Trigger

On-demand or scheduled per data collection batch

Schedule

On-demand per data collection batch

Workflow Steps

Load data collection tasks

Read the batch of URLs or queries that need data collection from the task queue.

Attempt API-first collection

For each task, try to fulfill it via Scavio structured API (Google, Amazon, etc.).

Evaluate API result completeness

Check if the API result contains all required fields. Mark tasks as complete or needing fallback.

Run browser fallback for remaining tasks

For tasks the API could not fully satisfy, fall back to browser automation as a last resort.

Log API vs browser ratio

Record how many tasks used API vs browser to track migration progress over time.

Python Implementation

Python

import requests
import json
from pathlib import Path
from datetime import datetime

API_KEY = "your_scavio_api_key"

TASKS = [
    {"id": "t1", "query": "best CRM software 2026", "platform": "google", "required_fields": ["title", "link", "snippet"]},
    {"id": "t2", "query": "wireless mouse", "platform": "amazon", "required_fields": ["title", "price", "rating"]},
    {"id": "t3", "query": "python tutorial 2026", "platform": "google", "required_fields": ["title", "link"]},
]

def try_api(query: str, platform: str, required: list[str]) -> dict | None:
    res = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": platform, "query": query},
        timeout=15,
    )
    res.raise_for_status()
    results = res.json().get("organic", [])
    if not results:
        return None

    top = results[0]
    # Check all required fields are present
    if all(top.get(f) for f in required):
        return {"method": "api", "data": {f: top[f] for f in required}}
    return None

def browser_fallback(query: str) -> dict:
    """Placeholder for Selenium/Playwright fallback."""
    return {"method": "browser", "data": {"note": "browser automation would run here"}}

def run():
    date = datetime.utcnow().strftime("%Y-%m-%d")
    results = []
    api_count = 0
    browser_count = 0

    for task in TASKS:
        api_result = try_api(task["query"], task["platform"], task["required_fields"])
        if api_result:
            results.append({"task_id": task["id"], **api_result})
            api_count += 1
        else:
            fallback = browser_fallback(task["query"])
            results.append({"task_id": task["id"], **fallback})
            browser_count += 1

    total = api_count + browser_count
    api_pct = round(api_count / total * 100) if total else 0

    output = {"date": date, "total_tasks": total, "api_fulfilled": api_count, "browser_fallback": browser_count, "api_percentage": api_pct, "results": results}
    Path(f"collection_batch_{date}.json").write_text(json.dumps(output, indent=2))

    print(f"Batch {date}: {api_count}/{total} via API ({api_pct}%), {browser_count} fallback to browser")

if __name__ == "__main__":
    run()

JavaScript Implementation

JavaScript

const API_KEY = "your_scavio_api_key";
const TASKS = [
  { id: "t1", query: "best CRM software 2026", platform: "google", required: ["title", "link"] },
  { id: "t2", query: "wireless mouse", platform: "amazon", required: ["title", "price"] },
];

async function tryApi(query, platform, required) {
  const res = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST",
    headers: { "x-api-key": API_KEY, "content-type": "application/json" },
    body: JSON.stringify({ platform, query }),
  });
  if (!res.ok) return null;
  const top = ((await res.json()).organic ?? [])[0];
  if (!top || !required.every((f) => top[f])) return null;
  return { method: "api", data: Object.fromEntries(required.map((f) => [f, top[f]])) };
}

let apiCount = 0;
for (const task of TASKS) {
  const result = await tryApi(task.query, task.platform, task.required);
  if (result) apiCount++;
}
console.log(`${apiCount}/${TASKS.length} tasks fulfilled via API (${Math.round(apiCount / TASKS.length * 100)}%)`);

Platforms Used

Google

Web search with knowledge graph, PAA, and AI overviews

Browser Automation Fallback Pipeline

Overview

Trigger

Schedule

Workflow Steps

Load data collection tasks

Attempt API-first collection

Evaluate API result completeness

Run browser fallback for remaining tasks

Log API vs browser ratio

Python Implementation

JavaScript Implementation

Platforms Used

Google

Frequently Asked Questions

What does the Browser Automation Fallback Pipeline workflow do?

How is this workflow triggered?

Which Scavio platforms does this workflow use?

Can I run this workflow on the free tier?

Browser Automation Fallback Pipeline