Workflow

CAPTCHA-Free Data Refresh Daily

Daily product price and ranking refresh without CAPTCHA issues. Replace brittle browser scrapers with structured API calls that don't require CAPTCHA solving.

Overview

This workflow refreshes product prices and search rankings daily using Scavio structured API calls instead of browser-based scrapers that get blocked by CAPTCHAs. It processes a list of products across Google and Amazon and writes the refreshed, date-stamped results to local storage. Because Scavio handles data access on its side, your job only sends plain HTTPS requests and never has to solve a CAPTCHA. Teams migrating from Selenium or Playwright scrapers use it to drop the maintenance burden of CAPTCHA-solving services.

Trigger

Cron schedule (daily at 6:00 AM UTC)

Schedule

Runs daily at 6:00 AM UTC
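On a host whose system clock is set to UTC, the standard crontab entry for this schedule is `0 6 * * *`. If you would rather keep scheduling inside Python (for example in a long-running worker), here is a minimal sketch that computes the next 6:00 AM UTC run. `next_run` and `RUN_AT` are hypothetical helpers, not part of Scavio.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical constant: the daily run time from the schedule, interpreted as UTC.
RUN_AT = time(hour=6, minute=0)


def next_run(now: datetime) -> datetime:
    """Return the first 06:00 UTC strictly after `now` (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), RUN_AT, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

A worker loop can sleep for `(next_run(datetime.now(timezone.utc)) - datetime.now(timezone.utc)).total_seconds()` seconds and then call the refresh job.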

Workflow Steps

1. Load product refresh list

Read the list of products and their associated search queries from the local database.
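A minimal sketch of step 1, assuming a SQLite database with a hypothetical `products` table (`slug`, `google_query`, `amazon_query`); adapt the query to your own schema. It returns dictionaries with the same keys as the `PRODUCTS` list used in the Python implementation.

```python
import sqlite3
from typing import Dict, List

# Hypothetical schema: the page only says products live in "the local database".
PRODUCTS_SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    slug TEXT PRIMARY KEY,
    google_query TEXT NOT NULL,
    amazon_query TEXT NOT NULL
)
"""


def load_refresh_list(conn: sqlite3.Connection) -> List[Dict[str, str]]:
    """Return every product as a dict with slug, google_query and amazon_query."""
    cur = conn.execute(
        "SELECT slug, google_query, amazon_query FROM products ORDER BY slug"
    )
    columns = [col[0] for col in cur.description]  # column names from the cursor
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```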

2. Query Google for rankings

Run each product's Google query through Scavio to capture current organic positions and featured snippets.
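A sketch of step 2 that turns the organic results into 1-based positions and finds where a tracked domain ranks. It assumes result order reflects rank and that each organic result has a `link` key; only `organic` and `title` appear in the Python implementation on this page, so check `link` against a real Scavio response. Featured-snippet parsing is omitted because its field name is not given here.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse


def organic_positions(google_data: Dict, limit: int = 10) -> List[Dict]:
    """List the top organic results with 1-based positions (list order assumed = rank)."""
    results = google_data.get("organic", [])[:limit]
    return [
        # "link" is an assumed field name; "title" is the field the page's code reads.
        {"position": i, "title": r.get("title") or "", "link": r.get("link") or ""}
        for i, r in enumerate(results, start=1)
    ]


def rank_of_domain(google_data: Dict, domain: str) -> Optional[int]:
    """Return the first organic position whose host is `domain` or one of its subdomains."""
    for row in organic_positions(google_data, limit=100):
        host = urlparse(row["link"]).hostname or ""
        if host == domain or host.endswith("." + domain):
            return row["position"]
    return None
```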

3. Query Amazon for prices

Run each product's Amazon query through Scavio to get current pricing, ratings, and availability.
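A defensive sketch for the pricing part of step 3. It assumes the `price` field may arrive either as a number or as a display string such as `"$1,299.99"` (US formatting); the page's own code treats it as numeric, so this is a hedge rather than documented behavior. Rating and availability fields are not shown on this page and are left out.

```python
import re
from typing import Dict, Optional, Tuple

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_price(value) -> Optional[float]:
    """Coerce 24, 19.99 or "$1,299.99" to a float; return None when no price is present.

    Assumes US-style formatting (comma thousands separator, dot decimal point).
    """
    if isinstance(value, bool):  # bool subclasses int but is never a real price
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        match = _NUMBER.search(value.replace(",", ""))
        if match:
            return float(match.group())
    return None


def price_range(amazon_data: Dict, limit: int = 5) -> Tuple[Optional[float], Optional[float]]:
    """Return (lowest, highest) parsed price among the top `limit` Amazon results."""
    parsed = (parse_price(r.get("price")) for r in amazon_data.get("organic", [])[:limit])
    prices = [p for p in parsed if p is not None]
    return (min(prices), max(prices)) if prices else (None, None)
```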

4. Update local database

Write the refreshed data to the local database with timestamps for freshness tracking.
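A minimal sketch of step 4 using SQLite, assuming a hypothetical `price_history` table with one row per product per refresh. The row keys match the dictionaries built in the Python implementation; the ISO 8601 UTC timestamp is what enables freshness tracking.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional

# Hypothetical history table: one row per product per refresh.
HISTORY_SCHEMA = """
CREATE TABLE IF NOT EXISTS price_history (
    slug TEXT NOT NULL,
    refreshed_at TEXT NOT NULL,  -- ISO 8601 UTC timestamp
    amazon_lowest_price REAL,
    amazon_highest_price REAL,
    google_top_title TEXT,
    PRIMARY KEY (slug, refreshed_at)
)
"""


def save_refresh(conn: sqlite3.Connection, rows: Iterable[Dict],
                 refreshed_at: Optional[datetime] = None) -> str:
    """Insert one timestamped row per refreshed product and return the timestamp used."""
    stamp = (refreshed_at or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    conn.execute(HISTORY_SCHEMA)
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO price_history VALUES (?, ?, ?, ?, ?)",
            [
                (r["slug"], stamp, r.get("amazon_lowest_price"),
                 r.get("amazon_highest_price"), r.get("google_top_title", ""))
                for r in rows
            ],
        )
    return stamp
```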

5. Log refresh results

Output a summary of products refreshed, prices changed, and any queries that returned no results.
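A sketch of the step 5 summary. Detecting "prices changed" needs the previous run's prices; here `previous` is a hypothetical mapping of slug to lowest Amazon price, which you could build from yesterday's snapshot with `{r["slug"]: r["amazon_lowest_price"] for r in json.loads(path.read_text())["data"]}`.

```python
from typing import Dict, List, Optional


def summarize(today: List[Dict], previous: Dict[str, Optional[float]]) -> Dict:
    """Count refreshed products, list price changes, and flag queries with no results.

    `today` uses the row shape from the Python implementation; `previous` maps
    slug -> lowest Amazon price from the last run (hypothetical input).
    """
    changed, no_results = [], []
    for row in today:
        slug = row["slug"]
        if not row.get("google_results"):
            no_results.append(f"{slug} (google)")
        if not row.get("amazon_results"):
            no_results.append(f"{slug} (amazon)")
        old, new = previous.get(slug), row.get("amazon_lowest_price")
        # Ignore sub-cent float noise; only report real price moves.
        if old is not None and new is not None and abs(new - old) >= 0.01:
            changed.append((slug, old, new))
    return {"refreshed": len(today), "prices_changed": changed, "no_results": no_results}
```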

Python Implementation

Python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

import requests

# Read the key from the environment; the placeholder is only a fallback.
API_KEY = os.environ.get("SCAVIO_API_KEY", "your_scavio_api_key")

# Step 1: this hardcoded refresh list stands in for the local database.
PRODUCTS = [
    {"slug": "wireless-earbuds", "google_query": "best wireless earbuds 2026", "amazon_query": "wireless earbuds"},
    {"slug": "standing-desk", "google_query": "best standing desk 2026", "amazon_query": "standing desk adjustable"},
    {"slug": "mechanical-keyboard", "google_query": "best mechanical keyboard 2026", "amazon_query": "mechanical keyboard"},
]

def search(query: str, platform: str) -> dict:
    """POST one query to the Scavio search endpoint and return the parsed JSON."""
    res = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": platform, "query": query},
        timeout=15,
    )
    res.raise_for_status()
    return res.json()

def to_price(value):
    """Coerce a price to float, tolerating "$" and commas; None if unparseable."""
    try:
        return float(str(value).replace("$", "").replace(",", ""))
    except ValueError:
        return None

def run():
    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")  # utcnow() is deprecated
    refreshed = []
    empty_queries = []

    for product in PRODUCTS:
        # Steps 2 and 3: one Google call and one Amazon call per product.
        google_data = search(product["google_query"], "google")
        amazon_data = search(product["amazon_query"], "amazon")

        google_top = google_data.get("organic", [])[:5]
        amazon_top = amazon_data.get("organic", [])[:5]
        amazon_prices = [p for p in (to_price(r.get("price")) for r in amazon_top) if p is not None]

        if not google_top:
            empty_queries.append(product["google_query"])
        if not amazon_top:
            empty_queries.append(product["amazon_query"])

        refreshed.append({
            "slug": product["slug"],
            "date": date,
            "google_results": len(google_top),
            "google_top_title": google_top[0].get("title", "") if google_top else "",
            "amazon_results": len(amazon_top),
            "amazon_lowest_price": min(amazon_prices) if amazon_prices else None,
            "amazon_highest_price": max(amazon_prices) if amazon_prices else None,
        })

    # Step 4: persist a dated JSON snapshot (swap in a database write if needed).
    output = {"date": date, "products_refreshed": len(refreshed), "data": refreshed}
    Path(f"data_refresh_{date}.json").write_text(json.dumps(output, indent=2))

    # Step 5: log a short summary, including queries that returned nothing.
    print(f"Data refresh {date}: {len(refreshed)} products updated (zero CAPTCHAs)")
    for r in refreshed:
        low = r["amazon_lowest_price"]
        price_str = f"${low:.2f}" if low is not None else "no price"
        print(f"  {r['slug']}: {r['google_results']} Google, {r['amazon_results']} Amazon, {price_str}")
    for q in empty_queries:
        print(f"  no results: {q}")

if __name__ == "__main__":
    run()
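Structured API calls avoid CAPTCHAs, but a daily job can still hit ordinary network or server errors. The helper below is a generic exponential-backoff wrapper, not a Scavio feature, and `with_retries` is a hypothetical name. Because `requests` exceptions derive from `OSError` (through `IOError`), the default `retry_on` also catches the `HTTPError` raised by `raise_for_status()`; you may want to narrow it so 4xx errors such as a bad API key fail immediately.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call()`, retrying on `retry_on` errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be >= 1")
```

Inside `run()`, a call would become `with_retries(lambda: search(product["google_query"], "google"))`.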

JavaScript Implementation

JavaScript
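// Run as an ES module (for example a .mjs file on Node 18+) so top-level await works.
const API_KEY = process.env.SCAVIO_API_KEY ?? "your_scavio_api_key";
const PRODUCTS = [
  { slug: "wireless-earbuds", google: "best wireless earbuds 2026", amazon: "wireless earbuds" },
  { slug: "standing-desk", google: "best standing desk 2026", amazon: "standing desk adjustable" },
];

async function search(query, platform) {
  const res = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST",
    headers: { "x-api-key": API_KEY, "content-type": "application/json" },
    body: JSON.stringify({ platform, query }),
    signal: AbortSignal.timeout(15_000), // same 15 s timeout as the Python version
  });
  if (!res.ok) throw new Error(`scavio ${res.status}`);
  return res.json();
}

// Coerce 19.99 or "$1,299.99" to a number; missing values become NaN and are filtered out.
const toPrice = (value) =>
  value == null || value === "" ? NaN : Number(String(value).replace(/[$,]/g, ""));

for (const product of PRODUCTS) {
  const google = await search(product.google, "google");
  const amazon = await search(product.amazon, "amazon");
  const googleTop = (google.organic ?? []).slice(0, 5);
  const amazonTop = (amazon.organic ?? []).slice(0, 5);
  const prices = amazonTop.map((r) => toPrice(r.price)).filter(Number.isFinite);
  const lowest = prices.length ? Math.min(...prices) : null;
  const priceStr = lowest !== null ? `$${lowest.toFixed(2)}` : "no price";
  console.log(`${product.slug}: ${googleTop.length} Google, ${amazonTop.length} Amazon, ${priceStr}`);
}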

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI Overview results

Amazon

Product search with prices, ratings, and reviews

Frequently Asked Questions

This workflow runs on a cron schedule, daily at 6:00 AM UTC.

This workflow uses two Scavio platforms, Google and Amazon. Both are called through the same unified API endpoint.

Scavio's free tier includes 250 credits per month with no credit card required, which is enough to test and validate this workflow before scaling it.
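A quick budget check, assuming each search call costs one credit (per-call pricing is not stated on this page). The three-product Python implementation makes two calls per product per day, so a 31-day month uses 3 × 2 × 31 = 186 credits, and at most four products fit in 250 credits.

```python
def monthly_credits(products: int, platforms: int = 2, days: int = 31,
                    credits_per_call: int = 1) -> int:
    """Estimate monthly usage: one call per product per platform per day (assumed)."""
    return products * platforms * days * credits_per_call


def max_products(budget: int = 250, platforms: int = 2, days: int = 31,
                 credits_per_call: int = 1) -> int:
    """Largest product list that fits the monthly budget under the same assumptions."""
    return budget // (platforms * days * credits_per_call)
```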
