How long does this tutorial on replacing browser automation with a structured API take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.8+, the requests library, and a Scavio API key from scavio.dev, which comes with 50 free credits on signup. Existing Playwright/Puppeteer code is optional.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt the calls to your framework of choice.
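To illustrate the "any HTTP client" point, here is a minimal sketch that builds the same search request with Python's standard-library urllib instead of requests. The endpoint, header name, and body fields are the ones used later in this tutorial; the helper name build_search_request and the 'demo-key' fallback are made up for illustration. The request is only constructed here, not sent.

```python
import json
import os
import urllib.request

def build_search_request(query, api_key, platform=None):
    """Build (but do not send) a POST request for the search endpoint."""
    body = {'query': query, 'country_code': 'us'}
    if platform:
        body['platform'] = platform  # omit for a Google search, as in the tutorial
    return urllib.request.Request(
        'https://api.scavio.dev/api/v1/search',
        data=json.dumps(body).encode('utf-8'),
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST',
    )

# 'demo-key' is a placeholder so the sketch runs without a real key.
req = build_search_request(
    'wireless earbuds',
    os.environ.get('SCAVIO_API_KEY', 'demo-key'),
    platform='amazon',
)
print(req.get_method(), req.full_url)
print(req.data.decode('utf-8'))
```

To send it, pass req to urllib.request.urlopen with a timeout and decode the response body with json.load.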

Replace Playwright/Puppeteer with API (2026)

For data extraction from known platforms, Playwright and Puppeteer are powerful but slow, expensive, and brittle: a scraped page typically takes 3 to 5 seconds, and selectors break when the layout changes. A structured API returns the same data as JSON, typically in under a second, without browser overhead, proxy costs, or CAPTCHA handling. This tutorial shows which use cases you can replace immediately and which still need browser automation, with honest tradeoffs.

Prerequisites

Python 3.8+
requests library
A Scavio API key from scavio.dev
Existing Playwright/Puppeteer code (optional)

Walkthrough

Step 1: Identify which browser automation to replace

Sort your browser automation tasks into those that can move to an API and those that cannot.

Python

import os, requests

API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}

# CAN REPLACE with API:
replaceable = {
    'Google search scraping': 'Scavio search API (google platform)',
    'Amazon product scraping': 'Scavio search API (amazon platform)',
    'Reddit thread scraping': 'Scavio search API (reddit platform)',
    'YouTube search scraping': 'Scavio search API (youtube platform)',
    'Walmart product scraping': 'Scavio search API (walmart platform)',
    'TikTok profile scraping': 'Scavio TikTok API (profile endpoint)',
    'TikTok video data': 'Scavio TikTok API (user/posts endpoint)',
    'Google Maps data': 'Scavio search API (local_results field)',
}

# STILL NEED BROWSER:
need_browser = {
    'Custom web apps': 'No structured API for proprietary sites',
    'Login-required pages': 'API cannot authenticate to private accounts',
    'Interactive forms': 'Form submissions need browser context',
    'Screenshot capture': 'Visual rendering requires a browser',
    'Cookie-dependent flows': 'Session state needs browser persistence',
}

print('Replaceable with API:')
for task, api in replaceable.items():
    print(f'  {task:35} -> {api}')
print(f'\nStill needs browser ({len(need_browser)} cases):')
for task, reason in need_browser.items():
    print(f'  {task:35} | {reason}')
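The two tables above can also be turned into a small routing check for an existing job list. The sketch below is one possible shape, not part of the Scavio API: the task descriptor (a dict with a 'platform' name and boolean flags such as 'requires_login') and the sample tasks are hypothetical, while the platform names and reasons are copied from the tables.

```python
# Platforms the tutorial lists as covered by the API (copied from the table above).
API_PLATFORMS = {'google', 'amazon', 'reddit', 'youtube', 'walmart', 'tiktok'}

# Hypothetical task flags mapped to the "still need browser" reasons above.
BLOCKERS = {
    'requires_login': 'API cannot authenticate to private accounts',
    'submits_form': 'Form submissions need browser context',
    'needs_screenshot': 'Visual rendering requires a browser',
    'needs_session': 'Session state needs browser persistence',
}

def route(task):
    """Return ('api' or 'browser', reason) for one scraping task."""
    for flag, reason in BLOCKERS.items():
        if task.get(flag):
            return 'browser', reason
    if task.get('platform') not in API_PLATFORMS:
        return 'browser', 'No structured API for proprietary sites'
    return 'api', f"Scavio API ({task['platform']} platform)"

# Made-up job list for illustration.
tasks = [
    {'name': 'SERP rank tracker', 'platform': 'google'},
    {'name': 'Seller dashboard export', 'platform': 'amazon', 'requires_login': True},
    {'name': 'Internal CRM scrape', 'platform': 'acme-crm'},
]
for t in tasks:
    target, reason = route(t)
    print(f"{t['name']:25} -> {target:7} ({reason})")
```

Checking the blockers before the platform matters: a login-gated Amazon page still needs a browser even though Amazon search is covered.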

Step 2: Side-by-side code comparison

Compare Playwright browser code vs API calls for common tasks.

Python

# BEFORE: Playwright Google scraping (~20 lines, 3-5 seconds)
# from playwright.async_api import async_playwright
# async def scrape_google(query):
#     async with async_playwright() as p:
#         browser = await p.chromium.launch(headless=True)
#         page = await browser.new_page()
#         await page.goto(f'https://www.google.com/search?q={query}')
#         await page.wait_for_selector('div.g')
#         results = await page.query_selector_all('div.g')
#         data = []
#         for r in results[:10]:
#             title = await r.query_selector('h3')
#             link = await r.query_selector('a')
#             data.append({'title': await title.inner_text() if title else '',
#                          'link': await link.get_attribute('href') if link else ''})
#         await browser.close()
#         return data  # Takes 3-5 seconds, breaks on layout changes

# AFTER: API call (a few lines, typically under a second)
def search_google(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()  # fail loudly on auth or quota errors
    return resp.json().get('organic_results', [])

import time
start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
results = search_google('python web framework 2026')
elapsed = time.perf_counter() - start
print(f'API: {len(results)} results in {elapsed:.2f}s')
print('vs Playwright: ~3-5 seconds + browser memory + proxy cost')
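An API call removes selector breakage but can still fail transiently, so a migrated scraper usually wants a retry wrapper. The following is a minimal sketch under stated assumptions: the set of retryable status codes is a common convention, not something the Scavio documentation is known to specify, and the helper name post_json is made up. The post argument is any requests.post-compatible callable; the stub response at the bottom stands in for requests so the sketch runs offline.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}  # assumed transient statuses

def post_json(post, url, headers, body, retries=3, backoff=0.5, timeout=10,
              sleep=time.sleep):
    """POST a JSON body and return the decoded response, retrying transient errors."""
    last_status = None
    for attempt in range(retries):
        resp = post(url, headers=headers, json=body, timeout=timeout)
        if resp.status_code not in RETRYABLE:
            resp.raise_for_status()  # other 4xx errors surface immediately
            return resp.json()
        last_status = resp.status_code
        if attempt < retries - 1:
            sleep(backoff * 2 ** attempt)  # exponential backoff: 0.5s, 1s, ...
    raise RuntimeError(f'gave up after {retries} attempts (last status {last_status})')


class StubResponse:
    """Offline stand-in for requests.Response, used only to exercise the helper."""
    def __init__(self, status_code, payload):
        self.status_code, self._payload = status_code, payload
    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f'HTTP {self.status_code}')
    def json(self):
        return self._payload

# First reply fails with 503, second succeeds.
replies = iter([StubResponse(503, {}),
                StubResponse(200, {'organic_results': [{'title': 'ok'}]})])
waits = []  # records backoff delays instead of actually sleeping
data = post_json(lambda url, **kw: next(replies),
                 'https://api.scavio.dev/api/v1/search',
                 headers={'x-api-key': 'demo'}, body={'query': 'test'},
                 sleep=waits.append)
print(len(data['organic_results']), 'result after', len(waits), 'retry')
```

In real use, pass requests.post as the first argument and the SH headers from Step 1.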

Step 3: Migrate a real scraping pipeline

Step-by-step migration of a multi-page scraper to API calls.

Python

import time

def call(endpoint, body):
    """POST a JSON body to a Scavio endpoint and return the decoded response."""
    resp = requests.post(f'https://api.scavio.dev/api/v1/{endpoint}',
        headers=SH, json=body, timeout=30)
    resp.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    return resp.json()

def migrate_pipeline():
    """Migrate a typical multi-page scraping pipeline to API."""
    start = time.perf_counter()
    calls = 0

    # Step 1: Replace search scraping
    google_results = call('search', {'query': 'wireless earbuds', 'country_code': 'us'})
    calls += 1
    print(f'Google: {len(google_results.get("organic_results", []))} results')

    # Step 2: Replace Amazon scraping
    amazon_results = call('search', {'query': 'wireless earbuds', 'platform': 'amazon', 'country_code': 'us'})
    calls += 1
    print(f'Amazon: {len(amazon_results.get("organic_results", []))} products')

    # Step 3: Replace Reddit scraping
    reddit_results = call('search', {'query': 'wireless earbuds review', 'platform': 'reddit', 'country_code': 'us'})
    calls += 1
    print(f'Reddit: {len(reddit_results.get("organic_results", []))} discussions')

    # Step 4: Replace page content extraction (skipped if Google returned no link)
    organic = google_results.get('organic_results', [])
    url = organic[0].get('link', '') if organic else ''
    if url:
        extract = call('extract', {'url': url})
        calls += 1
        print(f'Extract: {len(str(extract.get("content", "")))} chars from {url[:40]}')

    # Report measured values rather than fixed estimates
    print(f'\nTotal cost: ${calls * 0.005:.3f} ({calls} API calls at $0.005 each)')
    print(f'Total time: {time.perf_counter() - start:.2f} seconds')
    print('Browser instances: 0')
    print('Proxy cost: $0')
    print('CAPTCHA blocks: 0')

migrate_pipeline()
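Once several platforms feed the same pipeline, the responses usually need to be flattened into one record shape. The sketch below shows one way to do that, assuming each response carries an organic_results list whose items have a link and (as in the Playwright version in Step 2) a title; the title field, the helper name merge_results, and the sample payloads are assumptions for illustration, not documented response fields.

```python
from urllib.parse import urlsplit

def merge_results(responses):
    """Flatten {platform: decoded JSON} into one list, deduplicated by link."""
    seen = set()
    merged = []
    for platform, payload in responses.items():
        for rank, item in enumerate(payload.get('organic_results', []), start=1):
            link = item.get('link', '')
            if not link or link in seen:
                continue  # skip entries without a link and cross-platform duplicates
            seen.add(link)
            merged.append({
                'platform': platform,
                'rank': rank,  # position within that platform's results
                'title': item.get('title', ''),
                'link': link,
                'domain': urlsplit(link).netloc,
            })
    return merged

# Made-up payloads standing in for real API responses.
responses = {
    'google': {'organic_results': [
        {'title': 'Best earbuds', 'link': 'https://example.com/best'},
        {'title': 'No link here'},
    ]},
    'reddit': {'organic_results': [
        {'title': 'Best earbuds (thread)', 'link': 'https://example.com/best'},
        {'title': 'Review', 'link': 'https://reddit.example/r/1'},
    ]},
    'amazon': {},  # an empty or error-shaped response contributes nothing
}
merged = merge_results(responses)
for row in merged:
    print(row['platform'], row['rank'], row['domain'], row['title'])
```

Using .get with defaults throughout keeps the merge from raising when a platform returns an unexpected shape.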

Step 4: Compare cost and performance

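Estimate the total cost of ownership for the browser and API approaches. The server, proxy, CAPTCHA, and maintenance figures in the code are illustrative assumptions; substitute your own numbers.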

Python

def tco_comparison(monthly_pages):
    print(f'\n=== Total Cost of Ownership ({monthly_pages:,} pages/month) ===')
    # Playwright/Puppeteer costs
    browser_server = 50  # Cloud server for browsers
    proxy = 30  # Proxy service
    captcha = monthly_pages * 0.05 * 0.002  # 5% CAPTCHA rate, $0.002/solve
    maintenance = 8 * 50  # 8 hours/month @ $50/hr fixing selectors
    browser_total = browser_server + proxy + captcha + maintenance
    print(f'\n  BROWSER AUTOMATION:')
    print(f'    Server (headless Chrome): ${browser_server}/mo')
    print(f'    Proxy service: ${proxy}/mo')
    print(f'    CAPTCHA solving (~5%): ${captcha:.2f}/mo')
    print(f'    Maintenance (selector fixes): ${maintenance}/mo')
    print(f'    Total: ${browser_total:.2f}/mo')
    # API costs
    api_cost = monthly_pages * 0.005
    print(f'\n  STRUCTURED API:')
    print(f'    Scavio API: ${api_cost:.2f}/mo ({monthly_pages:,} x $0.005)')
    print(f'    Server: $0 (runs anywhere)')
    print(f'    Proxy: $0 (not needed)')
    print(f'    CAPTCHA: $0 (not needed)')
    print(f'    Maintenance: ~$0 (stable JSON)')
    print(f'    Total: ${api_cost:.2f}/mo')
    savings = browser_total - api_cost
    print(f'\n  SAVINGS: ${savings:.2f}/mo ({savings/browser_total*100:.0f}%)')
    print(f'  SPEED: ~0.5s/request (API) vs ~3-5s/page (browser)')
    print(f'  RELIABILITY: 99%+ (API) vs 85-95% (browser)')

tco_comparison(5000)
tco_comparison(20000)
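Because the browser cost is mostly fixed while the API cost scales per request, the savings shrink as volume grows. The sketch below reuses the same assumed figures as tco_comparison ($50 server, $30 proxy, 8 hours of maintenance at $50/hr, 5% CAPTCHA rate at $0.002 per solve, $0.005 per API call) to find the volume where the two approaches cost the same. The function names are made up for illustration.

```python
def monthly_costs(pages, fixed=50 + 30 + 8 * 50, captcha_rate=0.05,
                  captcha_price=0.002, api_price=0.005):
    """Return (browser_cost, api_cost) in dollars for a monthly page volume."""
    browser = fixed + pages * captcha_rate * captcha_price
    api = pages * api_price
    return browser, api

def break_even_pages(fixed=50 + 30 + 8 * 50, captcha_rate=0.05,
                     captcha_price=0.002, api_price=0.005):
    """Monthly volume at which the API costs as much as browser automation."""
    per_page_gap = api_price - captcha_rate * captcha_price
    if per_page_gap <= 0:
        return float('inf')  # API is never more expensive under these inputs
    return fixed / per_page_gap

for pages in (5_000, 20_000, 100_000):
    browser, api = monthly_costs(pages)
    print(f'{pages:>7,} pages: browser ${browser:,.2f} vs API ${api:,.2f}')
print(f'Break-even: about {break_even_pages():,.0f} pages/month')
```

With these inputs the break-even lands at roughly 98,000 pages per month. Note that $400 of the $480 fixed browser cost is the maintenance estimate, so the result is most sensitive to how many hours you actually spend fixing selectors.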

Python Example

Python

import os, requests, time
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

# Replace Playwright/Puppeteer with:
start = time.time()
for platform in [None, 'amazon', 'reddit']:
    body = {'query': 'wireless earbuds', 'country_code': 'us'}
    if platform: body['platform'] = platform
    data = requests.post('https://api.scavio.dev/api/v1/search', headers=SH, json=body).json()
    print(f'{platform or "google"}: {len(data.get("organic_results", []))} results')
print(f'Time: {time.time()-start:.2f}s | Cost: $0.015 | Browser: none')

JavaScript Example

JavaScript

// Requires Node 18+ (built-in fetch); run as an ES module (.mjs) so top-level await works.
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
// Replace Puppeteer with:
const start = Date.now();
for (const platform of [null, 'amazon', 'reddit']) {
  const body = { query: 'wireless earbuds', country_code: 'us' };
  if (platform) body.platform = platform;
  const data = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify(body)
  }).then(r => r.json());
  console.log(`${platform || 'google'}: ${(data.organic_results || []).length} results`);
}
console.log(`Time: ${(Date.now()-start)/1000}s | Cost: $0.015 | Browser: none`);

Expected Output

Abridged console output from the walkthrough steps. Result counts and timings will vary.

Text

Replaceable with API:
  Google search scraping              -> Scavio search API (google platform)
  Amazon product scraping             -> Scavio search API (amazon platform)
  Reddit thread scraping              -> Scavio search API (reddit platform)

Still needs browser (5 cases):
  Custom web apps                     | No structured API for proprietary sites
  Login-required pages                | API cannot authenticate to private accounts

API: 10 results in 0.45s
vs Playwright: ~3-5 seconds + browser memory + proxy cost

=== Total Cost of Ownership (5,000 pages/month) ===
  BROWSER AUTOMATION: $480.50/mo
  STRUCTURED API: $25.00/mo
  SAVINGS: $455.50/mo (95%)
