How to Migrate from Web Scraper to Structured API

Step-by-step migration from requests+BeautifulSoup scraping to Scavio structured API calls. Code mapping and cost comparison.

Web scrapers built with requests and BeautifulSoup break whenever a target site changes its HTML layout. Migrating to a structured API hands selector maintenance, CAPTCHA handling, and proxy management to the API provider. This tutorial maps common scraping patterns to their API equivalents and shows the code replacement for Google, Amazon, and Reddit data extraction.
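Part of what makes selector maintenance expensive is that the failure is silent: when a class name changes, the selector simply stops matching and the scraper returns empty data instead of raising an error. The sketch below shows this with Python's stdlib `html.parser`, which stands in for BeautifulSoup here so the sketch runs without extra installs. The replacement class name `Xy12ab` is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {'area', 'br', 'hr', 'img', 'input', 'link', 'meta', 'source', 'wbr'}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains css_class.

    A toy stand-in for soup.select('.some-class'), not a full selector engine.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # nesting depth inside the current match (0 = outside)
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get('class') or '').split():
            self.depth = 1
            self.results.append('')

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract(html, css_class):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


old_html = '<div class="g"><span class="VwiC3b">Fast, async Python framework</span></div>'
new_html = '<div class="g"><span class="Xy12ab">Fast, async Python framework</span></div>'

print(extract(old_html, 'VwiC3b'))  # ['Fast, async Python framework']
print(extract(new_html, 'VwiC3b'))  # [] : no exception, the data just disappears
```

Because nothing raises, a scraper like this keeps "succeeding" while downstream code quietly receives empty fields.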

Prerequisites

  • Python 3.8+
  • requests library
  • A Scavio API key from scavio.dev
  • Existing scraping code to migrate
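The examples below read the key with `os.environ['SCAVIO_API_KEY']`, which fails with a bare `KeyError` if the variable is missing. A small helper that fails early with a readable message is sketched here. The helper names are hypothetical; the header shape matches the `SH` dict used throughout this tutorial.

```python
import os


def load_api_key(var='SCAVIO_API_KEY'):
    """Return the API key from the environment, or fail early with a clear message."""
    key = os.environ.get(var, '').strip()
    if not key:
        raise RuntimeError(f'{var} is not set; export it before running the examples')
    return key


def scavio_headers(api_key):
    # Same header shape as the tutorial's SH dict.
    return {'x-api-key': api_key, 'Content-Type': 'application/json'}
```

Calling `SH = scavio_headers(load_api_key())` would then replace the two setup lines in Step 1.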

Walkthrough

Step 1: Map scraping patterns to API calls

Start with Google search results: the commented-out BEFORE block is a typical scraper, and the AFTER function is its API replacement.

Python
import os, requests

API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}

# Pattern 1: Google search results
# BEFORE (scraper - 15+ lines, breaks often):
# from bs4 import BeautifulSoup
# def scrape_google(query):
#     r = requests.get(f'https://www.google.com/search?q={query}',
#         headers={'User-Agent': '...'})
#     soup = BeautifulSoup(r.text, 'html.parser')
#     results = []
#     for div in soup.select('div.g'):  # Selector changes regularly
#         title = div.select_one('h3')
#         link = div.select_one('a')
#         snippet = div.select_one('.VwiC3b')  # This selector breaks monthly
#         if title and link:
#             results.append({'title': title.text, 'link': link['href'], 'snippet': snippet.text if snippet else ''})
#     return results

# AFTER (API - 3 lines, stable):
def search_google(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json().get('organic_results', [])

results = search_google('python web framework 2026')
print(f'Google: {len(results)} results, structured JSON, no selectors')
for r in results[:2]: print(f'  {r.get("position")}. {r.get("title", "")[:50]}')
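If downstream code still expects the `{'title', 'link', 'snippet'}` dicts the old `scrape_google` returned, a thin adapter lets you swap the data source without touching consumers. This is a sketch: the `title` key appears in the example above, but the `link` and `snippet` key names are assumptions, so check them against a real response. The sample data is made up.

```python
def to_legacy_shape(api_results):
    """Map API organic_results onto the dicts the old scraper returned.

    Assumption: each result has 'link' and 'snippet' keys; adjust if the
    real response uses different names.
    """
    legacy = []
    for item in api_results:
        title = item.get('title')
        link = item.get('link')          # assumed key name
        if title and link:               # same filter the old scraper applied
            legacy.append({'title': title,
                           'link': link,
                           'snippet': item.get('snippet', '')})  # assumed key name
    return legacy


# Hypothetical sample shaped like organic_results.
sample = [
    {'position': 1, 'title': 'FastAPI', 'link': 'https://example.com/fastapi',
     'snippet': 'FastAPI framework'},
    {'position': 2, 'title': 'No link here'},
]
print(to_legacy_shape(sample))
# [{'title': 'FastAPI', 'link': 'https://example.com/fastapi', 'snippet': 'FastAPI framework'}]
```

Keeping the adapter separate from `search_google` means only one function changes if the response shape ever does.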

Step 2: Migrate Amazon product scraping

Replace Amazon HTML parsing with structured product API calls.

Python
# Pattern 2: Amazon product search
# BEFORE (scraper - 30+ lines, Selenium often needed):
# def scrape_amazon(query):
#     # Needs Selenium for JS rendering + CAPTCHA handling
#     driver = webdriver.Chrome()
#     driver.get(f'https://www.amazon.com/s?k={query}')
#     time.sleep(3)  # Wait for JS
#     if 'captcha' in driver.page_source.lower():
#         # Handle CAPTCHA... somehow
#         pass
#     soup = BeautifulSoup(driver.page_source, 'html.parser')
#     products = []
#     for item in soup.select('[data-component-type="s-search-result"]'):
#         title = item.select_one('h2 span')
#         price_whole = item.select_one('.a-price-whole')
#         price_frac = item.select_one('.a-price-fraction')
#         # ... 20 more lines of fragile selectors
#     driver.quit()
#     return products

# AFTER (API - 3 lines):
def search_amazon(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'platform': 'amazon', 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json().get('organic_results', [])

products = search_amazon('wireless earbuds')
print(f'Amazon: {len(products)} products, no Selenium, no CAPTCHA')
for p in products[:2]: print(f'  {p.get("title", "")[:40]} | {p.get("price", "N/A")}')
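The old scraper stitched a price together from `.a-price-whole` and `.a-price-fraction`. The sample output later in this tutorial shows `price` as a display string such as `$24.99`. Assuming that US-style format (`,` for thousands, `.` for cents), a small parser turns it into a `Decimal` for sorting and comparisons. `parse_price` is a hypothetical helper, not part of the API.

```python
import re
from decimal import Decimal

# Digits with optional thousands separators, then an optional 1-2 digit cents part.
_PRICE_RE = re.compile(r'(\d[\d,]*)(?:\.(\d{1,2}))?')


def parse_price(raw):
    """Turn a display price such as '$1,299.99' into Decimal('1299.99').

    Returns None for missing or unparseable values (e.g. 'N/A').
    """
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return Decimal(str(raw))
    match = _PRICE_RE.search(str(raw))
    if not match:
        return None
    whole = match.group(1).replace(',', '')
    cents = (match.group(2) or '0').ljust(2, '0')
    return Decimal(f'{whole}.{cents}')


print(parse_price('$1,299.99'))  # 1299.99
print(parse_price('N/A'))        # None
```

Using `Decimal` instead of `float` avoids rounding surprises when totals or price thresholds are compared.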

Step 3: Migrate Reddit data extraction

Replace Reddit scraping with structured Reddit API search.

Python
# Pattern 3: Reddit discussions
# BEFORE (scraper - requires auth + rate limiting):
# import praw  # or direct scraping with JS rendering
# def scrape_reddit(query):
#     # Option A: PRAW (needs Reddit app credentials)
#     reddit = praw.Reddit(client_id='...', client_secret='...', user_agent='...')
#     results = reddit.subreddit('all').search(query, limit=10)
#     # Option B: Direct scraping (needs Selenium for new Reddit)
#     # driver.get(f'https://www.reddit.com/search/?q={query}')
#     # ... many lines of JS-rendered HTML parsing

# AFTER (API - 3 lines, no Reddit app credentials needed):
def search_reddit(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'platform': 'reddit', 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json().get('organic_results', [])

posts = search_reddit('best python framework 2026')
print(f'Reddit: {len(posts)} discussions, no PRAW, no auth')
for p in posts[:2]: print(f'  {p.get("title", "")[:60]}')

# Lines of code comparison:
print(f'\nCode reduction:')
print(f'  Google: ~15 lines -> 3 lines')
print(f'  Amazon: ~30 lines + Selenium -> 3 lines')
print(f'  Reddit: ~20 lines + auth -> 3 lines')
print(f'  Total: ~65 lines -> 9 lines')
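Before deleting a scraper, it can help to run it alongside the API for a short period and compare what each returns for the same queries. The sketch below scores agreement as the Jaccard overlap of the top-k normalized URLs. The metric, the normalization rules, and the sample links are illustrative choices, not something Scavio prescribes. Collecting API links assumes a `link` field on each result.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path so 'www.', query strings and trailing slashes don't count as differences."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host + parts.path.rstrip('/')


def top_k_overlap(old_links, new_links, k=10):
    """Jaccard overlap (0.0-1.0) of the top-k normalized URLs from two result lists."""
    old = {normalize_url(u) for u in old_links[:k]}
    new = {normalize_url(u) for u in new_links[:k]}
    if not old and not new:
        return 1.0  # two empty result lists agree trivially
    return len(old & new) / len(old | new)


# Hypothetical links returned by the old scraper and the API for one query.
scraper_links = ['https://www.example.com/a/', 'https://example.org/b?utm_source=x']
api_links = ['https://example.com/a', 'https://example.net/c']
print(f'{top_k_overlap(scraper_links, api_links):.2f}')  # 0.33
```

Queries with unusually low overlap are the ones worth inspecting by hand before switching over.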

Step 4: Compare maintenance and cost

Estimate the ongoing cost and maintenance burden of each approach.

Python
def migration_report(monthly_queries, price_per_query=0.005):
    # Scraper figures are rough monthly estimates. The lower bound assumes no
    # CAPTCHA solves; the upper bound adds ~1K solves at $3/1K to the fixed items.
    scraper_low, scraper_high = 240, 553
    api_cost = monthly_queries * price_per_query
    print('\n=== Scraper to API Migration Report ===')
    print(f'Monthly queries: {monthly_queries:,}')
    print('\n  SCRAPER COSTS (estimates):')
    print('    Proxy service: $20-100/month')
    print('    CAPTCHA solver: $1-3/1K solves')
    print('    Server (Selenium): $20-50/month')
    print('    Maintenance: 4-8 hours/month @ $50/hr = $200-400')
    print(f'    Total estimate: ${scraper_low}-{scraper_high}/month')
    print('\n  API COSTS:')
    print(f'    Scavio API: ${api_cost:.2f}/month ({monthly_queries:,} queries @ ${price_per_query})')
    print('    Proxy: $0 (not needed)')
    print('    CAPTCHA: $0 (not needed)')
    print('    Selenium: $0 (not needed)')
    print('    Maintenance: ~0 hours/month (no selectors to update)')
    print(f'    Total: ${api_cost:.2f}/month')
    print(f'\n  SAVINGS: ${scraper_low - api_cost:.2f}-${scraper_high - api_cost:.2f}/month')
    print('  RELIABILITY (estimate): 99%+ vs 80-90% scraper success rate')
    print('  CODE REDUCTION: ~65 lines -> ~9 lines across all three platforms')

migration_report(5000)
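The report treats scraper costs as flat while API spend grows linearly with volume, so the savings shrink as query counts rise. Under that simplifying assumption (real proxy bandwidth and CAPTCHA solves also grow with volume), the break-even volume is the scraper's monthly cost divided by the per-query price. `break_even_queries` is a hypothetical helper; the $0.005 price and the $240-553 range come from the report above.

```python
def break_even_queries(scraper_monthly_cost, price_per_query=0.005):
    """Monthly query volume at which API spend equals a flat scraper cost."""
    return scraper_monthly_cost / price_per_query


for cost in (240, 553):
    print(f'${cost}/month scraper -> break-even at {break_even_queries(cost):,.0f} queries/month')
# $240/month scraper -> break-even at 48,000 queries/month
# $553/month scraper -> break-even at 110,600 queries/month
```

Below roughly 48,000 queries a month, the API is cheaper than even the low scraper estimate. Above about 110,600, the comparison depends on how the scraper's own costs scale.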

Python Example

Python
import os, requests
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

# One helper covers all three platforms from the steps above:
def search(query, platform=None):
    body = {'query': query, 'country_code': 'us'}
    if platform: body['platform'] = platform
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=SH, json=body, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json().get('organic_results', [])

# Before: ~65 lines of scraping code across three platforms
# After:
print(f'Google: {len(search("python tutorial"))} results')
print(f'Amazon: {len(search("laptop stand", "amazon"))} products')
print(f'Reddit: {len(search("best api", "reddit"))} discussions')
print(f'Cost: $0.015 total. Lines of code: 3.')
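Because each query is billed, repeated lookups of the same query cost the same amount every time. A small in-memory cache with a time-to-live (TTL) avoids paying twice within a window. This is a sketch: the `TTLCache` class, the one-hour default TTL, and `fake_search` (an offline stand-in for the `search()` helper above) are hypothetical. The trade-off is that cached results can be up to one TTL old.

```python
import time


class TTLCache:
    """In-memory cache: each key is fetched (and billed) at most once per TTL window."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so tests can control time
        self._store = {}        # key -> (fetched_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # fresh hit: no API call
        value = fetch()         # miss or expired: call through
        self._store[key] = (now, value)
        return value


# fake_search stands in for search() so this runs offline.
calls = []
def fake_search(query, platform=None):
    calls.append((query, platform))
    return [{'title': f'result for {query}'}]

cache = TTLCache(ttl_seconds=600)
for _ in range(3):
    cache.get_or_fetch(('laptop stand', 'amazon'), lambda: fake_search('laptop stand', 'amazon'))
print(len(calls))  # 1
```

In real use the key should include every request parameter that changes the result, such as the platform and country code.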

JavaScript Example

JavaScript
// Run as an ES module (e.g. search.mjs) on Node 18+, which provides global fetch and top-level await.
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
async function search(query, platform) {
  const body = { query, country_code: 'us' };
  if (platform) body.platform = platform;
  const res = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`Scavio API error: HTTP ${res.status}`);
  const data = await res.json();
  return data.organic_results || [];
}
// Replace Puppeteer/Playwright with:
console.log(`Google: ${(await search('python tutorial')).length} results`);
console.log(`Amazon: ${(await search('laptop stand', 'amazon')).length} products`);
console.log('Cost: $0.010, Lines: 3');
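Both examples send one request per query and stop at the first error. For batch jobs, a retry with exponential backoff smooths over transient failures. The sketch below wraps any zero-argument `send()` callable, so it can wrap the Python `search()` request. Which statuses count as retryable is an assumption based on standard HTTP semantics, not documented Scavio behavior.

```python
import time

# Standard HTTP statuses that usually signal a transient failure (assumed retryable).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def with_retries(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, payload).
    The delay doubles after each retryable failure: base_delay, 2*base_delay, ...
    """
    status, payload = None, None
    for attempt in range(attempts):
        status, payload = send()
        if status not in RETRYABLE_STATUSES:
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, payload

# Wiring it to the tutorial's request (not executed here):
# def send():
#     r = requests.post('https://api.scavio.dev/api/v1/search', headers=SH, json=body, timeout=30)
#     return r.status_code, (r.json() if r.ok else None)

# Offline demo: two transient failures, then success.
responses = iter([(503, None), (429, None), (200, {'organic_results': []})])
delays = []
status, payload = with_retries(lambda: next(responses), sleep=delays.append)
print(status, delays)  # 200 [0.5, 1.0]
```

Injecting `sleep` keeps the function testable without real waits. Capping `attempts` bounds how many billed calls a single failing query can make.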

Expected Output

Abbreviated sample output. Titles, prices, and result counts vary with live search results.

Text
Google: 10 results, structured JSON, no selectors
  1. FastAPI - Modern Python Web Framework
  2. Django - Web Framework for Perfectionists

Amazon: 10 products, no Selenium, no CAPTCHA
  Sony WF-1000XM5 Wireless Earbuds | $24.99

Reddit: 8 discussions, no PRAW, no auth

Code reduction:
  Google: ~15 lines -> 3 lines
  Amazon: ~30 lines + Selenium -> 3 lines
  Reddit: ~20 lines + auth -> 3 lines

=== Scraper to API Migration Report ===
  API COSTS: $25.00/month (5,000 queries)
  SAVINGS: $215.00-$528.00/month

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.8+, the requests library, a Scavio API key from scavio.dev, and existing scraping code to migrate. A Scavio API key comes with 250 free credits per month.

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.