Web scrapers built with requests and BeautifulSoup break whenever a target site changes the HTML their selectors depend on. Migrating to a structured API eliminates selector maintenance, CAPTCHA handling, and proxy management. This tutorial maps common scraping patterns to their API equivalents and shows the code that replaces Google, Amazon, and Reddit scrapers.
Prerequisites
- Python 3.8+
- requests library
- A Scavio API key from scavio.dev
- Existing scraping code to migrate
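Before running the walkthrough, it can help to confirm the prerequisites programmatically. The sketch below is one way to do that; `missing_prerequisites` is a hypothetical helper name, and the only facts taken from this tutorial are the Python 3.8+ requirement, the requests dependency, and the `SCAVIO_API_KEY` environment variable.

```python
import importlib.util
import os
import sys


def missing_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version < (3, 8):
        problems.append(f'Python 3.8+ required, found {version[0]}.{version[1]}')
    # find_spec returns None when a top-level package is not installed
    if importlib.util.find_spec('requests') is None:
        problems.append('requests library is not installed')
    if not env.get('SCAVIO_API_KEY'):
        problems.append('SCAVIO_API_KEY is not set')
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current interpreter and environment; the parameters exist only so the check can be exercised without touching real settings.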
Walkthrough
Step 1: Map scraping patterns to API calls
Side-by-side comparison of scraping code vs API code for each pattern.
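As a compact summary of the mapping, the three patterns in this walkthrough differ only in the `platform` field of the request body. The sketch below captures that as a lookup table; the pattern labels (`google_search` and so on) and the `build_body` helper are hypothetical names chosen for illustration, while the body fields mirror the requests used in this tutorial.

```python
# Hypothetical pattern labels mapped to the 'platform' value used in this tutorial.
# None means the field is omitted, which is how the Google search call is made.
PLATFORM_FOR_PATTERN = {
    'google_search': None,
    'amazon_products': 'amazon',
    'reddit_discussions': 'reddit',
}


def build_body(pattern, query, country_code='us'):
    """Build the JSON body for a given scraping pattern."""
    if pattern not in PLATFORM_FOR_PATTERN:
        raise ValueError(f'unknown scraping pattern: {pattern!r}')
    body = {'query': query, 'country_code': country_code}
    platform = PLATFORM_FOR_PATTERN[pattern]
    if platform is not None:
        body['platform'] = platform
    return body
```

Keeping the mapping in data rather than in three separate functions makes it easy to see which scraper each call replaces.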
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}
# Pattern 1: Google search results
# BEFORE (scraper - 15+ lines, breaks often):
# from bs4 import BeautifulSoup
# def scrape_google(query):
#     r = requests.get(f'https://www.google.com/search?q={query}',
#                      headers={'User-Agent': '...'})
#     soup = BeautifulSoup(r.text, 'html.parser')
#     results = []
#     for div in soup.select('div.g'):  # Selector changes regularly
#         title = div.select_one('h3')
#         link = div.select_one('a')
#         snippet = div.select_one('.VwiC3b')  # This selector breaks monthly
#         if title and link:
#             results.append({'title': title.text, 'link': link['href'], 'snippet': snippet.text if snippet else ''})
#     return results
# AFTER (API - 3 lines, stable):
def search_google(query):
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=SH, json={'query': query, 'country_code': 'us'}).json()
    return data.get('organic_results', [])
results = search_google('python web framework 2026')
print(f'Google: {len(results)} results, structured JSON, no selectors')
for r in results[:2]:
    print(f' {r["position"]}. {r["title"][:50]}')
Step 2: Migrate Amazon product scraping
Replace Amazon HTML parsing with structured product API calls.
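With structured results, the price arrives as a single field rather than the separate `.a-price-whole` and `.a-price-fraction` nodes a scraper has to stitch together. A minimal sketch of normalizing that field for sorting or comparison is shown here. It assumes `price` is a US-formatted string such as "$24.99" (as in this tutorial's sample output), which should be checked against the API documentation; `parse_price` is a hypothetical helper, not part of any library.

```python
import re
from decimal import Decimal


def parse_price(raw):
    """Convert a price such as '$1,299.00' to Decimal; return None if unparseable."""
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float, Decimal)):
        return Decimal(str(raw))
    # Assumption: US-style formatting with ',' as thousands separator and '.' as decimal point
    match = re.search(r'\d[\d,]*(?:\.\d+)?', str(raw))
    if not match:
        return None
    return Decimal(match.group().replace(',', ''))
```

A value like "N/A", which the walkthrough prints when a product has no price, comes back as `None`, so such products can be filtered out before sorting.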
# Pattern 2: Amazon product search
# BEFORE (scraper - 30+ lines, Selenium often needed):
# def scrape_amazon(query):
#     # Needs Selenium for JS rendering + CAPTCHA handling
#     driver = webdriver.Chrome()
#     driver.get(f'https://www.amazon.com/s?k={query}')
#     time.sleep(3)  # Wait for JS
#     if 'captcha' in driver.page_source.lower():
#         # Handle CAPTCHA... somehow
#         pass
#     soup = BeautifulSoup(driver.page_source, 'html.parser')
#     products = []
#     for item in soup.select('[data-component-type="s-search-result"]'):
#         title = item.select_one('h2 span')
#         price_whole = item.select_one('.a-price-whole')
#         price_frac = item.select_one('.a-price-fraction')
#         # ... 20 more lines of fragile selectors
#     driver.quit()
#     return products
# AFTER (API - 3 lines):
def search_amazon(query):
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=SH, json={'query': query, 'platform': 'amazon', 'country_code': 'us'}).json()
    return data.get('organic_results', [])
products = search_amazon('wireless earbuds')
print(f'Amazon: {len(products)} products, no Selenium, no CAPTCHA')
for p in products[:2]:
    print(f' {p.get("title", "")[:40]} | {p.get("price", "N/A")}')
Step 3: Migrate Reddit data extraction
Replace Reddit scraping with structured Reddit API search.
# Pattern 3: Reddit discussions
# BEFORE (scraper - requires auth + rate limiting):
# import praw # or direct scraping with JS rendering
# def scrape_reddit(query):
#     # Option A: PRAW (needs Reddit app credentials)
#     reddit = praw.Reddit(client_id='...', client_secret='...')
#     results = reddit.subreddit('all').search(query, limit=10)
#     # Option B: Direct scraping (needs Selenium for new Reddit)
#     # driver.get(f'https://www.reddit.com/search/?q={query}')
#     # ... many lines of JS-rendered HTML parsing
# AFTER (API - 3 lines, no auth needed):
def search_reddit(query):
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=SH, json={'query': query, 'platform': 'reddit', 'country_code': 'us'}).json()
    return data.get('organic_results', [])
posts = search_reddit('best python framework 2026')
print(f'Reddit: {len(posts)} discussions, no PRAW, no auth')
for p in posts[:2]:
    print(f' {p.get("title", "")[:60]}')
# Lines of code comparison:
print(f'\nCode reduction:')
print(f' Google: ~15 lines -> 3 lines')
print(f' Amazon: ~30 lines + Selenium -> 3 lines')
print(f' Reddit: ~20 lines + auth -> 3 lines')
print(f' Total: ~65 lines -> 9 lines')
Step 4: Compare maintenance and cost
Calculate ongoing cost vs maintenance burden of each approach.
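The comparison comes down to simple arithmetic: a per-query API price against a roughly flat monthly scraper bill. As a worked sketch using only this tutorial's own figures ($0.005 per query, $240 to $553 per month for scraping), the snippet below computes the API spend at 5,000 queries and the query volume at which the two approaches would cost the same. It assumes scraper costs do not grow with volume, which is a simplification; the function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_QUERY = Decimal('0.005')                   # per-query price quoted in this tutorial
SCRAPER_MONTHLY = (Decimal('240'), Decimal('553'))   # tutorial's scraper estimate range, USD/month


def api_monthly_cost(queries):
    """API spend for a given monthly query volume."""
    return PRICE_PER_QUERY * queries


def break_even_queries(scraper_cost):
    """Monthly query volume at which API spend equals the scraper estimate."""
    return int(scraper_cost / PRICE_PER_QUERY)


low, high = (break_even_queries(cost) for cost in SCRAPER_MONTHLY)
print(f'API cost at 5,000 queries: ${api_monthly_cost(5000):.2f}')   # $25.00
print(f'Break-even volume: {low:,} to {high:,} queries/month')       # 48,000 to 110,600
```

Under these assumptions the API stays cheaper until somewhere between 48,000 and 110,600 queries per month, depending on which end of the scraper estimate applies.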
def migration_report(monthly_queries):
    print(f'\n=== Scraper to API Migration Report ===')
    print(f'Monthly queries: {monthly_queries:,}')
    print(f'\n SCRAPER COSTS:')
    print(f' Proxy service: $20-100/month')
    print(f' CAPTCHA solver: $1-3/1K solves')
    print(f' Server (Selenium): $20-50/month')
    print(f' Maintenance: 4-8 hours/month @ $50/hr = $200-400')
    # $240 = $20 + $20 + $200 (no CAPTCHA solves); $553 = $100 + $50 + $400 + $3 (about 1K solves)
    print(f' Total estimate: $240-553/month')
    api_cost = monthly_queries * 0.005
    print(f'\n API COSTS:')
    print(f' Scavio API: ${api_cost:.2f}/month ({monthly_queries:,} queries @ $0.005)')
    print(f' Proxy: $0 (not needed)')
    print(f' CAPTCHA: $0 (not needed)')
    print(f' Selenium: $0 (not needed)')
    print(f' Maintenance: ~0 hours/month (stable JSON)')
    print(f' Total: ${api_cost:.2f}/month')
    print(f'\n SAVINGS: ${240 - api_cost:.2f}-${553 - api_cost:.2f}/month')
    print(f' RELIABILITY: 99%+ (vs 80-90% scraper success rate)')
    print(f' CODE REDUCTION: ~65 lines -> ~9 lines across three platforms')
migration_report(5000)
Python Example
import os, requests
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}
# One helper replaces the Google, Amazon, and Reddit scrapers:
def search(query, platform=None):
    body = {'query': query, 'country_code': 'us'}
    if platform:
        body['platform'] = platform
    return requests.post('https://api.scavio.dev/api/v1/search',
                         headers=SH, json=body).json().get('organic_results', [])
# Before: ~65 lines of scraping code across three platforms
# After:
print(f'Google: {len(search("python tutorial"))} results')
print(f'Amazon: {len(search("laptop stand", "amazon"))} products')
print(f'Reddit: {len(search("best api", "reddit"))} discussions')
print(f'Cost: $0.015 total. Lines of code: 3.')
JavaScript Example
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
async function search(query, platform) {
  const body = { query, country_code: 'us' };
  if (platform) body.platform = platform;
  const data = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify(body)
  }).then(r => r.json());
  return data.organic_results || [];
}
// Replace Puppeteer/Playwright with:
// (top-level await needs an ES module, e.g. a .mjs file; built-in fetch needs Node 18+)
console.log(`Google: ${(await search('python tutorial')).length} results`);
console.log(`Amazon: ${(await search('laptop stand', 'amazon')).length} products`);
console.log('Cost: $0.010, Lines: 3');
Expected Output
Google: 10 results, structured JSON, no selectors
1. FastAPI - Modern Python Web Framework
2. Django - Web Framework for Perfectionists
Amazon: 10 products, no Selenium, no CAPTCHA
Sony WF-1000XM5 Wireless Earbuds | $24.99
Reddit: 8 discussions, no PRAW, no auth
Code reduction:
Google: ~15 lines -> 3 lines
Amazon: ~30 lines + Selenium -> 3 lines
Reddit: ~20 lines + auth -> 3 lines
=== Scraper to API Migration Report ===
API COSTS: $25.00/month (5,000 queries)
SAVINGS: $215.00-$528.00/month
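The search functions in this tutorial call `.json()` directly on the response, with no timeout and no HTTP status check. For production use, a slightly more defensive variant may be worth considering. The sketch below is one possible shape, not the provider's recommended client: `fetch_results` and its `post` parameter are hypothetical, the endpoint and headers are the ones used throughout this tutorial, and the 10-second timeout is an arbitrary default.

```python
import os

API_URL = 'https://api.scavio.dev/api/v1/search'  # endpoint used throughout this tutorial


def fetch_results(query, platform=None, *, post=None, api_key=None, timeout=10):
    """Like the tutorial's search(), plus a timeout and an HTTP status check."""
    if post is None:
        import requests  # imported lazily so a stub can be injected in tests
        post = requests.post
    if api_key is None:
        api_key = os.environ['SCAVIO_API_KEY']
    body = {'query': query, 'country_code': 'us'}
    if platform:
        body['platform'] = platform
    response = post(
        API_URL,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        json=body,
        timeout=timeout,  # fail instead of hanging on a stalled connection
    )
    response.raise_for_status()  # raise on 4xx/5xx rather than parsing an error body
    return response.json().get('organic_results', [])
```

Passing a stand-in for `post` keeps the function testable without network access; in normal use, `fetch_results('wireless earbuds', 'amazon')` behaves like the `search` helper but raises on timeouts and error responses.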