Web scrapers that parse Google, Reddit, or Amazon HTML are often the most brittle part of a data pipeline. When the target site changes its layout, your scraper breaks. When the site detects your traffic, you get blocked. When you scale up, proxy costs spike. A structured search API returns the same data as clean JSON: no parsing, no proxies, no maintenance. This tutorial shows how to replace a typical scraper with Scavio's API, step by step.
Prerequisites
- Python 3.8+ installed
- An existing scraper you want to migrate (BeautifulSoup, Playwright, or Selenium)
- A Scavio API key from scavio.dev (a quick way to verify it is set follows this list)
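Before starting, it helps to confirm the key is actually visible to your code. A minimal check, assuming the key is exported in the SCAVIO_API_KEY environment variable as the examples below expect:

import os

# Fail fast here rather than deep inside the pipeline.
if not os.environ.get('SCAVIO_API_KEY'):
    raise SystemExit('SCAVIO_API_KEY is not set; export it before running the examples.')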
Walkthrough
Step 1: Audit your scraper's data output
Identify which fields your scraper currently extracts. Most Google scrapers extract the same four fields: title, URL, snippet, and position.
# Typical scraper output:
# [
#   {'title': '...', 'url': '...', 'snippet': '...', 'position': 1},
#   {'title': '...', 'url': '...', 'snippet': '...', 'position': 2},
# ]
#
# Scavio's 'organic' array returns the same fields:
# [
#   {'title': '...', 'link': '...', 'snippet': '...', 'position': 1},
# ]
# Only difference: 'url' -> 'link'
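If you are not sure exactly which fields your scraper emits, print the keys of one result before migrating. A short audit sketch, assuming your existing function is importable as scrape_google from a scraper module (as in the grep in Step 3; adjust the import to match your project):

from scraper import scrape_google  # hypothetical module path; adjust to yours

results = scrape_google('example query')
print(sorted(results[0].keys()))  # e.g. ['position', 'snippet', 'title', 'url']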
Step 2: Replace the scraping function
Replace your scraping code with a single API call.
import requests, os
# BEFORE: 150 lines of scraping code
# from bs4 import BeautifulSoup
# import random
# PROXIES = [...]
# def scrape_google(query):
#     proxy = random.choice(PROXIES)
#     resp = requests.get(f'https://www.google.com/search?q={query}',
#                         proxies={'https': proxy}, headers={'User-Agent': ...})
#     soup = BeautifulSoup(resp.text, 'html.parser')
#     results = []
#     for div in soup.select('div.g'):
#         ...  # 100 lines of parsing
# AFTER: 10 lines
def search_google(query: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
                         headers={'x-api-key': os.environ['SCAVIO_API_KEY']},
                         json={'platform': 'google', 'query': query}, timeout=10)
    return [{'title': r['title'], 'url': r['link'], 'snippet': r['snippet'], 'position': r.get('position', i + 1)}
            for i, r in enumerate(resp.json().get('organic', []))]
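The version above assumes the request succeeds: failures either raise mid-pipeline or quietly come back as an empty list. If you want errors surfaced explicitly, here is a lightly hardened variant (same endpoint and payload; the error handling is our addition, not part of the API):

import requests, os

def search_google_safe(query: str) -> list:
    try:
        resp = requests.post('https://api.scavio.dev/api/v1/search',
                             headers={'x-api-key': os.environ['SCAVIO_API_KEY']},
                             json={'platform': 'google', 'query': query}, timeout=10)
        resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as e:
        raise RuntimeError(f'Scavio search failed for {query!r}: {e}') from e
    return resp.json().get('organic', [])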
Step 3: Update field references downstream
If your code references scraper-specific field names, update them.
# Find all references to the old scraper output format:
# grep -r 'scrape_google\|from scraper\|import scraper' .
# Common field mapping:
# Old scraper -> Scavio API
# result.url -> result.link
# result.desc -> result.snippet
# result.rank -> result.position
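If there are too many call sites to update at once, a thin adapter can translate Scavio results back to the old names so downstream code keeps working while you migrate incrementally. A sketch (remap_result is a hypothetical helper, not part of the API):

# Scavio name -> old scraper name, per the mapping above.
FIELD_MAP = {'link': 'url', 'snippet': 'desc', 'position': 'rank'}

def remap_result(result: dict) -> dict:
    return {FIELD_MAP.get(key, key): value for key, value in result.items()}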
Step 4: Remove proxy and parser dependencies
Clean up your requirements file and remove scraping infrastructure.
# Remove from requirements.txt:
# beautifulsoup4
# lxml
# playwright
# selenium
# webdriver-manager
# fake-useragent
# rotating-proxies
# Remove proxy configuration files
# Cancel proxy subscription (saves $50-200/month)
# Your requirements.txt now just needs:
# requests
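After uninstalling, you can confirm the old stack is really gone with a standard-library check (prints None for each package that is no longer importable):

import importlib.util

for pkg in ('bs4', 'lxml', 'playwright', 'selenium'):
    print(pkg, importlib.util.find_spec(pkg))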
Python Example
# Migration summary:
# Before: 150 lines + proxy subscription + maintenance
# After: 10 lines + $0.003/query + zero maintenance
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
def search(query, platform='google'):
    return requests.post('https://api.scavio.dev/api/v1/search',
                         headers=H, json={'platform': platform, 'query': query},
                         timeout=10).json().get('organic', [])
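Using it looks like this. Note that this thin wrapper returns raw Scavio results, so the URL field is 'link' (fields per the mapping in Step 1):

for r in search('best coffee grinders')[:3]:
    print(r.get('position'), r.get('title'), r.get('link'))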
JavaScript Example
// Before: Playwright + proxy rotation + HTML parsing
// After:
async function search(query, platform = 'google') {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'},
    body: JSON.stringify({platform, query})
  });
  return (await resp.json()).organic || [];
}
Expected Output
A clean search function replacing hundreds of lines of scraping code. No proxies, no parsing, no maintenance.
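For reference, each normalized item from the search_google wrapper in Step 2 looks roughly like this (placeholder values, not real API output):

{
    'title': 'Example result title',
    'url': 'https://example.com/page',
    'snippet': 'Short description of the result...',
    'position': 1,
}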