Building a direct web scraper requires managing a rotating proxy pool, solving CAPTCHAs, handling JavaScript rendering, and parsing raw HTML — all of which require significant engineering effort and ongoing maintenance. The Scavio API is a managed search data service that handles all of this infrastructure server-side. You make a single authenticated HTTP POST and receive structured JSON. This tutorial compares the traditional proxy-based approach to the Scavio approach and shows how to migrate from a scraper to the API.
Prerequisites
- Python 3.8 or higher
- requests library installed
- A Scavio API key
- Basic understanding of HTTP requests
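Before running the examples, you can confirm your environment is ready with a quick check. This is a minimal sketch: `check_environment` is a hypothetical helper name, and `SCAVIO_API_KEY` matches the environment variable used in the full Python example later in this tutorial.

```python
import importlib.util
import os

def check_environment() -> str:
    """Return a short status message describing setup readiness."""
    # Is the requests library importable?
    if importlib.util.find_spec("requests") is None:
        return "missing requests: run pip install requests"
    # Is the API key available to the examples below?
    if not os.environ.get("SCAVIO_API_KEY"):
        return "set the SCAVIO_API_KEY environment variable"
    return "ready"

print(check_environment())
```

Reading the key from the environment (rather than hardcoding it) keeps it out of version control.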
Walkthrough
Step 1: The traditional scraping approach (before)
A typical scraper requires proxy configuration, user-agent rotation, and HTML parsing — all prone to breaking when sites change.
```python
# Traditional approach — fragile and requires proxy infrastructure
import requests
from bs4 import BeautifulSoup

proxies = {
    "http": "http://user:pass@proxy:8080",
    "https": "http://user:pass@proxy:8080",
}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."}

response = requests.get(
    "https://www.google.com/search?q=python+tutorial",
    proxies=proxies,
    headers=headers,
)
soup = BeautifulSoup(response.text, "html.parser")

# Fragile: class names change without notice
results = soup.find_all("div", class_="tF2Cxc")
```

Step 2: The Scavio approach (after)
Replace the scraper with a single API call. No proxies, no HTML parsing, no maintenance.
```python
# Scavio approach — stable, structured, no infrastructure
import requests

response = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": "your_scavio_api_key"},
    json={"query": "python tutorial", "country_code": "us"},
)
results = response.json()["organic_results"]
```

Step 3: Handle retries with exponential backoff
The Scavio API is reliable, but it is still good practice to wrap calls in a simple retry with exponential backoff to absorb transient network errors.
```python
import time

import requests

API_KEY = "your_scavio_api_key"
ENDPOINT = "https://api.scavio.dev/api/v1/search"

def search_with_retry(query: str, max_retries: int = 3) -> dict:
    for attempt in range(max_retries):
        try:
            r = requests.post(
                ENDPOINT,
                headers={"x-api-key": API_KEY},
                json={"query": query, "country_code": "us"},
                timeout=30,
            )
            r.raise_for_status()
            return r.json()
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return {}
```

Step 4: Validate the response schema
Add a simple schema check to ensure the response contains expected fields before processing.
```python
def validate_response(data: dict) -> bool:
    required = ["organic_results"]
    return all(k in data for k in required)

data = search_with_retry("python tutorial")
if validate_response(data):
    for r in data["organic_results"][:5]:
        print(r["title"], r["link"])
```

Python Example
```python
import os
import time

import requests

API_KEY = os.environ.get("SCAVIO_API_KEY", "your_scavio_api_key")
ENDPOINT = "https://api.scavio.dev/api/v1/search"

def search(query: str, retries: int = 3) -> dict:
    for i in range(retries):
        try:
            r = requests.post(
                ENDPOINT,
                headers={"x-api-key": API_KEY},
                json={"query": query, "country_code": "us"},
                timeout=30,
            )
            r.raise_for_status()
            return r.json()
        except requests.RequestException:
            if i < retries - 1:
                time.sleep(2 ** i)
            else:
                raise
    return {}

if __name__ == "__main__":
    # No proxies, no HTML parsing, no CAPTCHA solving
    data = search("python tutorial")
    for r in data.get("organic_results", [])[:5]:
        print(f"{r['title']}\n{r['link']}\n")
```

JavaScript Example
```javascript
const API_KEY = process.env.SCAVIO_API_KEY || "your_scavio_api_key";
const ENDPOINT = "https://api.scavio.dev/api/v1/search";

async function search(query, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: { "x-api-key": API_KEY, "Content-Type": "application/json" },
        body: JSON.stringify({ query, country_code: "us" })
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.json();
    } catch (e) {
      if (i === retries - 1) throw e;
      await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000));
    }
  }
}

// No proxies, no HTML parsing
search("python tutorial").then(data => {
  (data.organic_results || []).slice(0, 5).forEach(r => console.log(`${r.title}\n${r.link}\n`));
}).catch(console.error);
```

Expected Output
- Traditional approach: ~47 lines, 3 dependencies, and breaks whenever the target site's markup changes
- Scavio approach: ~8 lines, 1 dependency (requests), and a stable JSON schema
Sample output:

```json
{
  "organic_results": [
    { "position": 1, "title": "Python Tutorial — W3Schools", "link": "https://w3schools.com/python/" },
    { "position": 2, "title": "The Python Tutorial — Python Docs", "link": "https://docs.python.org/tutorial/" }
  ]
}
```
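For downstream processing, the organic_results array shown above flattens cleanly into tabular form. A minimal sketch, using the sample data from the response above; `results_to_csv` is an illustrative helper, not part of the Scavio client, and assumes each result carries the position, title, and link fields shown in the sample:

```python
import csv
import io

# Sample response in the shape shown in the expected output above
data = {
    "organic_results": [
        {"position": 1, "title": "Python Tutorial — W3Schools", "link": "https://w3schools.com/python/"},
        {"position": 2, "title": "The Python Tutorial — Python Docs", "link": "https://docs.python.org/tutorial/"},
    ]
}

def results_to_csv(data: dict) -> str:
    """Flatten organic_results into a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["position", "title", "link"])
    writer.writeheader()
    for row in data.get("organic_results", []):
        writer.writerow(row)
    return buf.getvalue()

print(results_to_csv(data))
```

The same approach works with the live response from search(): pass its return value straight to results_to_csv.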