n8n scraping workflows break whenever a target site changes its HTML structure or blocks your IP. Proxy rotation logic adds complexity without solving the root problem: you are parsing unstructured HTML that can change at any time. A structured search API returns clean JSON for Google, Reddit, Amazon, and other platforms through a single POST request. This tutorial shows how to replace brittle n8n HTTP Request and HTML Extract nodes with a Scavio API call that returns structured JSON instead of raw HTML. The result is a simpler, more reliable n8n workflow with no proxies to manage.
Prerequisites
- A running n8n instance (cloud or self-hosted)
- A Scavio API key from scavio.dev
- An existing n8n workflow with HTTP scraping nodes
Walkthrough
Step 1: Identify failing scraping nodes
Find HTTP Request nodes that return 403/429 errors or HTML Extract nodes that produce empty output due to DOM changes.
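A workflow may have many scraping nodes. In that case, triaging them in code can show which ones are blocked and which ones suffer from a DOM change. Below is a minimal sketch. It assumes you have each HTTP Request node's status code and the items its HTML Extract node produced. `classify_failure` and its labels are hypothetical and are not part of n8n.

```python
from typing import Optional, Sequence


def classify_failure(status_code: int, extracted: Optional[Sequence] = None) -> str:
    """Hypothetical helper: label why a scraping node failed.

    status_code: HTTP status returned by the HTTP Request node.
    extracted:   items produced by the downstream HTML Extract node,
                 or None if that node never ran.
    """
    if status_code == 403:
        return "blocked"  # IP or bot detection, e.g. a burned proxy
    if status_code == 429:
        return "rate_limited"  # too many requests from this IP
    if 200 <= status_code < 300:
        # Page loaded, but selectors matched nothing (empty list or all-blank values)
        if extracted is not None and not any(extracted):
            return "selector_broken"
        return "ok"
    return "http_error"  # any other non-2xx status


print(classify_failure(403))         # blocked
print(classify_failure(200, ["", ""]))  # selector_broken
```

"blocked" and "rate_limited" are the cases that proxy rotation tries to patch. "selector_broken" is the DOM-change case that no proxy can fix.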
# Common failure patterns in n8n scraping:
# - HTTP Request returns 403 Forbidden or 429 Too Many Requests
# - HTML Extract yields empty results after site redesign
# - Proxy list expires or gets blocked
# - Rate limiting forces artificial delays between requests
Step 2: Replace with Scavio API node
Configure an HTTP Request node pointing to the Scavio API instead of the target website.
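The following Python sketch builds one request per incoming item, mirroring what the node sends. It uses the endpoint, `x-api-key` header, and `platform`/`query` body fields from the node configuration in this step. It assumes each item carries a `search_term` field. `build_requests` is a hypothetical helper.

```python
import os
from typing import Dict, List

SCAVIO_URL = "https://api.scavio.dev/api/v1/search"


def build_requests(items: List[Dict], api_key: str, platform: str = "google") -> List[Dict]:
    """Hypothetical helper: one POST request description per n8n item.

    Each item's `search_term` becomes the `query`, the same way
    {{ $json.search_term }} is resolved per item in the n8n node.
    """
    out = []
    for item in items:
        term = item.get("search_term")
        if not term:
            continue  # this sketch skips empty terms; n8n would send an empty query
        out.append({
            "method": "POST",
            "url": SCAVIO_URL,
            "headers": {"x-api-key": api_key},
            "json": {"platform": platform, "query": term},
        })
    return out


items = [{"search_term": "best crm software 2026"}, {"search_term": ""}]
reqs = build_requests(items, api_key=os.environ.get("SCAVIO_API_KEY", "test-key"))
print(len(reqs))  # 1 (the empty term is skipped)
```

Each dict's keys match the arguments of `requests.request(method, url, headers=..., json=...)`. You can therefore send one with `requests.request(**reqs[0])` when testing outside n8n.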
# n8n HTTP Request node configuration:
# Method: POST
# URL: https://api.scavio.dev/api/v1/search
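# Authentication: Generic Credential Type -> Header Auth
#   Name: x-api-key
#   Value: your Scavio API key (entered once in the n8n credential)
# Send Body: on
# Body Content Type: JSON
#   platform: google
#   query: {{ $json.search_term }}   (search_term comes from the previous node)
Step 3: Remove HTML parsing nodes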
Delete the HTML Extract node and any regex or Set parsing nodes, since the API returns structured JSON directly.
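In a large workflow, an exported workflow JSON file can help you find the nodes to delete. Below is a minimal sketch. It assumes the export has a top-level `nodes` list whose entries carry `name` and `type`. The type strings in `PARSING_NODE_TYPES` are assumptions to verify against your own export, because they can differ between n8n versions. Set or Code nodes that do regex parsing still need a manual look, since those node types also serve many unrelated purposes.

```python
import json
from typing import Dict, List

# Assumed node type strings; check them against your own exported workflow.
PARSING_NODE_TYPES = {
    "n8n-nodes-base.htmlExtract",  # legacy HTML Extract node
    "n8n-nodes-base.html",         # newer HTML node
}


def find_parsing_nodes(workflow: Dict) -> List[str]:
    """Return the names of HTML-parsing nodes in an exported workflow."""
    return [
        node.get("name", "<unnamed>")
        for node in workflow.get("nodes", [])
        if node.get("type") in PARSING_NODE_TYPES
    ]


# Hypothetical export, trimmed to the fields this helper reads.
sample = json.loads("""
{
  "nodes": [
    {"name": "Fetch page", "type": "n8n-nodes-base.httpRequest"},
    {"name": "Extract titles", "type": "n8n-nodes-base.htmlExtract"},
    {"name": "Parse fields", "type": "n8n-nodes-base.set"}
  ]
}
""")
print(find_parsing_nodes(sample))  # ['Extract titles']
```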
# Before (fragile):
# HTTP Request -> HTML Extract -> Set (parse) -> Output
#
# After (reliable):
# HTTP Request (Scavio API) -> Output
#
# Access fields directly:
# {{ $json.organic_results[0].title }}
# {{ $json.organic_results[0].link }}
# {{ $json.organic_results[0].snippet }}
Step 4: Test the replacement with Python
Verify the API returns the data you need before updating n8n.
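Printed titles are easy to skim, but a check in code is more reliable. The sketch below confirms that each result carries the `title`, `link`, and `snippet` fields your n8n expressions read. It assumes the `organic_results` shape used in this tutorial. The sample dict is made up for illustration and is not real API output. In practice, pass it the parsed `resp.json()` from the test script in this step. Loosen `REQUIRED_FIELDS` if some results legitimately lack a snippet.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("title", "link", "snippet")


def check_results(data: Dict, limit: int = 3) -> Tuple[List[Dict], List[str]]:
    """Return (rows, problems) for the first `limit` organic results."""
    rows, problems = [], []
    results = data.get("organic_results", [])
    if not results:
        problems.append("no organic_results in response")
    for i, result in enumerate(results[:limit]):
        missing = [f for f in REQUIRED_FIELDS if not result.get(f)]
        if missing:
            problems.append(f"result {i} missing {', '.join(missing)}")
        else:
            rows.append({f: result[f] for f in REQUIRED_FIELDS})
    return rows, problems


# Hypothetical response used only to exercise the checks.
sample = {"organic_results": [
    {"title": "Example CRM review", "link": "https://example.com/crm",
     "snippet": "A comparison of CRM tools."},
    {"title": "No snippet here", "link": "https://example.com/other"},
]}
rows, problems = check_results(sample)
print(problems)  # ['result 1 missing snippet']
```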
import os
import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
resp = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": API_KEY},
    json={"platform": "google", "query": "best crm software 2026"},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = resp.json()
# Print the first three organic results
for r in data.get("organic_results", [])[:3]:
    print(f"{r['title']}: {r['link']}")
Python Example
import os
import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
resp = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": API_KEY},
    json={"platform": "google", "query": "best crm software 2026"},
    timeout=30,
)
resp.raise_for_status()
for r in resp.json().get("organic_results", [])[:5]:
    print(r["title"], r["link"])
JavaScript Example
// Requires Node 18+ (global fetch) and an ES module for top-level await
const res = await fetch("https://api.scavio.dev/api/v1/search", {
  method: "POST",
  headers: {"x-api-key": process.env.SCAVIO_API_KEY, "Content-Type": "application/json"},
  body: JSON.stringify({platform: "google", query: "best crm software 2026"})
});
if (!res.ok) throw new Error(`Scavio request failed: ${res.status}`);
const data = await res.json();
(data.organic_results || []).slice(0, 5).forEach(r => console.log(r.title, r.link));
Expected Output
A simplified n8n workflow that fetches structured search data from a single API node, replacing all scraping, proxy, and HTML parsing nodes.