n8n HTTP Request nodes that scrape websites break frequently because of HTML changes, CAPTCHAs, and rate limits. Replacing them with structured API calls returns clean JSON, is unaffected by page layout changes, and removes proxy costs. This tutorial migrates common n8n scraping patterns to API calls, node by node.
Prerequisites
- A running n8n instance
- A Scavio API key from scavio.dev
- Existing n8n workflows with HTTP scraping nodes
- Basic familiarity with n8n workflows
Walkthrough
Step 1: Identify the scraping nodes to replace
Export your n8n workflow and find the HTTP Request nodes that scrape websites.
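The analysis code in this step runs on an already-parsed dict. As a minimal sketch of the export side, the helper below reads an exported workflow file from disk and lists its HTTP Request nodes. The function name `load_http_request_nodes` and the file path are hypothetical, and the sketch assumes the export is either a single workflow object with a `nodes` list or a list of such objects.

```python
import json
from pathlib import Path

HTTP_REQUEST_TYPE = 'n8n-nodes-base.httpRequest'

def load_http_request_nodes(export_path):
    """Return (name, url) pairs for every HTTP Request node in an export file."""
    data = json.loads(Path(export_path).read_text(encoding='utf-8'))
    # Assumption: an export is one workflow object or a list of workflow objects.
    workflows = data if isinstance(data, list) else [data]
    pairs = []
    for workflow in workflows:
        for node in workflow.get('nodes', []):
            if node.get('type') == HTTP_REQUEST_TYPE:
                url = node.get('parameters', {}).get('url', '')
                pairs.append((node.get('name', 'unnamed'), url))
    return pairs
```

Calling `load_http_request_nodes('my_workflow.json')` on a saved export gives a quick inventory to review before deciding which nodes are scrapers and which are ordinary API calls.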
import json
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}

# Analyze an n8n workflow export for scraping nodes
def find_scraping_nodes(workflow_json):
    nodes = workflow_json.get('nodes', [])
    scraping_nodes = []
    for node in nodes:
        # Only HTTP Request nodes can be scrapers
        if node.get('type') == 'n8n-nodes-base.httpRequest':
            url = node.get('parameters', {}).get('url', '')
            # Substring match against the sites the API can replace
            if any(site in url for site in ['google.com', 'amazon.com', 'reddit.com', 'bing.com']):
                scraping_nodes.append({
                    'name': node.get('name', 'unnamed'),
                    'url': url,
                    'type': 'replaceable'
                })
    return scraping_nodes

# Simulated workflow analysis
sample = {'nodes': [
    {'type': 'n8n-nodes-base.httpRequest', 'name': 'Scrape Google', 'parameters': {'url': 'https://google.com/search?q=test'}},
    {'type': 'n8n-nodes-base.httpRequest', 'name': 'Scrape Amazon', 'parameters': {'url': 'https://amazon.com/s?k=test'}},
    {'type': 'n8n-nodes-base.httpRequest', 'name': 'Internal API', 'parameters': {'url': 'https://api.mycompany.com/data'}},
]}

scrapers = find_scraping_nodes(sample)
print(f'Found {len(scrapers)} scraping nodes to replace:')
for s in scrapers:
    print(f' {s["name"]}: {s["url"][:50]}')

Step 2: Create the replacement API node configuration
Generate n8n HTTP Request node configurations that call the search API instead.
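A scraping node often hardcodes its search term in the URL, so the term has to be recovered before it can be passed to the API as `query`. The sketch below shows one way to do that with the standard library. The helper name `extract_search_term` and the host-to-parameter mapping are assumptions: `q` and `k` match the sample URLs in this tutorial, and other sites or URL shapes may use different keys.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping from site domain to the query-string key holding the search term.
# 'q' and 'k' match the sample URLs in this tutorial; other sites may differ.
SEARCH_PARAM_BY_HOST = {
    'google.com': 'q',
    'bing.com': 'q',
    'reddit.com': 'q',
    'amazon.com': 'k',
}

def extract_search_term(url):
    """Return the search term embedded in a scraping URL, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()
    for domain, param in SEARCH_PARAM_BY_HOST.items():
        # Match the bare domain and subdomains such as www.google.com
        if host == domain or host.endswith('.' + domain):
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None
```

For the sample workflow, `extract_search_term('https://amazon.com/s?k=test')` returns `'test'`, while the internal API URL returns `None`. URLs built from n8n expressions have no parseable host and also yield `None`, so those nodes need a manual look.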
def generate_replacement_node(scraping_node):
    """Generate n8n node config for API replacement."""
    url = scraping_node['url']
    name = scraping_node['name']

    # Determine platform from URL. Google is the API default, so Google
    # (and any unmatched site such as Bing) sends no platform field.
    platform = None
    if 'amazon.com' in url:
        platform = 'amazon'
    elif 'reddit.com' in url:
        platform = 'reddit'

    body = {'query': '{{ $json.query }}', 'country_code': 'us'}
    if platform:
        body['platform'] = platform

    # Simplified parameter layout: map these values onto the fields of your
    # HTTP Request node version. n8n typically stores expression values with
    # a leading '=' in exported workflow JSON.
    replacement = {
        'name': f'{name} (API)',
        'type': 'n8n-nodes-base.httpRequest',
        'parameters': {
            'method': 'POST',
            'url': 'https://api.scavio.dev/api/v1/search',
            'headers': {
                'x-api-key': '{{ $env.SCAVIO_API_KEY }}',
                'Content-Type': 'application/json'
            },
            'body': json.dumps(body),
            'responseFormat': 'json'
        }
    }
    return replacement

print('=== Replacement Nodes ===')
for s in scrapers:
    replacement = generate_replacement_node(s)
    print(f'\n{s["name"]} -> {replacement["name"]}')
    print(f' URL: {replacement["parameters"]["url"]}')
    print(' Method: POST (was GET)')
    print(' Response: Clean JSON (was raw HTML)')

Step 3: Test the replacement and compare the output
Run the old and new approaches side by side to verify that the data quality matches. The code below exercises the API side.
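A field-level check makes "the data quality matches" concrete: every result should carry the fields the old scraper extracted. The sketch below validates an already-parsed response dict without making a network call. The name `check_results` is hypothetical, and the expected field names are taken from the sample output at the end of this tutorial, so they may need adjusting for other platforms.

```python
# Field names taken from this tutorial's sample output; adjust if the API returns others.
EXPECTED_FIELDS = {'title', 'link', 'snippet', 'position'}

def check_results(response):
    """Return a list of problems found in an API response dict (empty list = OK)."""
    results = response.get('organic_results', [])
    if not results:
        return ['no organic_results in response']
    problems = []
    for index, item in enumerate(results):
        # Report every expected field that this result lacks
        missing = EXPECTED_FIELDS - set(item)
        if missing:
            problems.append(f'result {index} missing: {sorted(missing)}')
    return problems
```

Passing the parsed JSON of a search response to `check_results` and getting an empty list back means downstream nodes that read `title`, `link`, `snippet`, or `position` should keep working after the swap.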
def compare_outputs(query, platform=None):
    """Compare API output quality for the replacement."""
    body = {'query': query, 'country_code': 'us'}
    if platform:
        body['platform'] = platform

    response = requests.post('https://api.scavio.dev/api/v1/search',
                             headers=SH, json=body, timeout=30)
    response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error body
    data = response.json()
    results = data.get('organic_results', [])

    print(f'\nQuery: "{query}" (platform: {platform or "google"})')
    print(f' Results: {len(results)}')
    print(f' Fields per result: {list(results[0].keys()) if results else "N/A"}')
    if results:
        print(f' Sample: {results[0].get("title", "")[:50]}')
    print(' Format: Structured JSON (no HTML parsing needed)')
    print(' Cost: $0.005 per query')
    print(' Reliability: No CAPTCHAs, no proxy needed, no HTML changes')

compare_outputs('wireless earbuds review')
compare_outputs('wireless earbuds', platform='amazon')
compare_outputs('wireless earbuds recommendation', platform='reddit')

print('\n=== Migration Summary ===')
print(f' Nodes to replace: {len(scrapers)}')
print(' Time to migrate: ~10 minutes per node')
print(' Monthly savings: proxy costs + maintenance time')

Python example
import os
import requests

SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

# Before: n8n HTTP scraping Google (breaks often)
# After: n8n HTTP Request to API (stable JSON)
data = requests.post('https://api.scavio.dev/api/v1/search',
                     headers=SH, json={'query': 'wireless earbuds', 'country_code': 'us'}).json()
print(f'Results: {len(data.get("organic_results", []))}')
print('Format: JSON | No HTML parsing | $0.005/query')

JavaScript example
// Run as an ES module (top-level await); Node 18+ provides a built-in fetch
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
const data = await fetch('https://api.scavio.dev/api/v1/search', {
  method: 'POST', headers: SH,
  body: JSON.stringify({ query: 'wireless earbuds', country_code: 'us' })
}).then(r => r.json());
console.log(`Results: ${(data.organic_results || []).length}`);
console.log('Format: JSON | No HTML parsing | $0.005/query');

Expected output
Found 2 scraping nodes to replace:
Scrape Google: https://google.com/search?q=test
Scrape Amazon: https://amazon.com/s?k=test
Query: "wireless earbuds review" (platform: google)
Results: 10
Fields per result: ['title', 'link', 'snippet', 'position']
Format: Structured JSON (no HTML parsing needed)
Cost: $0.005 per query
=== Migration Summary ===
Nodes to replace: 2
Time to migrate: ~10 minutes per node
Monthly savings: proxy costs + maintenance time
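Once the replacement configurations look right, they still have to be written back into the workflow before re-importing it. The sketch below swaps only the `parameters` of each matched node and leaves everything else alone. The function name `apply_replacements` is hypothetical, and the sketch assumes that connections in the export refer to nodes by name, which is why the original names are kept.

```python
import copy

def apply_replacements(workflow, replacements):
    """Return a copy of the workflow with matched nodes' parameters swapped.

    `replacements` maps an original node name to its replacement node dict.
    """
    updated = copy.deepcopy(workflow)
    for node in updated.get('nodes', []):
        replacement = replacements.get(node.get('name'))
        if replacement:
            # Keep id, position, and the original name so existing
            # connections (assumed to be keyed by node name) stay valid.
            node['parameters'] = copy.deepcopy(replacement['parameters'])
    return updated
```

Because the function works on a deep copy, the original export stays untouched and can serve as a rollback if the migrated workflow misbehaves.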