A Cloudflare block can halt a scraping project for weeks if you guess at the cause. This 3-day playbook systematically isolates the layer that triggers the block (TLS fingerprint, headers, IP reputation, or JS challenge) and fixes it, using Scavio as a ground-truth comparison.
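The layer-by-layer isolation can be sketched as a small harness that runs one probe per layer, in order, and reports the first that gets through. This is a minimal sketch, not a definitive implementation: the function name `first_passing_layer`, the probe labels, and the stub probes are hypothetical, and real probes would wrap the fetches shown in the walkthrough.

```python
from typing import Callable, Optional

# A probe takes a URL and returns True if the fetch returned real content.
Probe = Callable[[str], bool]


def first_passing_layer(url: str, probes: list[tuple[str, Probe]]) -> Optional[str]:
    """Run the probes in order and return the name of the first that succeeds."""
    for name, probe in probes:
        try:
            if probe(url):
                return name
        except Exception as exc:
            # A probe that raises is a data point, not a reason to stop.
            print(f"{name}: error {exc!r}")
    return None


# Stub probes standing in for real fetches (hypothetical results).
probes = [
    ("plain requests", lambda url: False),
    ("tls fingerprint", lambda url: False),
    ("header parity", lambda url: True),
    ("residential proxy", lambda url: True),
]
print(first_passing_layer("https://target.com", probes))  # prints "header parity"
```

Because each probe adds one fix on top of the previous ones, the first probe that passes points at the layer that was triggering the block.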
Prerequisites
- Python 3.10+
- A Scavio API key
- A blocked URL
- tls-client or curl_cffi
Walkthrough
Step 1: Day 1 morning: confirm Scavio succeeds
Establish that the target is technically scrapable.
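Confirming that a fetch succeeded means telling real page content apart from a challenge page, and a bare substring test for 'challenge' also fires on legitimate pages that merely contain the word. The sketch below is a slightly stricter check. Cloudflare documents a `cf-mitigated: challenge` response header on challenge pages; the status codes and the "Just a moment" title marker are heuristics assumed here, and the function name is hypothetical.

```python
def looks_like_challenge(status_code: int, headers: dict, body: str) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Assumed heuristic: challenge interstitials usually arrive as 403 or 503
    # with a "Just a moment..." title.
    if status_code in (403, 503) and "just a moment" in body.lower():
        return True
    return False
```

Any response object that exposes a status code, headers, and text can be passed in, for example `looks_like_challenge(r.status_code, dict(r.headers), r.text)`.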
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def scavio_ground_truth(url):
    # Fetch the page through Scavio with JS rendering enabled
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers={'x-api-key': API_KEY},
                      json={'query': url, 'platform': 'extract', 'render_js': True},
                      timeout=60)
    html = r.json().get('html', '')
    # Success means a non-trivial body with no challenge marker
    return len(html) > 1000 and 'challenge' not in html

Step 2: Day 1 afternoon: TLS fingerprint test
Swap requests for tls_client, which presents a real browser's JA3 TLS fingerprint.
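For context on what a JA3 is: it is an MD5 hash over five fields of the TLS ClientHello (TLS version, cipher suites, extensions, elliptic curves, and point formats), each written as decimal values joined by dashes, with the fields joined by commas. The sketch below shows that construction. The numeric values are illustrative only and are not a capture of any real browser, and the function names are hypothetical.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input string from the five ClientHello fields."""
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in values) for values in lists]
    return ",".join(fields)


def ja3_hash(ja3: str) -> str:
    """JA3 fingerprint: the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii"), usedforsecurity=False).hexdigest()


# Illustrative values only: the same cipher suites offered in a different order.
a = ja3_string(771, [4865, 4866, 4867], [0, 10, 11], [29, 23], [0])
b = ja3_string(771, [4866, 4865, 4867], [0, 10, 11], [29, 23], [0])
print(a)
print(ja3_hash(a) == ja3_hash(b))  # prints False
```

Because order is part of the hash, a client that offers different cipher suites or extensions than Chrome gets a different fingerprint no matter which User-Agent header it sends.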
from tls_client import Session

# chrome_120 makes the TLS handshake mimic Chrome 120
s = Session(client_identifier='chrome_120')
print(s.get('https://target.com').status_code)

Step 3: Day 2: header parity check
Match full Chrome header order via curl_cffi.
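Header parity covers order as well as values, so it helps to diff the order your client sends against an order captured from a real browser. The helper below is a minimal sketch with a hypothetical name. Both example lists are illustrative placeholders, so capture the real ones from your browser's network panel and from your client's outgoing request.

```python
from typing import Optional


def first_header_mismatch(expected: list[str], actual: list[str]) -> Optional[tuple]:
    """Return (index, expected_name, actual_name) at the first difference, else None."""
    exp = [h.lower() for h in expected]
    act = [h.lower() for h in actual]
    for i, (e, a) in enumerate(zip(exp, act)):
        if e != a:
            return (i, e, a)
    if len(exp) != len(act):
        # One list is a prefix of the other: report the first extra header.
        i = min(len(exp), len(act))
        return (i, exp[i] if i < len(exp) else None, act[i] if i < len(act) else None)
    return None


# Illustrative placeholders, not authoritative captures.
browser_order = ["sec-ch-ua", "sec-ch-ua-mobile", "user-agent", "accept"]
client_order = ["User-Agent", "Accept-Encoding", "Accept", "Connection"]
print(first_header_mismatch(browser_order, client_order))
# prints (0, 'sec-ch-ua', 'user-agent')
```

A result of `None` means the names and order match, which leaves header values and the IP as the remaining suspects.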
from curl_cffi import requests as cf

# impersonate applies Chrome's TLS fingerprint and default headers together
print(cf.get('https://target.com', impersonate='chrome120').status_code)

Step 4: Day 2 afternoon: IP rotation
Switch to residential or ISP proxies if your datacenter IP is flagged.
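Rotation itself can be as simple as cycling through a pool of proxy URLs. The generator below is a sketch under assumptions: the proxy URLs are hypothetical placeholders, and the function name is made up for illustration. It yields requests-style `proxies` dicts with both the `http` and `https` keys set, since requests-style clients select the proxy by the target URL's scheme and an `https://` target would otherwise bypass an `http`-only entry.

```python
from itertools import cycle
from typing import Iterator


def proxy_rotation(proxy_urls: list[str]) -> Iterator[dict]:
    """Yield a proxies dict for each request, round-robin over the pool."""
    if not proxy_urls:
        raise ValueError("proxy pool is empty")
    for url in cycle(proxy_urls):
        # Same proxy for both schemes so https targets are routed through it too.
        yield {"http": url, "https": url}


# Hypothetical pool; substitute your provider's endpoints.
pool = proxy_rotation([
    "http://user:pass@proxy-a.example:8000",
    "http://user:pass@proxy-b.example:8000",
])
print(next(pool)["https"])  # prints http://user:pass@proxy-a.example:8000
```

Each request then takes `proxies=next(pool)`, so consecutive requests leave from different IPs.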
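from curl_cffi import requests as cf

# Use your proxy provider's credentials, host, and port
PROXY = {'http': 'http://user:pass@residential.proxy:port',
         'https': 'http://user:pass@residential.proxy:port'}  # https targets need this key
cf.get('https://target.com', impersonate='chrome120', proxies=PROXY)

Step 5: Day 3: fall back to Scavio if self-fix fails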
If the target still blocks after all three fixes, route through Scavio for the duration of the project.
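One way to make the fallback stick is to remember which hosts have blocked a direct fetch and skip the direct attempt for them afterwards. The class below is a hypothetical sketch, not part of any library: `direct` and `fallback` are callables you supply (for example a curl_cffi fetch and a Scavio call), and `direct` is assumed to return None or an empty string when it hits a block.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit


class FallbackRouter:
    """Try a direct fetch first; once a host blocks, always use the fallback."""

    def __init__(self, direct: Callable[[str], Optional[str]],
                 fallback: Callable[[str], str]):
        self.direct = direct
        self.fallback = fallback
        self.blocked_hosts: set = set()

    def fetch(self, url: str) -> str:
        host = urlsplit(url).hostname
        if host not in self.blocked_hosts:
            try:
                html = self.direct(url)
                if html:
                    return html
            except Exception:
                pass  # treat errors like a block
            # Remember the block so later URLs on this host skip the direct try.
            self.blocked_hosts.add(host)
        return self.fallback(url)
```

This avoids paying for a doomed direct request on every URL of a host that is known to block.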
# Reuses cf, PROXY, requests, and API_KEY from the earlier steps
def safe_fetch(url):
    try:
        r = cf.get(url, impersonate='chrome120', proxies=PROXY)
        if r.status_code == 200 and 'challenge' not in r.text:
            return r.text
    except Exception:
        pass  # treat any network error as a block and fall through
    # Fallback: fetch through Scavio
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers={'x-api-key': API_KEY},
                      json={'query': url, 'platform': 'extract'})
    return r.json().get('html', '')

Python Example
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def safe_fetch(url):
    # Try a direct fetch first
    try:
        html = requests.get(url, timeout=10).text
        if 'challenge' not in html:
            return html
    except requests.RequestException:
        pass  # fall through to Scavio on any network error
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers={'x-api-key': API_KEY},
                      json={'query': url, 'platform': 'extract', 'render_js': True})
    return r.json().get('html', '')

print(len(safe_fetch('https://cloudflare-blocked.com')))

JavaScript Example
const API_KEY = process.env.SCAVIO_API_KEY;

export async function safeFetch(url) {
  // Try a direct fetch first
  try {
    const r = await fetch(url);
    const html = await r.text();
    if (!html.includes('challenge')) return html;
  } catch {
    // Network error: fall through to Scavio
  }
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: url, platform: 'extract', render_js: true })
  });
  // Default to an empty string, matching the Python example
  return (await r.json()).html ?? '';
}

Expected Output
After three days you have either a diagnosis and a fix, or a stable fallback. Typical outcome: 60% of teams fix the block with tls_client plus a residential IP, and 40% stay on Scavio long-term.