How to Replace ScrapingAnt with a Structured API

Migrate from ScrapingAnt web scraping to Scavio's structured API for Amazon, Google, and Reddit data, with side-by-side code comparisons.

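ScrapingAnt returns raw HTML that you must parse yourself, and its Enthusiast plan costs $19/month for 100K credits. For common targets such as Amazon, Google, and Reddit, a structured API returns parsed JSON directly, which eliminates the BeautifulSoup parsing code and reduces maintenance when page layouts change. This tutorial shows side-by-side migrations for the three most common ScrapingAnt use cases.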

Prerequisites

  • Python 3.8+
  • The requests library
  • A Scavio API key from scavio.dev
  • An existing ScrapingAnt integration to migrate
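
Before migrating, it can help to confirm these prerequisites programmatically. The sketch below is a hypothetical helper (`missing_prerequisites` is not part of any Scavio tooling) that checks the Python version, the `SCAVIO_API_KEY` environment variable read by the code in this tutorial, and whether the requests package can be found.

```python
import importlib.util
import os
import sys

def missing_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for this tutorial, not part of any Scavio tooling.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if version < (3, 8):
        problems.append('Python 3.8+ is required')
    if not env.get('SCAVIO_API_KEY'):
        problems.append('SCAVIO_API_KEY is not set')
    # find_spec returns None when the package is not installed
    if importlib.util.find_spec('requests') is None:
        problems.append('the requests library is not installed')
    return problems

for problem in missing_prerequisites():
    print('Missing:', problem)
```

The `env` and `version` parameters exist only so the check can be exercised without touching the real environment.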

Walkthrough

Step 1: Compare the Two Approaches Side by Side

See how ScrapingAnt's raw HTML differs from the structured API's JSON.

Python
import os
import requests

# --- ScrapingAnt approach (raw HTML) ---
# from bs4 import BeautifulSoup  # only the old HTML-parsing path needs this
# SA_KEY = os.environ.get('SCRAPINGANT_KEY', '')
# def scrape_google_sa(query):
#     r = requests.get('https://api.scrapingant.com/v2/general',
#         params={'url': f'https://www.google.com/search?q={query}', 'x-api-key': SA_KEY})
#     soup = BeautifulSoup(r.text, 'html.parser')
#     results = []
#     for div in soup.select('div.g'):
#         title = div.select_one('h3')
#         link = div.select_one('a')
#         if title and link:
#             results.append({'title': title.text, 'link': link['href']})
#     return results  # Fragile: breaks when Google changes HTML

# --- Structured API approach (parsed JSON) ---
API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}

def search_google(query):
    # timeout keeps a stalled connection from hanging the script
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}, timeout=30).json()
    return data.get('organic_results', [])  # Stable JSON, no parsing needed

results = search_google('best serp api 2026')
for r in results[:3]:
    # .get avoids a KeyError if a result lacks one of these fields
    print(f'{r.get("position")}. {r.get("title", "")[:50]} - {r.get("link", "")[:40]}')
print('\nNo HTML parsing. No BeautifulSoup. No CSS selectors.')
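
If downstream code still expects the `{'title', 'link'}` dictionaries that the old `scrape_google_sa` function returned, a thin adapter can keep that contract during the migration. This is a minimal sketch: `to_legacy_shape` is a hypothetical name, the sample results are made up for illustration, and the `title` and `link` keys are assumed from the fields the code above reads.

```python
def to_legacy_shape(organic_results):
    """Map structured results to the {'title', 'link'} dicts the old scraper returned."""
    legacy = []
    for item in organic_results:
        title, link = item.get('title'), item.get('link')
        # The old scraper skipped entries missing a title or link; do the same here
        if title and link:
            legacy.append({'title': title, 'link': link})
    return legacy

# Made-up sample data, shaped like the fields read in the code above
sample = [
    {'position': 1, 'title': 'Example result', 'link': 'https://example.com'},
    {'position': 2, 'title': 'No link here'},
]
print(to_legacy_shape(sample))
# prints [{'title': 'Example result', 'link': 'https://example.com'}]
```

Keeping the old output shape lets you swap the data source first and refactor callers later.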

Step 2: Migrate Amazon Product Search

Replace ScrapingAnt Amazon scraping with structured product data.

Python
# --- ScrapingAnt Amazon (before) ---
# def scrape_amazon_sa(query):
#     r = requests.get('https://api.scrapingant.com/v2/general',
#         params={'url': f'https://www.amazon.com/s?k={query}', 'x-api-key': SA_KEY})
#     soup = BeautifulSoup(r.text, 'html.parser')
#     products = []
#     for item in soup.select('[data-component-type="s-search-result"]'):
#         title = item.select_one('h2 span')
#         price = item.select_one('.a-price .a-offscreen')
#         # ... 20+ lines of fragile selector parsing
#     return products

# --- Structured API (after) ---
def search_amazon(query):
    # timeout keeps a stalled connection from hanging the script
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'platform': 'amazon', 'country_code': 'us'}, timeout=30).json()
    return data.get('organic_results', [])

products = search_amazon('wireless earbuds')
for p in products[:3]:
    print(f'{p.get("title", "")[:50]} | {p.get("price", "N/A")} | {p.get("rating", "N/A")}')
print('\n3 lines vs 20+ lines of HTML parsing. Cost: $0.005/query.')
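
Because the results arrive as dictionaries, they can be written out without any selector logic. The sketch below serializes product results to CSV with the standard library. `products_to_csv` is a hypothetical helper, the sample rows are invented, and the `title`, `price`, and `rating` keys are assumed from the fields printed above.

```python
import csv
import io

FIELDS = ['title', 'price', 'rating']

def products_to_csv(products):
    """Serialize product dicts to CSV text, writing 'N/A' for missing fields."""
    buf = io.StringIO()
    # restval fills in missing keys; extrasaction='ignore' drops keys outside FIELDS
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval='N/A', extrasaction='ignore')
    writer.writeheader()
    for product in products:
        writer.writerow(product)
    return buf.getvalue()

# Invented sample rows for illustration only
sample = [
    {'title': 'Example earbuds', 'price': '$29.99', 'rating': 4.5, 'extra_field': 'ignored'},
    {'title': 'Listing without a price'},
]
print(products_to_csv(sample))
```

In the tutorial's flow you would pass the list returned by `search_amazon` instead of `sample`.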

Step 3: Migrate Reddit Data Extraction

Replace ScrapingAnt Reddit scraping with structured Reddit search.

Python
# --- ScrapingAnt Reddit (before) ---
# def scrape_reddit_sa(query):
#     r = requests.get('https://api.scrapingant.com/v2/general',
#         params={'url': f'https://www.reddit.com/search/?q={query}', 'x-api-key': SA_KEY,
#                 'browser': 'true'})  # Reddit needs JS rendering = more credits
#     soup = BeautifulSoup(r.text, 'html.parser')
#     posts = []  # filled by selector parsing, omitted here
#     # Reddit HTML changes frequently, selectors break monthly
#     return posts

# --- Structured API (after) ---
def search_reddit(query):
    # timeout keeps a stalled connection from hanging the script
    data = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'platform': 'reddit', 'country_code': 'us'}, timeout=30).json()
    return data.get('organic_results', [])

posts = search_reddit('best api for web scraping')
for p in posts[:3]:
    print(p.get('title', '')[:60])
    print(f'  {p.get("snippet", "")[:80]}')
print('\nNo JS rendering needed. No browser credits. $0.005/query.')
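
The migrated functions make a single HTTP call with no retry. For production use, a small retry wrapper is a common addition. The sketch below is not Scavio-specific: `with_retries` is a hypothetical helper that accepts any zero-argument callable, and the attempt count and delays are illustrative assumptions.

```python
import time

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry on any exception, doubling the delay each time.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

For example, `with_retries(lambda: search_reddit('best api for web scraping'))` would retry the Reddit search from this step. In practice you would likely narrow `except Exception` to network errors such as `requests.RequestException`.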

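Step 4: Compare Cost and Maintenance

Compare the per-query cost of each approach and the maintenance burden the migration removes. With the credit costs assumed in the code, ScrapingAnt's raw per-query price is lower; the savings come from the parsing code you no longer have to maintain.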

Python
def cost_comparison(monthly_queries):
    # ScrapingAnt: $19/mo for 100K credits
    # Google search = 10 credits, Amazon = 10, Reddit w/ browser = 20
    sa_google = monthly_queries * 10 / 100000 * 19
    sa_amazon = monthly_queries * 10 / 100000 * 19
    sa_reddit = monthly_queries * 20 / 100000 * 19  # JS rendering doubles credits

    # Scavio: $0.005/query flat
    sc_cost = monthly_queries * 0.005

    print(f'Monthly cost comparison ({monthly_queries:,} queries/platform):')
    print(f'  ScrapingAnt Google:  ${sa_google:.2f} (+ parsing maintenance)')
    print(f'  ScrapingAnt Amazon:  ${sa_amazon:.2f} (+ parsing maintenance)')
    print(f'  ScrapingAnt Reddit:  ${sa_reddit:.2f} (+ JS rendering cost)')
    print(f'  Scavio (all three):  ${sc_cost * 3:.2f} (structured JSON, no parsing)')
    print('\nLines of parsing code eliminated: ~60-100 (BeautifulSoup selectors)')
    print('Maintenance: 0 selector updates vs monthly fixes when layouts change')

cost_comparison(1000)
cost_comparison(5000)
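
The comparison above can be reduced to a per-query price, which makes the trade-off explicit. This sketch reuses the tutorial's figures ($19 per 100K ScrapingAnt credits, 10 or 20 credits per request, $0.005 per Scavio query). The hourly rate is a hypothetical input, and ScrapingAnt's plan is treated as pro-rated per credit, as in `cost_comparison`.

```python
SA_PLAN_USD = 19.0          # ScrapingAnt plan price used in this tutorial
SA_PLAN_CREDITS = 100_000   # credits included in that plan
SCAVIO_USD_PER_QUERY = 0.005

def sa_cost_per_query(credits_per_request):
    """Pro-rated ScrapingAnt price for one request."""
    return credits_per_request * SA_PLAN_USD / SA_PLAN_CREDITS

def break_even_hours(monthly_queries, hourly_rate_usd):
    """Maintenance hours per month at which both options cost the same.

    Assumes one Google, one Amazon, and one Reddit workload of equal size.
    """
    sa_total = monthly_queries * (sa_cost_per_query(10) * 2 + sa_cost_per_query(20))
    scavio_total = monthly_queries * SCAVIO_USD_PER_QUERY * 3
    return (scavio_total - sa_total) / hourly_rate_usd

print(f'ScrapingAnt, 10 credits: ${sa_cost_per_query(10):.4f}/query')
print(f'ScrapingAnt, 20 credits: ${sa_cost_per_query(20):.4f}/query')
# hourly_rate_usd=50 is a made-up figure for illustration
print(f'Break-even at 1,000 queries/platform: {break_even_hours(1000, 50):.2f} h/month')
```

With these inputs, the structured API pays for itself once selector maintenance exceeds about 0.15 hours (roughly nine minutes) a month.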

Python Example

Python
import os, requests
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

# Replace ScrapingAnt for Google, Amazon, Reddit in 3 lines each:
def search(query, platform=None):
    body = {'query': query, 'country_code': 'us'}
    if platform:
        body['platform'] = platform
    # timeout keeps a stalled connection from hanging the script
    data = requests.post('https://api.scavio.dev/api/v1/search', headers=SH, json=body, timeout=30).json()
    return data.get('organic_results', [])

for p in [None, 'amazon', 'reddit']:
    results = search('wireless earbuds', p)
    print(f'{p or "google"}: {len(results)} results ($0.005, no HTML parsing)')
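
The loop above runs the three searches one after another. Since each call is an independent HTTP request, they could also run concurrently, subject to whatever rate limits apply to your plan. This is a minimal sketch using the standard library's thread pool. `search_all` is a hypothetical helper, and the `search` callable is passed in so the sketch does not depend on the network.

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(search, query, platforms=(None, 'amazon', 'reddit')):
    """Call search(query, platform) for each platform in parallel threads.

    Returns a dict keyed by platform name, with None reported as 'google'.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(platforms))) as pool:
        # pool.map preserves the order of `platforms`
        results = pool.map(lambda p: search(query, p), platforms)
        return {p or 'google': r for p, r in zip(platforms, results)}
```

Calling `search_all(search, 'wireless earbuds')` with the `search` function defined above would return one result list per platform.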

JavaScript Example

JavaScript
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
async function search(query, platform) {
  const body = { query, country_code: 'us' };
  if (platform) body.platform = platform;
  const data = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify(body)
  }).then(r => r.json());
  return data.organic_results || [];
}
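// Top-level await requires an ES module (for example a .mjs file); the global fetch needs Node 18+.
for (const p of [null, 'amazon', 'reddit']) {
  const results = await search('wireless earbuds', p);
  console.log(`${p || 'google'}: ${results.length} results ($0.005, no parsing)`);
}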

Expected Output

Text
1. Scavio - Search API for Developers - https://scavio.dev
2. SerpAPI - Google Search API - https://serpapi.com
3. DataForSEO - SEO Data API - https://dataforseo.com

No HTML parsing. No BeautifulSoup. No CSS selectors.

Monthly cost comparison (1,000 queries/platform):
  ScrapingAnt Google:  $1.90 (+ parsing maintenance)
  ScrapingAnt Amazon:  $1.90 (+ parsing maintenance)
  ScrapingAnt Reddit:  $3.80 (+ JS rendering cost)
  Scavio (all three):  $15.00 (structured JSON, no parsing)

Lines of parsing code eliminated: ~60-100 (BeautifulSoup selectors)
Maintenance: 0 selector updates vs monthly fixes when layouts change

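FAQ

Most developers complete this tutorial in 15 to 30 minutes. You need a Scavio API key (the free tier is enough) and a working Python or JavaScript environment.

The prerequisites are Python 3.8+, the requests library, a Scavio API key from scavio.dev, and an existing ScrapingAnt integration to migrate. A Scavio API key comes with 50 free credits on signup.

The free tier's 50 signup credits are enough to complete this tutorial and build a working prototype.

Scavio provides a native LangChain package (langchain-scavio), an MCP server, and a REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to the framework of your choice.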
