Tutorial

How to Pull Ecommerce Data from Multiple Marketplaces

Query Amazon and Walmart product data in one pipeline using the Scavio API. Normalize schemas, compare prices, and set up price drop alerts.

This tutorial pulls product data from Amazon and Walmart through a single API. You run the same product search against both platforms, normalize the two response schemas into one format, compare prices across marketplaces, and raise an alert when the price difference exceeds a threshold. Running a separate scraper for each marketplace means maintaining two codebases, two proxy pools, and two parsing pipelines. A multi-platform search API reduces this to one codebase with a platform parameter.

Prerequisites

  • Python 3.8+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • A list of products to track across marketplaces

Walkthrough

Step 1: Define products to track

Set up the product list and the marketplaces to query for each product.

Python
import os
import re

import requests

API_KEY = os.environ['SCAVIO_API_KEY']

PRODUCTS = [
    'Sony WH-1000XM5 headphones',
    'Apple AirPods Pro 2',
    'Samsung Galaxy S24 Ultra case',
]
PLATFORMS = ['amazon', 'walmart']

Step 2: Query both platforms

Search for each product on both Amazon and Walmart through Scavio's multi-platform API.

Python
def search_product(product: str, platform: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'platform': platform, 'query': product}, timeout=15)
    resp.raise_for_status()
    return resp.json().get('organic_results', [])

def search_all_platforms(product: str) -> dict:
    results = {}
    for platform in PLATFORMS:
        results[platform] = search_product(product, platform)
        print(f'{platform}: {len(results[platform])} results for "{product[:30]}"')
    return results

all_results = search_all_platforms(PRODUCTS[0])
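Network calls to any search API can fail transiently, and a single timeout in Step 2 would stop the whole run. The following is a minimal sketch of retrying with exponential backoff. The helper name search_with_retry and its parameters are illustrative and not part of the Scavio API. It takes the request as a zero-argument callable so it does not depend on how the request is made.

```python
import time
from typing import Callable, List, Tuple, Type

def search_with_retry(
    fetch: Callable[[], List[dict]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ... by default
    return []  # only reached when attempts <= 0
```

With the Step 2 function, this could be called as search_with_retry(lambda: search_product(product, platform), retry_on=(requests.ConnectionError, requests.Timeout)). Restricting retry_on to connection errors and timeouts avoids retrying requests that fail for a permanent reason, such as an invalid API key.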

Step 3: Normalize the schema

Map Amazon and Walmart response fields to a common schema so downstream code does not need platform-specific logic.

Python
def parse_price(price_str: str) -> float:
    if not price_str:
        return 0.0
    cleaned = re.sub(r'[^\d.]', '', str(price_str))
    try:
        return float(cleaned)
    except ValueError:
        return 0.0

def normalize(result: dict, platform: str) -> dict:
    return {
        'platform': platform,
        'title': result.get('title', ''),
        'price': parse_price(result.get('price', '')),
        'price_raw': result.get('price', ''),
        'rating': result.get('rating', ''),
        'url': result.get('link', ''),
        'image': result.get('thumbnail', result.get('image', '')),
    }

def normalize_all(results: dict) -> list:
    normalized = []
    for platform, items in results.items():
        for item in items[:3]:
            normalized.append(normalize(item, platform))
    return normalized

normalized = normalize_all(all_results)
for n in normalized:
    print(f"{n['platform']:<10} ${n['price']:<8} {n['title'][:50]}")
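The parse_price helper above strips every non-numeric character, so a range such as "$19.99 - $29.99" collapses into an unparseable string and comes back as 0.0, the same value used for a missing price. A hedged alternative is sketched below. It assumes US-style formatting (comma as thousands separator, dot as decimal point), and the name extract_price is illustrative. It takes the first number in the string and returns None when no price is present.

```python
import re
from typing import Optional, Union

# First run of digits, allowing thousands separators and an optional decimal part.
_PRICE_RE = re.compile(r'\d[\d,]*(?:\.\d+)?')

def extract_price(value: Union[str, int, float, None]) -> Optional[float]:
    """Return the first price found in `value`, or None if there is none."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)  # already numeric, nothing to parse
    match = _PRICE_RE.search(str(value))
    if not match:
        return None
    return float(match.group().replace(',', ''))
```

Returning None instead of 0.0 keeps "no price" distinct from a real value. Code that filters on price > 0, as Step 4 does, would need to filter on price is not None instead.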

Step 4: Compare prices across marketplaces

Find the best price for each product across Amazon and Walmart.

Python
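def compare_prices(product: str) -> dict:
    results = search_all_platforms(product)
    normalized = normalize_all(results)
    if not normalized:
        return {'product': product, 'best': None}
    priced = [n for n in normalized if n['price'] > 0]
    if not priced:
        return {'product': product, 'best': None, 'note': 'no prices found'}
    # Keep only the cheapest priced listing per platform, so that 'savings'
    # measures the gap between marketplaces rather than between two
    # listings on the same site.
    cheapest = {}
    for n in priced:
        current = cheapest.get(n['platform'])
        if current is None or n['price'] < current['price']:
            cheapest[n['platform']] = n
    best = min(cheapest.values(), key=lambda x: x['price'])
    worst = max(cheapest.values(), key=lambda x: x['price'])
    savings = worst['price'] - best['price']
    return {
        'product': product,
        'best': {'platform': best['platform'], 'price': best['price'], 'title': best['title'][:50]},
        'worst': {'platform': worst['platform'], 'price': worst['price']},
        'savings': round(savings, 2),
    }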

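comp = compare_prices(PRODUCTS[0])
if comp['best']:
    print(f"Best: {comp['best']['platform']} ${comp['best']['price']} (save ${comp['savings']})")
else:
    # compare_prices returns best=None when no result carried a usable price
    print(f"No priced results for {comp['product']}")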
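A price comparison is only meaningful when the listings describe the same product, and a search for headphones can also return accessories or older models. One rough way to screen results before comparing is to require that most of the query's words appear in the listing title. The sketch below is a crude heuristic under that assumption, not a product-matching method provided by the API. The function names and the 0.8 threshold are illustrative.

```python
import re
from typing import List, Set

def title_tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens, e.g. 'WH-1000XM5' -> {'wh', '1000xm5'}."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))

def relevance(query: str, title: str) -> float:
    """Fraction of query tokens that also appear in the title (0.0 to 1.0)."""
    query_tokens = title_tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & title_tokens(title)) / len(query_tokens)

def filter_relevant(query: str, items: List[dict], min_score: float = 0.8) -> List[dict]:
    """Keep items whose title covers at least `min_score` of the query tokens."""
    return [
        item for item in items
        if relevance(query, item.get('title') or '') >= min_score
    ]
```

Applied to the normalized results from Step 3, filter_relevant(product, normalized) would drop listings whose titles miss a key token such as the model number. Accessory listings that repeat the full product name still pass, so this reduces mismatches without eliminating them.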

Step 5: Alert on price differences

Set up alerts when the price difference between marketplaces exceeds a threshold.

Python
PRICE_DIFF_THRESHOLD = 10.0  # dollars

def check_alerts(products: list) -> list:
    alerts = []
    for product in products:
        comp = compare_prices(product)
        if comp.get('savings', 0) >= PRICE_DIFF_THRESHOLD:
            alert = {
                'product': product,
                'best_platform': comp['best']['platform'],
                'best_price': comp['best']['price'],
                'savings': comp['savings'],
            }
            alerts.append(alert)
            print(f'ALERT: {product} - save ${alert["savings"]} on {alert["best_platform"]}')
    if not alerts:
        print('No significant price differences found')
    return alerts

check_alerts(PRODUCTS)
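Step 5 compares marketplaces at a single point in time. To alert on a price drop over time, each run also needs the prices from the previous run. The following is a minimal sketch that keeps the last seen prices in a local JSON file. The key format, the function names, and the 5.0 minimum drop are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List

def load_prices(path: Path) -> Dict[str, float]:
    """Read the previous run's prices, or an empty dict on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}

def save_prices(path: Path, prices: Dict[str, float]) -> None:
    """Persist the current prices for the next run to compare against."""
    path.write_text(json.dumps(prices, indent=2))

def detect_price_drops(current: Dict[str, float], previous: Dict[str, float],
                       min_drop: float = 5.0) -> List[dict]:
    """Return one record per key whose price fell by at least `min_drop`."""
    drops = []
    for key, price in current.items():
        old = previous.get(key)
        # Skip keys seen for the first time and unparsed (zero) prices.
        if old is not None and price > 0 and old - price >= min_drop:
            drops.append({'key': key, 'old': old, 'new': price,
                          'drop': round(old - price, 2)})
    return drops
```

A run would call load_prices, build the current dictionary from the Step 4 results (for example keyed by platform and product name), pass both to detect_price_drops, and finish with save_prices.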

Python Example

Python
import requests, os, re
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def compare(product):
    prices = {}
    for platform in ['amazon', 'walmart']:
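        resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
            json={'platform': platform, 'query': product}, timeout=15)
        resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
        top = (resp.json().get('organic_results') or [{}])[0]
        # Price may be missing, numeric, or a string such as "$1,299.99"
        raw = str(top.get('price') or '').replace(',', '')
        match = re.search(r'\d+(?:\.\d+)?', raw)
        prices[platform] = float(match.group()) if match else 0.0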
    return prices

print(compare('Sony WH-1000XM5'))
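The example above queries the platforms one after another, so the total wait is the sum of both requests. Because the two calls are independent, they can run in parallel threads. The sketch below uses the standard library's ThreadPoolExecutor and takes the search function as a parameter. The name search_platforms_concurrently is illustrative, and whether your plan's rate limits allow parallel requests is an assumption to check.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def search_platforms_concurrently(
    product: str,
    platforms: List[str],
    search: Callable[[str, str], list],
) -> Dict[str, list]:
    """Run search(product, platform) for every platform in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(platforms) or 1) as pool:
        # Submit all requests first so they overlap, then collect the results.
        futures = {p: pool.submit(search, product, p) for p in platforms}
        # result() re-raises any exception raised inside the worker thread.
        return {p: f.result() for p, f in futures.items()}
```

Passing the search_product function from Step 2 as the search argument would return the same platform-to-results mapping that search_all_platforms builds sequentially.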

JavaScript Example

JavaScript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function compare(product) {
  const prices = {};
  for (const platform of ['amazon', 'walmart']) {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: H, body: JSON.stringify({platform, query: product})
    });
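    if (!r.ok) throw new Error(`Scavio request failed: ${r.status}`);
    const top = ((await r.json()).organic_results || [])[0] || {};
    // String() guards against a numeric price field, which has no .replace()
    prices[platform] = parseFloat(String(top.price ?? '0').replace(/[^\d.]/g, '')) || 0;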
  }
  return prices;
}
compare('Sony WH-1000XM5').then(console.log);

Expected Output

The finished pipeline queries Amazon and Walmart for the same products, normalizes the results into a common schema, and prints an alert whenever the cross-marketplace price difference reaches the threshold.

Frequently Asked Questions

How long does this tutorial take?
Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?
Python 3.8+, the requests library, a Scavio API key from scavio.dev, and a list of products to track across marketplaces.

Can I complete this tutorial on the free tier?
Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?
Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
