
Ecommerce Data API Quality Testing Guide

Test ecommerce API quality across five dimensions: freshness, coverage, accuracy, completeness, and latency. Automated test suite included.


Testing an ecommerce data API means checking five dimensions: freshness (are prices current?), coverage (are all major retailers represented?), accuracy (do prices match the actual product page?), structured completeness (are ratings, review counts, and availability included?), and latency under load. Teams that skip this testing often discover data quality issues in production, when customers complain.

The five quality dimensions

  • Freshness: prices are no more than 24 hours old
  • Coverage: results from at least Amazon, Walmart, Target, and Best Buy
  • Accuracy: the price in the API response matches the price on the product page
  • Completeness: every result includes title, price, rating, review count, availability, and image URL
  • Latency: 95th-percentile response time under 2 seconds
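The latency target is a 95th percentile, which a single timed request cannot confirm. A minimal sketch, assuming you have collected latency samples in seconds from repeated calls (for example, the `latency_seconds` values from many test runs): compute the 95th percentile with the nearest-rank method and compare it to the 2-second target. `p95_latency` and `meets_latency_target` are hypothetical helper names.

```python
import math
from typing import Sequence


def p95_latency(latencies: Sequence[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    # Nearest rank: smallest value with at least 95% of samples at or below it
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_target(latencies: Sequence[float], target_seconds: float = 2.0) -> bool:
    """True if the 95th-percentile latency is under the target."""
    return p95_latency(latencies) < target_seconds


# Example: 20 latency samples in seconds, one of them slow (2.6 s)
samples = [0.8, 1.1, 0.9, 1.4, 2.6, 1.0, 0.7, 1.2, 1.3, 0.95,
           1.05, 0.85, 1.15, 0.75, 1.25, 1.35, 0.65, 1.45, 1.5, 1.9]
print(p95_latency(samples), meets_latency_target(samples))
```

With these 20 samples the nearest rank is the 19th-smallest value, so `p95_latency(samples)` returns 1.9 and the single 2.6-second request does not fail the target; a second slow request would.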

Automated quality test suite

Python
import os
import statistics
import time
from typing import Optional

import requests

SCAVIO_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": SCAVIO_KEY}
SEARCH_URL = "https://api.scavio.dev/api/v1/search"

# Minimum fields every result must have. Add rating, review-count,
# availability, and image keys here under whatever names your provider uses.
REQUIRED_FIELDS = ["title", "price", "link", "source"]


def parse_price(value) -> Optional[float]:
    """Convert a price such as '$1,299.99' or 1299.99 to a float; None if unparseable."""
    try:
        return float(str(value).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def check_shopping_quality(query: str) -> dict:
    """Run quality checks on shopping search results for one query.

    Named check_* rather than test_* so pytest does not collect it as a test.
    """
    start = time.perf_counter()
    resp = requests.post(
        SEARCH_URL,
        headers=HEADERS,
        json={"query": query, "search_type": "shopping", "num_results": 20},
        timeout=10,
    )
    latency = time.perf_counter() - start
    resp.raise_for_status()
    results = resp.json().get("shopping_results", [])

    # Coverage: unique retailers, ignoring results with no source
    retailers = {str(r["source"]).lower() for r in results if r.get("source")}

    # Completeness: share of results with every required field non-empty
    complete = sum(1 for r in results if all(r.get(f) for f in REQUIRED_FIELDS))
    completeness_rate = complete / len(results) if results else 0.0

    # Price sanity: flag outliers (< $1 or > 10x median)
    prices = [p for p in (parse_price(r.get("price")) for r in results) if p is not None]
    median_price = statistics.median(prices) if prices else 0.0
    outliers = [p for p in prices if p < 1 or (median_price and p > 10 * median_price)]

    return {
        "query": query,
        "result_count": len(results),
        "unique_retailers": len(retailers),
        "retailers": sorted(retailers),
        "completeness_rate": completeness_rate,
        "price_outliers": len(outliers),
        "latency_seconds": round(latency, 3),
        "pass": (
            len(results) >= 5
            and len(retailers) >= 3
            and completeness_rate >= 0.8
            and latency < 2.0
        ),
    }


if __name__ == "__main__":
    # Run against test queries
    test_queries = [
        "wireless earbuds",
        "running shoes men",
        "mechanical keyboard",
        "protein powder",
    ]
    for q in test_queries:
        result = check_shopping_quality(q)
        status = "PASS" if result["pass"] else "FAIL"
        print(f"[{status}] {q}: {result['result_count']} results, "
              f"{result['unique_retailers']} retailers, "
              f"{result['latency_seconds']}s")
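The suite counts distinct retailers, but the coverage target names four specific ones. A minimal sketch that reports which of them are missing, assuming each result's `source` holds a store name or domain containing the retailer's name (for example "Best Buy" or "bestbuy.com"). `missing_retailers` and the `TARGET_RETAILERS` alias table are hypothetical, not part of any provider's API.

```python
from typing import Iterable

# Hypothetical alias table: each target retailer and the lowercase strings
# that identify it inside a result's `source` value (store name or domain).
TARGET_RETAILERS = {
    "Amazon": ("amazon",),
    "Walmart": ("walmart",),
    "Target": ("target",),
    "Best Buy": ("best buy", "bestbuy"),
}


def missing_retailers(results: Iterable[dict], targets: dict = TARGET_RETAILERS) -> list:
    """Return the target retailers that appear in none of the results' sources."""
    sources = [str(r.get("source", "")).lower() for r in results]
    return [
        name
        for name, aliases in targets.items()
        # A retailer counts as covered if any alias is a substring of any source
        if not any(alias in src for src in sources for alias in aliases)
    ]


sample = [
    {"source": "Amazon.com"},
    {"source": "Walmart - Seller"},
    {"source": "bestbuy.com"},
]
print(missing_retailers(sample))
```

Here Target is reported missing. You could require an empty list as one more pass condition; substring matching is deliberately loose and can over-match (any seller whose name contains "target" would count).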

Price accuracy verification

Python
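import os

import requests


def verify_price_accuracy(api_result: dict, sample_size: int = 3) -> dict:
    """Spot-check API prices against search snippets from each retailer's site.

    `api_result` is the raw JSON response of a shopping search (the dict that
    contains "shopping_results"), not the summary returned by the quality test.
    """
    results = api_result.get("shopping_results", [])[:sample_size]
    checks = []

    for r in results:
        title = r.get("title", "")
        source = r.get("source", "")
        # Search for the product's current price on the retailer site. `site:`
        # expects a domain (e.g. walmart.com); if `source` is a store name such
        # as "Walmart", the filter may not narrow results as intended.
        verification = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
            json={"query": f"{title} price site:{source}", "num_results": 1},
            timeout=10,
        ).json()
        # Guard against a missing or empty organic_results list
        organic = verification.get("organic_results") or [{}]

        checks.append({
            "product": title[:50],
            "api_price": r.get("price"),
            "source": source,
            "verification_snippet": organic[0].get("snippet", ""),
        })
    # Report the number actually checked, which may be fewer than requested
    return {"checks": checks, "sample_size": len(checks)}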
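The spot-check collects snippets but leaves the actual comparison to the reader. A minimal sketch of that comparison, assuming the snippet states prices in US-dollar form such as "$49.99": extract every dollar amount from the snippet and accept the API price if any amount is within a small relative tolerance. `snippet_confirms_price` and its 2% default tolerance are hypothetical choices, and a snippet can also mention list prices or shipping costs, so a match is weak evidence rather than proof.

```python
import re
from typing import Optional

# Matches dollar amounts such as "$49.99", "$1,299", or "$ 5.00"
DOLLAR_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def extract_dollar_amounts(text: str) -> list:
    """All dollar amounts in the text, as floats."""
    return [float(m.replace(",", "")) for m in DOLLAR_RE.findall(text)]


def parse_api_price(value) -> Optional[float]:
    """Parse an API price such as '$49.99' or 49.99; None if unparseable."""
    try:
        return float(str(value).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def snippet_confirms_price(api_price, snippet: str, tolerance: float = 0.02) -> bool:
    """True if some dollar amount in the snippet is within `tolerance` (a fraction) of the API price."""
    price = parse_api_price(api_price)
    if price is None or price <= 0:
        return False
    return any(abs(amount - price) / price <= tolerance
               for amount in extract_dollar_amounts(snippet))


print(snippet_confirms_price("$49.99", "Now $49.99 at Walmart. Was $59.99."))
```

Applied to each entry of `checks`, this is `snippet_confirms_price(c["api_price"], c["verification_snippet"])`.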

Continuous monitoring setup

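Run the quality tests daily via cron and track the metrics over time in a simple JSON log. Alert when the completeness rate drops below 80%, latency exceeds 3 seconds, or any test query returns fewer than 5 results. The cost is small: the suite makes one shopping search per test query (4 calls), and the price spot-check adds up to 3 searches per query (12 calls), so a daily run needs at most 16 API calls, or about $0.08/day at Scavio's $0.005 per credit, assuming one credit per call.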
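A minimal sketch of the log-and-alert step, assuming each daily run produces the per-query summary dicts returned by the quality test above (keys `query`, `result_count`, `completeness_rate`, `latency_seconds`). `log_run` appends one JSON object per line to a log file, and `alerts_for` returns messages for the thresholds in this section; both names are hypothetical, and scheduling and alert delivery are left out.

```python
import json
import time
from pathlib import Path

# Alert thresholds from the monitoring rules
MIN_COMPLETENESS = 0.8
MAX_LATENCY_SECONDS = 3.0
MIN_RESULTS = 5


def log_run(summaries: list, log_path: Path) -> None:
    """Append each summary as one JSON line, stamped with the UTC run time."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    with log_path.open("a", encoding="utf-8") as f:
        for s in summaries:
            f.write(json.dumps({"logged_at": stamp, **s}) + "\n")


def alerts_for(summaries: list) -> list:
    """Return one human-readable message per threshold breach."""
    alerts = []
    for s in summaries:
        q = s["query"]
        if s["completeness_rate"] < MIN_COMPLETENESS:
            alerts.append(f"{q}: completeness {s['completeness_rate']:.0%} below {MIN_COMPLETENESS:.0%}")
        if s["latency_seconds"] > MAX_LATENCY_SECONDS:
            alerts.append(f"{q}: latency {s['latency_seconds']}s above {MAX_LATENCY_SECONDS}s")
        if s["result_count"] < MIN_RESULTS:
            alerts.append(f"{q}: only {s['result_count']} results")
    return alerts


if __name__ == "__main__":
    # Example summary for one query that breaches two thresholds
    today = [{"query": "protein powder", "result_count": 3,
              "completeness_rate": 0.75, "latency_seconds": 1.2}]
    for alert in alerts_for(today):
        print("ALERT:", alert)
```

In a cron job you would call `log_run(summaries, Path("quality_log.jsonl"))` after each run (the path is a placeholder) and forward any returned alerts to whatever channel your team watches.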

Comparing ecommerce API providers

  • Keepa (Amazon only): $19/mo for 100K tokens, deep price history
  • DataForSEO Shopping: $0.002/query live, multi-retailer
  • Scavio Shopping: $0.005/credit, Google Shopping results
  • SerpAPI Shopping: $0.015/search, Google Shopping only
  • Helium 10 (Amazon FBA): $39/mo Starter, proprietary metrics
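To compare the per-call providers on your own workload, multiply calls per day by days by the unit price. A minimal sketch using the unit prices as listed above, assuming one credit or search per API call; Keepa and Helium 10 are subscriptions, so they are left out. `monthly_cost` is a hypothetical helper.

```python
# Unit prices in USD per call, as listed above; assumes one credit/search per call
UNIT_PRICE_USD = {
    "DataForSEO Shopping": 0.002,
    "Scavio Shopping": 0.005,
    "SerpAPI Shopping": 0.015,
}


def monthly_cost(calls_per_day: int, unit_price: float, days: int = 30) -> float:
    """Cost in USD of a fixed daily call volume over `days` days, rounded to cents."""
    return round(calls_per_day * days * unit_price, 2)


if __name__ == "__main__":
    # 16 calls/day: 4 shopping searches plus 12 price-verification searches
    for provider, price in UNIT_PRICE_USD.items():
        print(f"{provider}: ${monthly_cost(16, price):.2f}/month")
```

For 16 calls a day over 30 days (480 calls), this gives $0.96, $2.40, and $7.20 per month respectively.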

Key takeaway

Test your ecommerce data API before building features on it. A short automated quality suite catches data issues before they reach production. Run it daily, track the trends, and switch providers when quality falls below your thresholds.