
Web Search and Scraping Rate Limit Workarounds

Practical workarounds for web search and scraping rate limits -- caching, batching, and choosing the right provider.


Rate limits are the most common blocker when building applications on top of web search data. Whether you are scraping directly or using a search API, you will eventually hit request caps, throttling, or outright blocks. This post covers practical strategies for working within rate limits without sacrificing data freshness or application reliability.

Understanding Why Rate Limits Exist

Search engines and APIs impose rate limits for infrastructure protection and fair usage. Google blocks aggressive scraping to protect server resources. Search APIs impose per-minute or per-day caps to manage costs and prevent abuse. Understanding the type of rate limit you are hitting determines the right workaround.

  • Hard caps -- a fixed number of requests per time period, enforced by the API (returns 429)
  • Soft throttling -- requests slow down or degrade in quality after a threshold
  • IP-based blocking -- the source IP is banned after suspicious patterns (scraping)
  • Credit-based limits -- you have a pool of credits that deplete with usage
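As a rough illustration, a response handler can guess which kind of limit it just hit from the status code and headers. The header names here (`Retry-After`, `X-RateLimit-Remaining`) and the use of 402 for exhausted credits are common conventions, not guarantees -- check your provider's documentation:

```python
def classify_limit(status_code, headers):
    """Best-effort guess at which kind of rate limit a response reflects."""
    if status_code == 429:
        # A zero remaining-quota header suggests a hard cap rather than throttling
        if headers.get("X-RateLimit-Remaining") == "0":
            return "hard cap"
        return "throttled"
    if status_code == 403:
        return "possible IP block"  # scraping targets often 403 banned IPs
    if status_code == 402:
        return "credits exhausted"  # some credit-based APIs use 402 Payment Required
    return "not rate limited"
```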

Caching Aggressively

The cheapest request is the one you never make. Many search queries repeat -- the same product lookup, the same competitor monitoring query, the same keyword check. Caching results locally eliminates redundant API calls.

Python
import hashlib
import time

import requests

API_KEY = "your-api-key"  # placeholder

CACHE = {}
CACHE_TTL = 3600  # 1 hour

def cached_search(query, platform="google"):
    key = hashlib.md5(f"{platform}:{query}".encode()).hexdigest()

    if key in CACHE and time.time() - CACHE[key]["ts"] < CACHE_TTL:
        return CACHE[key]["data"]

    response = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"platform": platform, "query": query}
    )
    result = response.json()

    CACHE[key] = {"data": result, "ts": time.time()}
    return result

For production, use Redis or Memcached instead of an in-memory dict. Set TTL based on how fresh the data needs to be -- product prices might need 15-minute freshness, while informational queries can cache for hours.
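Cache hit rates also improve if trivially different spellings of the same query share one cache entry. The sketch below normalizes case and whitespace before hashing -- whether that normalization is safe for your platform's result quality is an assumption worth checking:

```python
import hashlib

def cache_key(query, platform="google"):
    # Collapse case and runs of whitespace so near-duplicate queries
    # ("Python  Tutorial" vs "python tutorial") map to one cache entry
    normalized = " ".join(query.lower().split())
    return hashlib.md5(f"{platform}:{normalized}".encode()).hexdigest()
```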

Request Queuing and Throttling

Instead of firing requests as fast as possible, queue them and process at a controlled rate. This prevents burst-triggered rate limits and distributes load evenly:

Python
import asyncio

class RateLimiter:
    def __init__(self, max_per_second=2):
        self.delay = 1.0 / max_per_second
        self.last_call = 0.0
        self._lock = asyncio.Lock()  # serialize acquires across coroutines

    async def acquire(self):
        async with self._lock:
            now = asyncio.get_running_loop().time()
            wait = self.last_call + self.delay - now
            if wait > 0:
                await asyncio.sleep(wait)
            self.last_call = asyncio.get_running_loop().time()

This pattern ensures you never exceed your API's per-second limit, even under bursty load from multiple concurrent users.
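To see the limiter in action, the snippet below (repeating the class so it runs standalone, with `fetch` as a stand-in for the real API call) spaces five concurrent requests out at the configured rate:

```python
import asyncio

class RateLimiter:
    def __init__(self, max_per_second=2):
        self.delay = 1.0 / max_per_second
        self.last_call = 0.0
        self._lock = asyncio.Lock()  # serialize acquires across coroutines

    async def acquire(self):
        async with self._lock:
            now = asyncio.get_running_loop().time()
            wait = self.last_call + self.delay - now
            if wait > 0:
                await asyncio.sleep(wait)
            self.last_call = asyncio.get_running_loop().time()

async def fetch(limiter, query):
    await limiter.acquire()           # waits until a slot is free
    return f"results for {query}"     # stand-in for the real API call

async def main():
    limiter = RateLimiter(max_per_second=10)
    queries = [f"query {i}" for i in range(5)]
    # gather fires all coroutines at once; the limiter spaces the calls out
    return await asyncio.gather(*(fetch(limiter, q) for q in queries))

results = asyncio.run(main())
```

Even though `gather` launches every coroutine simultaneously, the shared lock forces calls through one at a time, at most `max_per_second` per second.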

Tiered Data Freshness

Not all data needs to be real-time. Categorize your queries by freshness requirements:

  • Real-time -- price comparisons, stock availability (cache 5-15 min)
  • Near-real-time -- news monitoring, trend tracking (cache 1-4 hours)
  • Daily -- SEO rank tracking, competitor analysis (cache 24 hours)
  • Weekly -- market research, content audits (cache 7 days)

By assigning appropriate cache TTLs to each category, you can reduce API usage by 60-80% without meaningful data staleness.
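The tiers above can be encoded as a simple lookup table -- the tier names and TTL values here mirror the list, and the fallback to the most conservative tier for unknown query types is a design choice, not a requirement:

```python
# Freshness tiers mapped to cache TTLs in seconds
TTL_BY_TIER = {
    "real-time": 15 * 60,          # price comparisons, stock availability
    "near-real-time": 4 * 3600,    # news monitoring, trend tracking
    "daily": 24 * 3600,            # SEO rank tracking, competitor analysis
    "weekly": 7 * 24 * 3600,       # market research, content audits
}

def ttl_for(query_type):
    # Unknown query types fall back to the most conservative (shortest) TTL
    return TTL_BY_TIER.get(query_type, TTL_BY_TIER["real-time"])
```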

Handling 429 Responses Gracefully

When you do hit a rate limit, handle it with exponential backoff rather than immediate retries. Hammering a rate-limited endpoint makes the situation worse:

Python
import time

import requests

API_URL = "https://api.scavio.dev/api/v1/search"
HEADERS = {"x-api-key": "your-api-key"}  # placeholder key

def search_with_retry(query, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json={"query": query})

        if response.status_code == 429:
            # Honor Retry-After when the API provides it; otherwise
            # back off exponentially (1s, 2s, 4s)
            wait = float(response.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
            continue

        return response.json()

    raise RuntimeError("Rate limit exceeded after retries")

Scaling Beyond Single-API Limits

If your application genuinely needs more throughput than a single API plan provides, consider these approaches before adding complexity:

  • Upgrade to a higher API tier -- the cost per request usually drops at higher volumes
  • Pre-compute and store results for predictable queries during off-peak hours
  • Use light mode instead of full mode for queries that only need titles and snippets
  • Deduplicate queries at the application layer before they reach the API
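Deduplication, the last item above, is often the cheapest win. A sketch of application-layer dedup that preserves first-seen order -- the case and whitespace normalization is an assumption about what your platform treats as equivalent:

```python
def dedupe_queries(queries):
    """Collapse duplicate queries before they reach the API."""
    seen = set()
    unique = []
    for q in queries:
        # Normalize so "Python tutorial" and "python  tutorial" count as one
        key = " ".join(q.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(q)  # keep the original spelling of the first occurrence
    return unique
```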

Rate limits are a constraint, not a wall. With caching, queuing, and smart prioritization, most applications can operate comfortably within standard API limits.