
Gemini API 429/503 Fallback: Route Search to a Reliable API

Gemini rate limits and 503 errors break grounding. Decouple search from the LLM by routing queries to a dedicated search API during outages.


The Gemini API returns 429 (rate limit) and 503 (service unavailable) errors, most often during peak hours. Production pipelines need a fallback strategy that routes search queries to an alternative API when Gemini's grounding search fails, not a retry loop that compounds the problem.

Gemini Rate Limit Reality

Gemini 2.5 Pro's free tier allows 5 RPM (requests per minute) and 25 RPD (requests per day). Even paid tiers hit 429s during high-traffic periods. The grounding with Google Search feature -- which powers Gemini's web-aware answers -- has its own separate rate limits that are not well documented.
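
Because the free-tier per-minute ceiling is a known number, some 429s can be avoided before they happen: count your own Gemini calls and route straight to the fallback once the local budget is spent. This is a minimal sketch, assuming the 5 RPM figure above and a single process; `MinuteBudget` is a hypothetical helper, not part of any SDK.

```python
import time
from collections import deque


class MinuteBudget:
    """Hypothetical client-side sliding-window counter for a per-minute quota.

    limit=5 mirrors the free-tier RPM figure above; set it to your own tier.
    """

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.calls = deque()    # timestamps of calls admitted in the window

    def try_acquire(self):
        """Admit and record a call if it fits in the window; else return False."""
        now = self.clock()
        # Forget calls that have aged out of the sliding window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


budget = MinuteBudget()
# if budget.try_acquire(): call Gemini with grounding
# else: skip straight to the search-API fallback
```

This only sees requests made by this process. Other workers sharing the key, token-per-minute limits, the daily cap, and server-side congestion can all still produce 429s, so the fallback path remains necessary.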

When grounding fails, or the model decides not to search, Gemini answers from its training data. Your agent still gets an answer, but it may be stale or hallucinated. Without explicit checks on the response's grounding metadata, you cannot distinguish a grounded response from an ungrounded one.
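
One way to make that distinction explicit is to inspect the grounding metadata before trusting an answer. The sketch below assumes the response shape of the `google-genai` SDK (`response.candidates[0].grounding_metadata.grounding_chunks`) and treats "at least one cited web source" as the test for grounded; `is_grounded` is a hypothetical helper name, and the `SimpleNamespace` objects are shape-only stand-ins for real SDK responses.

```python
from types import SimpleNamespace


def is_grounded(response):
    """Return True only if the first candidate cites at least one web source.

    Assumes the google-genai response shape
    (response.candidates[0].grounding_metadata.grounding_chunks). getattr()
    keeps a missing field from raising; it just reads as "not grounded".
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return False
    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    return len(chunks) > 0


# Shape-only stand-ins for SDK responses, for illustration
grounded = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(grounding_chunks=[SimpleNamespace()]))])
ungrounded = SimpleNamespace(candidates=[SimpleNamespace(grounding_metadata=None)])

print(is_grounded(grounded), is_grounded(ungrounded))  # True False
```

Checking `grounding_chunks` rather than just the presence of metadata is a deliberate choice: it requires actual sources, not merely evidence that a search was attempted.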

The Fallback Pattern

Instead of retrying Gemini, route the search query to a dedicated search API. Feed the fresh results back into Gemini (or any LLM) as context. This decouples search reliability from LLM availability.

```python
import os

import requests
from google import genai
from google.genai import errors, types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}
SEARCH_TOOL = types.Tool(google_search=types.GoogleSearch())


def search_with_fallback(query):
    """Try Gemini grounding first, fall back to Scavio."""
    try:
        response = client.models.generate_content(
            model="gemini-2.5-pro",
            contents=query,
            config=types.GenerateContentConfig(tools=[SEARCH_TOOL]),
        )
        # Check that grounding actually happened: require cited web sources
        candidates = response.candidates or []
        grounding = candidates[0].grounding_metadata if candidates else None
        if grounding and grounding.grounding_chunks:
            return {"source": "gemini_grounding", "response": response.text}
    except errors.APIError as e:
        if e.code in (429, 503):
            print(f"Gemini unavailable ({e.code}): {e}")
        else:
            raise

    # Fallback: search directly, then use Gemini without grounding
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=H,
        json={"platform": "google", "query": query},
        timeout=10,
    )
    r.raise_for_status()
    context = "\n".join(
        f"- {item['title']}: {item.get('snippet', '')}"
        for item in r.json().get("organic", [])[:5]
    )
    # This call can also hit 429/503; a non-Gemini LLM would avoid sharing the outage
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=f"Based on these search results:\n{context}\n\nAnswer: {query}",
    )
    return {"source": "scavio_fallback", "response": response.text}


result = search_with_fallback("latest google ai announcements 2026")
print(f"Source: {result['source']}")
```
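
The context-building step in the fallback can be pulled into a pure function, which makes it easy to test and to tighten the prompt. This sketch numbers each result so the model can cite sources and caps snippet length; `build_context` and `build_prompt` are hypothetical names, and it assumes each result is a dict with a `title` and an optional `snippet`, as in the `organic` list used above.

```python
def build_context(results, max_results=5, max_snippet_chars=300):
    """Format search results as a numbered source list for an LLM prompt.

    Assumes each result is a dict with a 'title' and an optional 'snippet',
    as in the Scavio "organic" list used above.
    """
    lines = []
    for i, item in enumerate(results[:max_results], start=1):
        line = f"[{i}] {item.get('title', 'Untitled')}"
        snippet = (item.get("snippet") or "")[:max_snippet_chars]
        if snippet:
            line += f": {snippet}"
        lines.append(line)
    return "\n".join(lines)


def build_prompt(query, results):
    """Ask the model to stay within the sources and cite them by number."""
    return (
        "Answer the question using only the numbered search results below, "
        "citing them as [n]. If they do not contain the answer, say so.\n\n"
        f"{build_context(results)}\n\nQuestion: {query}"
    )


print(build_context([{"title": "A", "snippet": "x"}, {"title": "B"}]))
# [1] A: x
# [2] B
```

Telling the model to admit when the results are insufficient matters here: the whole point of the fallback is to avoid silently answering from training data.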

503 Service Unavailable

Gemini 503 errors indicate the service is overloaded or undergoing maintenance. Unlike 429s (which resolve after the rate limit window), 503s can last minutes or hours. Your fallback should handle extended outages, not just momentary rate limits.

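```python
import os
import time

import requests
from google import genai
from google.genai import types

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}


def resilient_search(query, max_gemini_wait=5):
    """Fast fallback: don't wait for Gemini recovery."""
    # HttpOptions.timeout is in milliseconds; a short timeout caps the time lost
    fast_client = genai.Client(
        api_key=os.environ["GEMINI_API_KEY"],
        http_options=types.HttpOptions(timeout=int(max_gemini_wait * 1000)),
    )
    start = time.monotonic()
    try:
        response = fast_client.models.generate_content(
            model="gemini-2.5-pro",
            contents=query,
            config=types.GenerateContentConfig(
                tools=[types.Tool(google_search=types.GoogleSearch())]
            ),
        )
        return {"source": "gemini", "data": response.text}
    except Exception as e:  # broad on purpose: timeouts may surface as transport errors
        elapsed = time.monotonic() - start
        print(f"Gemini failed after {elapsed:.1f}s ({type(e).__name__}), using fallback")

    # Direct search API: independent of Gemini's quota and availability
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=H,
        json={"platform": "google", "query": query},
        timeout=10,
    )
    r.raise_for_status()
    # Note: "data" holds raw result dicts here, but answer text on the Gemini path
    return {"source": "scavio", "data": r.json().get("organic", [])}
```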
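
`resilient_search` bounds each Gemini attempt by `max_gemini_wait`, but during a long 503 outage every request still spends that time before falling back. A circuit breaker removes that cost: after a few consecutive failures it skips Gemini entirely for a cooldown period, then lets a trial request through. This is a minimal single-process sketch; `CircuitBreaker` and `guarded_search` are hypothetical names, `primary` and `fallback` stand for functions wrapping the Gemini call and the search-API call, and the 3-failure threshold and 300-second cooldown are arbitrary starting points.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker for the Gemini path (not thread-safe).

    After `threshold` consecutive failures the circuit opens and allow()
    returns False for `cooldown` seconds, so requests go straight to the
    fallback. Once the cooldown passes, calls are let through again as trial
    requests: a success closes the circuit, a failure reopens it.
    """

    def __init__(self, threshold=3, cooldown=300.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown


def guarded_search(query, breaker, primary, fallback):
    """Call primary (e.g. Gemini) unless the breaker is open; otherwise fallback."""
    if breaker.allow():
        try:
            result = primary(query)
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return result
    return fallback(query)
```

With several worker processes, each keeps its own breaker state; sharing it (for example through a cache) is a further step not shown here.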

Monitoring Fallback Rates

Track how often your pipeline falls back. As a rule of thumb, if Gemini grounding fails more than 10% of the time, you are paying, in latency and possibly in grounding charges, for grounding you are not getting. At that point, move search to a dedicated API permanently and use Gemini purely for language generation.
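
Tracking this can be as simple as a rolling window of outcomes. The sketch below keeps the last N requests and reports whether the fallback share has crossed the threshold; `FallbackMonitor` is a hypothetical name, the 10% default mirrors the rule of thumb above, and the window size is an assumption to tune for your traffic.

```python
from collections import deque


class FallbackMonitor:
    """Hypothetical rolling tracker of how often requests used the fallback."""

    def __init__(self, window=1000, threshold=0.10):
        self.outcomes = deque(maxlen=window)  # True = request fell back
        self.threshold = threshold

    def record(self, fell_back):
        self.outcomes.append(bool(fell_back))

    @property
    def fallback_rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_switch(self):
        """True once fallbacks exceed the threshold within the window."""
        return self.fallback_rate > self.threshold


monitor = FallbackMonitor()
# After each call: monitor.record(result["source"] != "gemini_grounding")
```

With a small window, a short burst of 429s can trip the threshold, so it is worth alerting on `should_switch()` staying true over time rather than acting on a single reading.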

Cost Impact

Gemini grounding is billed per grounded request on top of token costs. Scavio charges $0.005/credit (250 free credits per month, or $30/month for 7K credits). Pairing a dedicated search API with Gemini for generation can cost less than Gemini with grounding, especially when grounding fails and you retry.
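
To compare the two setups for your own volume, the arithmetic can be written out. This is a rough sketch under stated assumptions: one credit per search query, usage beyond the free credits billed at the flat $0.005 rate (the $30/7K plan is not modeled), and a grounding price you must take from Google's current pricing page, since none is given here. Both function names are hypothetical, and neither includes LLM token costs or any free grounding allowance.

```python
def scavio_monthly_cost(queries, credits_per_query=1, free_credits=250,
                        price_per_credit=0.005):
    """Estimate monthly search-API spend from the figures quoted above.

    Assumptions: every query costs `credits_per_query` credits and usage
    beyond the free credits is billed at the flat per-credit price. The
    $30/month 7K-credit plan is not modeled here.
    """
    billable_credits = max(0, queries * credits_per_query - free_credits)
    return billable_credits * price_per_credit


def grounding_monthly_cost(queries, price_per_grounded_request, extra_attempt_rate=0.0):
    """Estimate the grounding surcharge only (token costs excluded).

    price_per_grounded_request is a placeholder: take it from Google's current
    pricing page. extra_attempt_rate models retried, billed attempts
    (0.1 means 10% more grounded requests than queries).
    """
    return queries * (1 + extra_attempt_rate) * price_per_grounded_request


print(f"${scavio_monthly_cost(10_000):.2f}")  # 9,750 billable credits -> $48.75
```

The fallback design only changes which of these two terms dominates: every query answered through the search API moves cost from the grounding side to the per-credit side.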