Agent Search Reliability in Production
DuckDuckGo rate-limits, Brave paywalled, free tools fail under load. Building reliable search infrastructure for production AI agents.
Production AI agents need search that works on the 1,000th request as reliably as the first. DuckDuckGo rate-limits you after a few dozen queries. Brave's free tier caps out quickly. Scraping Google directly gets your IP blocked. The search tool that works perfectly during development is almost never the one that survives production traffic.
How agents fail in production
The failure pattern is consistent across frameworks -- LangChain, CrewAI, AutoGen, custom loops. The agent works in testing because testing means 5-10 searches. Production means hundreds or thousands of searches per day, and that is where free or semi-free search tools break.
- DuckDuckGo: No official API. Libraries like duckduckgo-search scrape the site. Works for 20-30 queries, then returns empty results or CAPTCHA pages. Common in LangChain tutorials, unusable in production.
- Google Custom Search: 100 free queries/day, then $5 per 1,000. Decent quality, but the 100-query ceiling means a moderately active agent exhausts it before lunch.
- Brave Search API free tier: 2,000 queries/month. Good quality, independent index. But 2,000 queries is about 65/day -- enough for light use, not enough for a multi-user product.
- Direct scraping: Works until Google, Bing, or whatever engine you are scraping detects the pattern. Then you get 403s, CAPTCHAs, and IP bans. Recovery takes hours to days.
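Once a provider starts blocking you, continuing to hammer it usually extends the ban. One defensive pattern is a circuit breaker: after repeated failures, stop calling the provider for a cooldown window and fall back to cached context. A minimal sketch -- the threshold and cooldown values here are illustrative, not prescriptive:

```python
import time


class CircuitBreaker:
    """Stops calls to a failing search provider for a cooldown period."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 300.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # Circuit is open: block calls until the cooldown elapses
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                return False
            # Cooldown over: half-open, permit a trial call
            self.opened_at = None
            self.failures = 0
        return True

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```

An agent loop would check allow() before each search and, when the circuit is open, tell the LLM to work from available context instead of retrying.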
What production reliability actually means
Reliability for agent search has specific requirements that differ from general API reliability:
- No silent degradation: A search that returns empty results without an error code is worse than a search that fails loudly. The agent interprets empty results as "nothing exists" and produces confidently wrong output.
- Consistent latency: Agents are latency-sensitive because search is usually one step in a multi-step chain. A search that takes 200ms 95% of the time but 10 seconds 5% of the time causes unpredictable agent behavior and timeouts.
- Predictable rate limits: The agent needs to know before calling search whether it will succeed. Token budgets, credit balances, and explicit rate-limit headers let the agent plan. Opaque rate limits that return 429 without retry-after information cause retry storms.
- Structured error responses: The LLM driving the agent reads error messages. A JSON error with a clear message is actionable. An HTML error page is noise.
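The "no silent degradation" requirement can be enforced in code: classify every response before the agent sees it, so an empty result set surfaces as an explicit condition rather than passing as a normal answer. A hypothetical validator sketch (the function and message wording are assumptions, not part of any particular API):

```python
def validate_response(status_code: int, results: list[dict]) -> tuple[bool, str]:
    """Classify a search response so empty results never pass silently.

    Returns (ok, message); the message is what the agent's LLM will read.
    """
    if status_code != 200:
        return False, f"Search backend returned HTTP {status_code}"
    if not results:
        # Distinguish "provider degraded" from "genuinely no matches":
        # surface it as an explicit condition instead of an empty list.
        return False, ("Search returned zero results; "
                       "treat as inconclusive, not as 'nothing exists'")
    return True, f"{len(results)} results"
```

The key design choice is that the zero-result case produces a message the LLM can act on, instead of an empty list it will misread as a definitive answer.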
Building a resilient search layer
The production pattern is a search wrapper that handles failures gracefully and gives the agent enough information to decide what to do next:
```python
import os
import time

import requests
from dataclasses import dataclass


@dataclass
class SearchResult:
    results: list[dict]
    success: bool
    error: str | None = None
    credits_remaining: int | None = None


def reliable_search(
    query: str,
    num_results: int = 5,
    timeout: int = 10,
    max_retries: int = 2,
) -> SearchResult:
    """Production search wrapper with retry and error context."""
    api_key = os.environ.get("SCAVIO_API_KEY")
    if not api_key:
        return SearchResult([], False, "SCAVIO_API_KEY not set")

    for attempt in range(max_retries + 1):
        try:
            resp = requests.post(
                "https://api.scavio.dev/api/v1/search",
                headers={"x-api-key": api_key},
                json={"query": query, "num_results": num_results},
                timeout=timeout,
            )
            if resp.status_code == 200:
                data = resp.json()
                return SearchResult(
                    results=data.get("results", []),
                    success=True,
                    credits_remaining=data.get("credits_remaining"),
                )
            if resp.status_code == 429:
                # Exponential backoff, capped at 8 seconds
                wait = min(2 ** attempt, 8)
                time.sleep(wait)
                continue
            return SearchResult(
                [], False, f"HTTP {resp.status_code}: {resp.text[:200]}"
            )
        except requests.Timeout:
            if attempt < max_retries:
                continue
            return SearchResult([], False, "Search timed out after retries")
        except requests.ConnectionError:
            return SearchResult([], False, "Connection failed")
    return SearchResult([], False, "Max retries exceeded")
```

Integrating with agent frameworks
The wrapper above returns structured data that any agent framework can consume. Here is how to expose it as a tool:
```python
def search_tool(query: str) -> str:
    """Search the web. Returns result summaries or an error message."""
    result = reliable_search(query, num_results=5)
    if not result.success:
        # Give the LLM a clear instruction, not just an error
        return (
            f"Search failed: {result.error}. "
            "Do not retry this query. Use available context instead."
        )
    if result.credits_remaining is not None and result.credits_remaining < 10:
        prefix = f"Warning: {result.credits_remaining} credits remaining. "
    else:
        prefix = ""
    summaries = []
    for r in result.results:
        summaries.append(
            f"- {r.get('title', 'Untitled')}: {r.get('snippet', 'No snippet')}"
        )
    return prefix + "\n".join(summaries)
```

Monitoring agent search in production
You need three metrics to catch search issues before users notice:
- Search success rate: Percentage of search calls that return at least one result. Alert if it drops below 95%.
- p95 latency: 95th percentile response time. For agent search, keep this under 3 seconds. Above that, the agent starts timing out or the user experience degrades.
- Credits consumed per session: Track how many search calls each agent session makes. A sudden spike means a loop or retry storm. Set a hard cap (e.g., 50 searches per session) and terminate gracefully when hit.
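Computed over a rolling window of call records, each of the three metrics is a few lines of code. A sketch, assuming you log one record per search call (the field names here are assumptions):

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    latency_ms: float
    result_count: int
    session_id: str


def success_rate(calls: list[CallRecord]) -> float:
    """Fraction of calls that returned at least one result."""
    if not calls:
        return 1.0
    return sum(1 for c in calls if c.result_count > 0) / len(calls)


def p95_latency(calls: list[CallRecord]) -> float:
    """95th percentile latency via nearest-rank on the sorted sample."""
    latencies = sorted(c.latency_ms for c in calls)
    idx = max(0, int(len(latencies) * 0.95) - 1)
    return latencies[idx]


def searches_per_session(calls: list[CallRecord]) -> dict[str, int]:
    """Call counts keyed by session, for spotting loops and retry storms."""
    counts: dict[str, int] = {}
    for c in calls:
        counts[c.session_id] = counts.get(c.session_id, 0) + 1
    return counts
```

Feed these into whatever alerting you already run; the thresholds above (95% success, 3s p95, 50 calls/session) are the values to alert on.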
```python
import logging

logger = logging.getLogger("agent_search")


class SearchMonitor:
    def __init__(self, max_per_session: int = 50):
        self.max_per_session = max_per_session
        self.call_count = 0
        self.failures = 0

    def search(self, query: str) -> SearchResult:
        if self.call_count >= self.max_per_session:
            return SearchResult(
                [], False,
                "Session search limit reached. Summarize with available data."
            )
        self.call_count += 1
        result = reliable_search(query)
        if not result.success:
            self.failures += 1
            logger.warning(
                "Search failure %d/%d: %s",
                self.failures, self.call_count, result.error,
            )
        return result
```

Cost comparison at production scale
For an agent product serving 100 users/day, each triggering ~10 searches: 1,000 searches/day, 30,000/month.
- Google Custom Search: 100 free queries/day (about 3,000/month), then $5 per 1,000. The remaining 27,000 queries cost about $135/mo.
- SerpAPI: 30,000 at $0.01/query = $300/mo (Business plan).
- Scavio: 30,000 at $0.005/credit = $150/mo, or $30/mo for the 7,000-credit plan if you can stay within it.
- DuckDuckGo: $0/mo in theory. In practice, broken at this volume.
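The arithmetic behind those figures is worth checking against your own volume. A minimal sketch, using the prices listed above (the free-tier subtraction for Google is an assumption about how its quota bills):

```python
def monthly_cost(searches_per_month: int, price_per_1k: float,
                 free_per_month: int = 0) -> float:
    """Billable queries beyond the free tier, times the per-1,000 rate."""
    billable = max(0, searches_per_month - free_per_month)
    return billable / 1_000 * price_per_1k


# 100 users/day * 10 searches/user * 30 days
volume = 30_000

google = monthly_cost(volume, 5.0, free_per_month=100 * 30)  # $5/1,000 after free tier
serpapi = monthly_cost(volume, 10.0)                         # $0.01/query
scavio = monthly_cost(volume, 5.0)                           # $0.005/credit
```

Swapping in your own daily user count and searches-per-user is the fastest way to see which tier you actually land in.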
The real lesson
The search tool in your demo is not the search tool for your product. Plan for the production search layer from the start -- pick a tool with explicit rate limits, structured errors, and per-query pricing. Migrating search providers mid-production, while your agents are serving users, is a fire drill you can avoid.