agents · reliability · pipelines

Why AI Tools Break on Second Use: Building Reliable Pipelines

AI tools work on first demo and fail on second use. Stale data, rate limits, context overflow. Patterns for reliable production pipelines.


Most AI tools work perfectly on the first demo call and break on the second real use. The failure pattern is consistent: stale data, hallucinated results, rate limits, and context window overflow. Building reliable pipelines means designing for the second call, not the first.

Why First Use Works and Second Use Fails

First use: fresh context window, no accumulated errors, often a curated demo query. Second use: context carries stale state, the LLM hallucinates based on prior conversation, rate limits kick in because you already burned your free credits, and the tool returns cached data from the first call instead of fresh results.

The core issue is that AI tools are stateless by design but used in stateful contexts. Your agent remembers the first search result and assumes it is still true on the second run.

Pattern 1: Stale Data Accumulation

Agents cache tool results in their context window. On subsequent calls, the LLM may reference cached data instead of making a fresh API call. Force fresh lookups by clearing tool result caches between runs.

Python
import requests, os
from datetime import datetime, timezone

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def fresh_search(query, platform="google"):
    """Always returns fresh results, never cached."""
    r = requests.post("https://api.scavio.dev/api/v1/search",
        headers=H,
        json={"platform": platform, "query": query},
        timeout=10,
    )
    r.raise_for_status()  # fail loudly instead of silently parsing an error body
    data = r.json()
    return {
        "results": data.get("organic", []),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "is_cached": False,
    }

# Every call hits the API -- no stale data
result = fresh_search("latest ai news today")
print(f"Fetched at: {result['fetched_at']}")

Pattern 2: Rate Limit Cascades

Free tiers hit rate limits fast. Your first 10 queries work, query 11 returns a 429, and your pipeline crashes because you did not handle the error. Build retry logic with exponential backoff from day one.

Python
import time

def search_with_retry(query, max_retries=3):
    """Handles rate limits with exponential backoff."""
    for attempt in range(max_retries):
        r = requests.post("https://api.scavio.dev/api/v1/search",
            headers=H,
            json={"platform": "google", "query": query},
            timeout=10
        )
        if r.status_code == 200:
            return r.json()
        if r.status_code == 429:
            wait = 2 ** attempt
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
            continue
        r.raise_for_status()
    raise RuntimeError(f"Still rate limited after {max_retries} retries")

Pattern 3: Context Window Overflow

Agent pipelines that accumulate search results across multiple steps overflow the context window. By step 5, the LLM is summarizing its own summaries instead of working with real data. Solution: extract and store only the fields you need, discard the rest.

Python
def compact_results(raw_response, max_items=5):
    """Extract only needed fields to save context space."""
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("link", ""),
            "snippet": item.get("snippet", "")[:150],
        }
        for item in raw_response.get("organic", [])[:max_items]
    ]

# Full response might be 50KB, compact version is 2KB
raw = search_with_retry("best crm software")  # raw API response with "organic" key
compact = compact_results(raw)
print(f"Compact results: {len(compact)} items")

Pattern 4: Hallucinated Tool Calls

After several turns, LLMs start hallucinating tool call parameters or skipping tool calls entirely, answering from memory instead. Validate every tool response against expected schema before passing it downstream.
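One minimal way to sketch that check, assuming the same response shape as the earlier examples (a dict with an "organic" list of results); treat the exact field names as an assumption about your API, not a fixed schema:

Python
```python
REQUIRED_KEYS = {"title", "link", "snippet"}  # fields each result must carry

def validate_search_response(response):
    """Reject tool output that does not match the expected shape."""
    if not isinstance(response, dict):
        raise ValueError(f"Expected dict, got {type(response).__name__}")
    organic = response.get("organic")
    if not isinstance(organic, list) or not organic:
        raise ValueError("No 'organic' results -- possible hallucinated tool call")
    for i, item in enumerate(organic):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"Result {i} missing fields: {sorted(missing)}")
    return response

# A malformed response is blocked instead of flowing downstream
try:
    validate_search_response({"organic": [{"title": "x"}]})
except ValueError as e:
    print(f"Blocked bad tool output: {e}")
```

The point is that a hallucinated or partial tool result fails fast at the boundary, rather than being summarized into the next step as if it were real data.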

Building for the Second Call

Design every pipeline step assuming the previous step returned garbage. Validate inputs, force fresh data, handle rate limits, and compact context aggressively. The demo always works. Production reliability comes from handling the failure modes that only appear on repeated use.
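As a sketch, the four patterns can be combined into a single defensive step. The fetch and compact callables are injected so the step can be exercised with stubs instead of live API calls; that dependency injection is an illustrative design choice, not a prescribed interface:

Python
```python
def pipeline_step(query, fetch, compact, max_items=5):
    """One defensive step: fetch fresh, validate the shape, compact the output.

    fetch(query) should return the raw API response; compact(raw, n)
    should strip it down to only the fields the next step needs.
    """
    raw = fetch(query)  # e.g. search_with_retry, so 429s are already handled
    if not isinstance(raw, dict) or not isinstance(raw.get("organic"), list):
        raise ValueError("Unexpected tool response -- refusing to pass garbage downstream")
    return compact(raw, max_items)

# Stubbed fetch for illustration; in production pass search_with_retry
fake_fetch = lambda q: {"organic": [{"title": "A", "link": "u", "snippet": "s" * 300}]}
trim = lambda r, n: [{"title": i["title"], "url": i["link"], "snippet": i["snippet"][:150]}
                     for i in r["organic"][:n]]
print(pipeline_step("best crm software", fake_fetch, trim))
```

Because each step validates its input and compacts its output, a failure in step 3 surfaces in step 3, not as a mystery hallucination in step 7.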