The Cache-Search-Results Pattern for AI Agents
A 60-80% cache hit rate cuts API spend by the same fraction. SQLite is enough for most agents. Pattern from r/crewai.
An r/crewai post documented an SDR agent migration that used a SQLite cache to return search results in 50ms. The cache pattern is the single biggest cost-and-latency lever for any production agent. Most teams skip it because the upfront work feels unimportant, then watch their search bill grow as the agent scales.
Why agents hit the cache hard
AI agents repeat queries far more than humans do. A daily research agent that runs the same 30-keyword grid every morning has a 100% theoretical cache hit rate after day one — every query has been seen before. In practice the rate lands at 60-80% because most agents have some user-specific or timestamp-specific queries mixed in.
The simple cache shape
Key: the query plus the surface (search vs reddit/search vs youtube/search) plus any modifiers (search_type, country). Value: the typed JSON response. TTL: 1 hour for SERP, 30 minutes for news queries, 6+ hours for static reference pages, 24 hours for slow-changing content like government PDFs.
import sqlite3, json, time, requests, os

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

conn = sqlite3.connect('cache.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(key TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def get(key, ttl=3600):
    # Return the cached payload if the row exists and is younger than the TTL.
    row = conn.execute('SELECT payload, ts FROM cache WHERE key=?', (key,)).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    return None

def set_(key, payload):
    # Upsert so a refreshed query overwrites its stale entry.
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (key, json.dumps(payload), time.time()))
    conn.commit()

def search_cached(q, ttl=3600):
    key = f'search::{q}'  # extend with surface/modifiers if you use them
    cached = get(key, ttl)
    if cached is not None:  # empty results are valid cache hits too
        return cached
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=H, json={'query': q}).json()
    set_(key, data)
    return data

SQLite is enough for most agents
The r/crewai post used SQLite specifically. It returns hits in single-digit milliseconds, persists across restarts, and requires zero infrastructure. For agents serving 1-100 concurrent users, SQLite is fine. Teams hitting 1000+ concurrent users move to Redis or DuckDB for write throughput.
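Before reaching for Redis, note that SQLite's default rollback journal serializes readers behind writes. If several worker threads or processes share one cache.db, switching on WAL mode is a two-line tweak; a sketch using the stdlib sqlite3 module (the 5-second busy timeout is an arbitrary choice, not from the post):

```python
import sqlite3

# check_same_thread=False lets worker threads reuse one connection;
# serialize your own writes if you rely on this.
conn = sqlite3.connect('cache.db', check_same_thread=False)
# WAL mode: readers are no longer blocked while a writer commits.
conn.execute('PRAGMA journal_mode=WAL')
# Wait up to 5 seconds for a busy lock instead of erroring immediately.
conn.execute('PRAGMA busy_timeout=5000')
```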
What cache hit rate actually looks like in production
Day 1 of agent operation: 0% hit rate (every query is fresh). Day 5: typically 40-60% hit rate. Day 30: 60-80% steady-state for a research agent, lower for a discovery agent that fans out to user-specific queries. Track the rate weekly to know whether the cache is paying off.
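Tracking the rate needs nothing more than two counters wrapped around the cache lookup; a minimal sketch (the CacheStats name is ours, not from the post):

```python
class CacheStats:
    """Count hits and misses so the weekly hit-rate check is one call."""
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, was_hit):
        # Call this once per cache lookup, right after get() returns.
        if was_hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for was_hit in (True, True, True, False):  # 3 hits, 1 miss
    stats.record(was_hit)
print(stats.hit_rate())  # 0.75
```

Reset the counters weekly and log the rate; a number stuck near zero after day 5 means the keys are too user-specific to ever repeat.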
The latency story
Live Scavio call: 800-1200ms for SERP, 1500-2500ms for extract. SQLite cache hit: 5-15ms. On a 30-turn agent session that hits cache 70% of the time, total search latency drops from ~25 seconds of cumulative wait to ~8 seconds. The user feels the agent get faster.
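The arithmetic behind those totals, using a mid-range 850ms live call and a 10ms cache hit (both within the ranges above):

```python
turns = 30
hit_rate = 0.70
live_ms = 850    # mid-range live SERP call
cache_ms = 10    # mid-range SQLite cache hit

hits = round(turns * hit_rate)   # 21 turns served from cache
misses = turns - hits            # 9 turns go to the live API

no_cache_s = turns * live_ms / 1000
with_cache_s = (hits * cache_ms + misses * live_ms) / 1000
print(no_cache_s, with_cache_s)  # 25.5 7.86
```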
The cost story
Same agent, same volume: a 70% cache hit rate cuts API spend by 70%. A $30/mo Scavio Project tier then buys roughly the coverage of a $100/mo tier (spend divided by the 30% miss rate). As the agent scales, the cache keeps the bill from growing linearly with query volume.
When not to cache
Anything time-sensitive enough that a 1-hour-old answer is wrong. Stock prices, sports scores, breaking news bursts. For those, set TTL to 30-60 seconds or skip the cache layer entirely on those specific endpoints.
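Since search_cached already takes a ttl argument, this is a per-query-class decision, not a separate code path. A sketch of a TTL lookup table; the class names are ours and the values are illustrative, taken from the ranges in this post:

```python
# Illustrative TTLs per query class, in seconds.
TTLS = {
    'breaking_news': 45,           # 30-60s window for fast-moving topics
    'serp': 3600,                  # 1 hour default
    'static_reference': 6 * 3600,  # 6+ hours
    'slow_changing': 24 * 3600,    # e.g. government PDFs
}

def ttl_for(query_class):
    # Unknown classes fall back to the 1-hour SERP default.
    return TTLS.get(query_class, 3600)

print(ttl_for('breaking_news'))  # 45
```

A TTL of 0 would effectively skip the cache for a class, which is the "no cache layer at all" option from above.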
The eviction question
SQLite grows unbounded if you never evict. For most agents that's fine — the database stays under 1 GB even after a year of operation. If the size matters, run a weekly job that deletes rows older than the longest TTL.
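The weekly job is a single DELETE keyed on the timestamp column; a sketch assuming the cache table from the code above and 24 hours as the longest TTL in use:

```python
import sqlite3, time

LONGEST_TTL = 24 * 3600  # the longest TTL any query class uses

conn = sqlite3.connect('cache.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(key TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def evict_stale(conn, longest_ttl=LONGEST_TTL):
    # Rows older than the longest TTL can never be served again,
    # so deleting them changes nothing except the file size.
    cutoff = time.time() - longest_ttl
    cur = conn.execute('DELETE FROM cache WHERE ts < ?', (cutoff,))
    conn.commit()
    return cur.rowcount  # number of rows evicted

evict_stale(conn)
```

Run VACUUM afterwards if you also want the file on disk to shrink; DELETE alone only frees pages for reuse.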
What this composes with
The cache layer pairs naturally with the structured search pattern. Cached typed JSON is cheap to deserialize, fits into LLM context with predictable token cost, and works with any agent framework. CrewAI, LangChain, opencode, Claude Code — the cache layer sits between the agent and the API regardless of the orchestrator.