The Cache-Search-Results Pattern for AI Agents
A 60-80% cache hit rate cuts API spend by the same fraction. SQLite is enough for most agents. Pattern from r/crewai.
An r/crewai post documented an SDR agent migration that used a SQLite cache to return search results in 50ms. The cache pattern is the single biggest cost-and-latency lever for any production agent. Most teams skip it because the upfront work feels unimportant, then watch their search bill grow as the agent scales.
Why agents hit the cache hard
AI agents repeat queries far more than humans do. A daily research agent that runs the same 30-keyword grid every morning has a 100% theoretical cache hit rate after day one — every query has been seen before. In practice the rate lands at 60-80% because most agents have some user-specific or timestamp-specific queries mixed in.
The simple cache shape
Key: the query plus the surface (search vs reddit/search vs youtube/search) plus any modifiers (search_type, country). Value: the typed JSON response. TTL: 1 hour for SERP, 30 minutes for news queries, 6+ hours for static reference pages, 24 hours for slow-changing content like government PDFs.
import sqlite3, json, time, requests, os

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

conn = sqlite3.connect('cache.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(key TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def get(key, ttl=3600):
    # Return the cached payload if the row exists and is younger than the TTL.
    row = conn.execute('SELECT payload, ts FROM cache WHERE key=?', (key,)).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    return None

def set_(key, payload):
    # Upsert so a refreshed query overwrites its stale entry.
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (key, json.dumps(payload), time.time()))
    conn.commit()

def search_cached(q, ttl=3600):
    key = f'search::{q}'  # extend with surface/modifiers if you use them
    cached = get(key, ttl)
    if cached is not None:  # empty results are valid cache hits too
        return cached
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=H, json={'query': q}).json()
    set_(key, data)
    return data

SQLite is enough for most agents
The r/crewai post used SQLite specifically. It returns hits in single-digit milliseconds, persists across restarts, and requires zero infrastructure. For agents serving 1-100 concurrent users, SQLite is fine. Teams hitting 1000+ concurrent users move to Redis or DuckDB for write throughput.
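Before reaching for Redis, note that SQLite's default rollback journal serializes readers behind writes. If several worker threads or processes share one cache.db, switching on WAL mode is a two-line tweak; a sketch using the stdlib sqlite3 module (the 5-second busy timeout is an arbitrary choice, not from the post):

```python
import sqlite3

# check_same_thread=False lets worker threads reuse one connection;
# serialize your own writes if you rely on this.
conn = sqlite3.connect('cache.db', check_same_thread=False)
# WAL mode: readers are no longer blocked while a writer commits.
conn.execute('PRAGMA journal_mode=WAL')
# Wait up to 5 seconds for a busy lock instead of erroring immediately.
conn.execute('PRAGMA busy_timeout=5000')
```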
What cache hit rate actually looks like in production
Day 1 of agent operation: 0% hit rate (every query is fresh). Day 5: typically 40-60% hit rate. Day 30: 60-80% steady-state for a research agent, lower for a discovery agent that fans out to user-specific queries. Track the rate weekly to know whether the cache is paying off.
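Tracking the rate needs nothing more than two counters wrapped around the cache lookup; a minimal sketch (the CacheStats name is ours, not from the post):

```python
class CacheStats:
    """Count hits and misses so the weekly hit-rate check is one call."""
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, was_hit):
        # Call this once per cache lookup, right after get() returns.
        if was_hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for was_hit in (True, True, True, False):  # 3 hits, 1 miss
    stats.record(was_hit)
print(stats.hit_rate())  # 0.75
```

Reset the counters weekly and log the rate; a number stuck near zero after day 5 means the keys are too user-specific to ever repeat.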
The latency story
Live Scavio call: 800-1200ms for SERP, 1500-2500ms for extract. SQLite cache hit: 5-15ms. On a 30-turn agent session that hits cache 70% of the time, total search latency drops from ~25 seconds of cumulative wait to ~8 seconds. The user feels the agent get faster.
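The arithmetic behind those totals, using a mid-range 850ms live call and a 10ms cache hit (both within the ranges above):

```python
turns = 30
hit_rate = 0.70
live_ms = 850    # mid-range live SERP call
cache_ms = 10    # mid-range SQLite cache hit

hits = round(turns * hit_rate)   # 21 turns served from cache
misses = turns - hits            # 9 turns go to the live API

no_cache_s = turns * live_ms / 1000
with_cache_s = (hits * cache_ms + misses * live_ms) / 1000
print(no_cache_s, with_cache_s)  # 25.5 7.86
```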
The cost story
Same agent, same volume: a 70% cache hit rate cuts API spend by 70%. A $30/mo Scavio Project tier then buys roughly the coverage of a $100/mo tier (spend divided by the 30% miss rate). As the agent scales, the cache keeps the bill from growing linearly with query volume.
When not to cache
Anything time-sensitive enough that a 1-hour-old answer is wrong. Stock prices, sports scores, breaking news bursts. For those, set TTL to 30-60 seconds or skip the cache layer entirely on those specific endpoints.
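Since search_cached already takes a ttl argument, this is a per-query-class decision, not a separate code path. A sketch of a TTL lookup table; the class names are ours and the values are illustrative, taken from the ranges in this post:

```python
# Illustrative TTLs per query class, in seconds.
TTLS = {
    'breaking_news': 45,           # 30-60s window for fast-moving topics
    'serp': 3600,                  # 1 hour default
    'static_reference': 6 * 3600,  # 6+ hours
    'slow_changing': 24 * 3600,    # e.g. government PDFs
}

def ttl_for(query_class):
    # Unknown classes fall back to the 1-hour SERP default.
    return TTLS.get(query_class, 3600)

print(ttl_for('breaking_news'))  # 45
```

A TTL of 0 would effectively skip the cache for a class, which is the "no cache layer at all" option from above.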
The eviction question
SQLite grows unbounded if you never evict. For most agents that's fine — the database stays under 1 GB even after a year of operation. If the size matters, run a weekly job that deletes rows older than the longest TTL.
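The weekly job is a single DELETE keyed on the timestamp column; a sketch assuming the cache table from the code above and 24 hours as the longest TTL in use:

```python
import sqlite3, time

LONGEST_TTL = 24 * 3600  # the longest TTL any query class uses

conn = sqlite3.connect('cache.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(key TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def evict_stale(conn, longest_ttl=LONGEST_TTL):
    # Rows older than the longest TTL can never be served again,
    # so deleting them changes nothing except the file size.
    cutoff = time.time() - longest_ttl
    cur = conn.execute('DELETE FROM cache WHERE ts < ?', (cutoff,))
    conn.commit()
    return cur.rowcount  # number of rows evicted

evict_stale(conn)
```

Run VACUUM afterwards if you also want the file on disk to shrink; DELETE alone only frees pages for reuse.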
What this composes with
The cache layer pairs naturally with the structured search pattern. Cached typed JSON is cheap to deserialize, fits into LLM context with predictable token cost, and works with any agent framework. CrewAI, LangChain, opencode, Claude Code — the cache layer sits between the agent and the API regardless of the orchestrator.