Cache Search Results: Cut Costs 60% on Repeat Queries
AI agents repeat queries. A SQLite cache with 50ms hits cuts Scavio costs 60-80% after one week of operation.
AI agents repeat queries more than humans do. A research crew running the same 30 keywords every morning hits a 60-80% cache rate after one week of operation. The pattern is cheap to implement and the savings compound. Most agent builds skip caching because it feels like infrastructure, and they end up paying a multiple of the necessary cost.
Why agents repeat queries
- Daily cron jobs over the same keyword grid.
- Multi-agent crews that re-query the same context.
- Retry logic that re-runs failed tool calls.
- User-facing chat agents whose users ask similar things.
The simplest cache
SQLite. 50ms reads on hit. The cache key is query plus surface plus modifiers. The cache value is the typed JSON response. TTL varies by surface: 1 hour for SERP (results change), 30 min for Reddit (threads churn), 6-24 hours for Amazon product data (relatively stable).
import sqlite3, time, hashlib, json
import os, requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
conn = sqlite3.connect('search.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(k TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def key(query, surface='search', **modifiers):
    # Stable key: the same query + surface + modifiers always hashes the same.
    payload = json.dumps({'q': query, 's': surface, **modifiers}, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()

def search(query, ttl=3600):
    k = key(query, 'search')
    row = conn.execute('SELECT payload, ts FROM cache WHERE k=?', (k,)).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=H, json={'query': query}).json()
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (k, json.dumps(data), time.time()))
    conn.commit()
    return data
The TTL table
TTL = {
    'search': 3600,          # 1 hour for SERP
    'reddit_search': 1800,   # 30 min for Reddit threads
    'youtube_search': 7200,  # 2 hours for YouTube
    'amazon_search': 21600,  # 6 hours for Amazon listings
    'extract': 86400,        # 24 hours for page content
}
The cost math
Pre-cache: 100 calls/day × $0.0043 = $0.43/day = ~$13/mo of Scavio. Post-cache at 67% hit rate: 33 actual calls/day × $0.0043 = $0.14/day = ~$4.30/mo. Same workload, 67% cost reduction.
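The arithmetic above, as a quick sanity check (prices and hit rate taken straight from the text):

```python
CALLS_PER_DAY = 100
PRICE_PER_CALL = 0.0043  # Scavio per-call price used in the example above
HIT_RATE = 0.67

pre = CALLS_PER_DAY * PRICE_PER_CALL * 30                      # ~$12.90/mo
post = CALLS_PER_DAY * (1 - HIT_RATE) * PRICE_PER_CALL * 30    # ~$4.26/mo
print(f"pre: ${pre:.2f}/mo, post: ${post:.2f}/mo, saved: {1 - post / pre:.0%}")
```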
The latency win
Live Scavio call: typically 300-700ms. Cache hit: 30-50ms on SQLite. For agents that make multiple calls per session, this compounds. A 10-call session that was averaging 4-5 seconds drops to 1-2 seconds with a warm cache.
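Those session numbers check out with a midpoint estimate (assumed here: 500ms per live call, 40ms per hit, 67% hit rate):

```python
LIVE_MS, HIT_MS = 500, 40   # midpoints of the 300-700ms and 30-50ms ranges above
CALLS, HIT_RATE = 10, 0.67

cold = CALLS * LIVE_MS / 1000                                      # every call live
warm = CALLS * (HIT_RATE * HIT_MS + (1 - HIT_RATE) * LIVE_MS) / 1000
print(f"cold session: {cold:.1f}s, warm session: {warm:.1f}s")     # ~5.0s vs ~1.9s
```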
What kills cache effectiveness
High-cardinality query strings. Each unique query is a unique cache key, so an agent that paraphrases the same intent slightly differently ("best mcp", "best MCP server", "top mcp options") hits the cache roughly never. Fix: normalize queries before keying (lowercase, strip punctuation, collapse whitespace). Or use semantic keying with embeddings, but that adds infrastructure that usually isn't worth the gain.
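The normalization fix is a few lines (a sketch; the exact rules are a judgment call, and it only merges case/punctuation/whitespace variants, not true paraphrases):

```python
import re

def normalize(query: str) -> str:
    # Lowercase, strip punctuation, collapse whitespace before keying.
    q = query.lower()
    q = re.sub(r'[^\w\s]', '', q)
    return re.sub(r'\s+', ' ', q).strip()

normalize("Best MCP!")      # -> 'best mcp'
normalize("  best   mcp ")  # -> 'best mcp', same cache key as above
```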
Distributed cache for multi-agent crews
SQLite lives on one machine and serializes writers. For multi-agent crews running in parallel, swap in Redis or a DuckDB-based shared cache. The access pattern is the same; the TTL logic is the same; only the storage layer changes.
The honest tradeoff
Cached results are by definition stale. For breaking news, financial data, or truly time-sensitive queries, cache TTL has to be near-zero, which kills the savings. For research, prospecting, content automation, and most agent workloads, TTLs of 30 min to 24 hours are fine and the cache rate climbs into the 60-80% range.
What ships in week one
20-line wrapper around your existing Scavio call. SQLite file. TTL config. Re-run your existing agent against the wrapped client. Check cost a week later; it will be 50-70% lower. The infrastructure investment is one afternoon; the savings recur monthly.
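To verify the hit rate after that week, count hits and misses in the wrapper (a minimal sketch with an in-memory database and a stubbed fetch; the `stats` counters are illustrative additions, not part of the wrapper above):

```python
import sqlite3, time, json

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE cache(k TEXT PRIMARY KEY, payload TEXT, ts REAL)')
stats = {'hit': 0, 'miss': 0}

def cached_search(query, fetch, ttl=3600):
    row = conn.execute('SELECT payload, ts FROM cache WHERE k=?', (query,)).fetchone()
    if row and time.time() - row[1] < ttl:
        stats['hit'] += 1
        return json.loads(row[0])
    stats['miss'] += 1
    data = fetch(query)  # the live Scavio call goes here
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (query, json.dumps(data), time.time()))
    conn.commit()
    return data

for q in ['best mcp', 'best mcp', 'top agents']:
    cached_search(q, lambda q: {'query': q, 'results': []})

print(f"hit rate: {stats['hit'] / (stats['hit'] + stats['miss']):.0%}")  # 33% after 3 calls
```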