Cache Search Results: Cut Costs 60% on Repeat Queries
AI agents repeat queries. A SQLite cache with 50ms hits cuts Scavio costs 60-80% after one week of operation.
AI agents repeat queries more than humans do. A research crew running the same 30 keywords every morning hits a 60-80% cache rate after one week of operation. The pattern is cheap to implement and the savings compound. Most agent builds skip caching because it feels like infrastructure, and they end up paying a multiple of the necessary cost.
Why agents repeat queries
- Daily cron jobs over the same keyword grid.
- Multi-agent crews that re-query the same context.
- Retry logic that re-runs failed tool calls.
- User-facing chat agents whose users ask similar things.
The simplest cache
SQLite. 50ms reads on hit. The cache key is query plus surface plus modifiers. The cache value is the typed JSON response. TTL varies by surface: 1 hour for SERP (results change), 30 min for Reddit (threads churn), 6-24 hours for Amazon product data (relatively stable).
import sqlite3, time, hashlib, json
import os, requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
conn = sqlite3.connect('search.db')
conn.execute('CREATE TABLE IF NOT EXISTS cache(k TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def key(query, surface='search', **modifiers):
    # Stable key: the same query + surface + modifiers always hashes the same.
    payload = json.dumps({'q': query, 's': surface, **modifiers}, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()

def search(query, ttl=3600):
    k = key(query, 'search')
    row = conn.execute('SELECT payload, ts FROM cache WHERE k=?', (k,)).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    data = requests.post('https://api.scavio.dev/api/v1/search',
                         headers=H, json={'query': query}).json()
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (k, json.dumps(data), time.time()))
    conn.commit()
    return data
The TTL table
TTL = {
    'search': 3600,          # 1 hour for SERP
    'reddit_search': 1800,   # 30 min for Reddit threads
    'youtube_search': 7200,  # 2 hours for YouTube
    'amazon_search': 21600,  # 6 hours for Amazon listings
    'extract': 86400,        # 24 hours for page content
}
The cost math
Pre-cache: 100 calls/day × $0.0043 = $0.43/day = ~$13/mo of Scavio. Post-cache at 67% hit rate: 33 actual calls/day × $0.0043 = $0.14/day = ~$4.30/mo. Same workload, 67% cost reduction.
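The arithmetic above, as a quick sanity check (prices and hit rate taken straight from the text):

```python
CALLS_PER_DAY = 100
PRICE_PER_CALL = 0.0043  # Scavio per-call price used in the example above
HIT_RATE = 0.67

pre = CALLS_PER_DAY * PRICE_PER_CALL * 30                      # ~$12.90/mo
post = CALLS_PER_DAY * (1 - HIT_RATE) * PRICE_PER_CALL * 30    # ~$4.26/mo
print(f"pre: ${pre:.2f}/mo, post: ${post:.2f}/mo, saved: {1 - post / pre:.0%}")
```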
The latency win
Live Scavio call: typically 300-700ms. Cache hit: 30-50ms on SQLite. For agents that make multiple calls per session, this compounds. A 10-call session that was averaging 4-5 seconds drops to 1-2 seconds with a warm cache.
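Those session numbers check out with a midpoint estimate (assumed here: 500ms per live call, 40ms per hit, 67% hit rate):

```python
LIVE_MS, HIT_MS = 500, 40   # midpoints of the 300-700ms and 30-50ms ranges above
CALLS, HIT_RATE = 10, 0.67

cold = CALLS * LIVE_MS / 1000                                      # every call live
warm = CALLS * (HIT_RATE * HIT_MS + (1 - HIT_RATE) * LIVE_MS) / 1000
print(f"cold session: {cold:.1f}s, warm session: {warm:.1f}s")     # ~5.0s vs ~1.9s
```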
What kills cache effectiveness
High-cardinality query strings. Each unique query is a unique cache key, so an agent that paraphrases the same intent slightly differently ("best mcp", "best MCP server", "top mcp options") hits the cache roughly never. Fix: normalize queries before keying (lowercase, strip punctuation, collapse whitespace). Or use semantic keying with embeddings, but that adds infrastructure that usually isn't worth the gain.
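The normalization fix is a few lines (a sketch; the exact rules are a judgment call, and it only merges case/punctuation/whitespace variants, not true paraphrases):

```python
import re

def normalize(query: str) -> str:
    # Lowercase, strip punctuation, collapse whitespace before keying.
    q = query.lower()
    q = re.sub(r'[^\w\s]', '', q)
    return re.sub(r'\s+', ' ', q).strip()

normalize("Best MCP!")      # -> 'best mcp'
normalize("  best   mcp ")  # -> 'best mcp', same cache key as above
```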
Distributed cache for multi-agent crews
SQLite lives on one machine and serializes writers. For multi-agent crews running in parallel, swap in Redis or a DuckDB-based shared cache. The access pattern is the same; the TTL logic is the same; only the storage layer changes.
The honest tradeoff
Cached results are by definition stale. For breaking news, financial data, or truly time-sensitive queries, cache TTL has to be near-zero, which kills the savings. For research, prospecting, content automation, and most agent workloads, TTLs of 30 min to 24 hours are fine and the cache rate climbs into the 60-80% range.
What ships in week one
20-line wrapper around your existing Scavio call. SQLite file. TTL config. Re-run your existing agent against the wrapped client. Check cost a week later; it will be 50-70% lower. The infrastructure investment is one afternoon; the savings recur monthly.
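To verify the hit rate after that week, count hits and misses in the wrapper (a minimal sketch with an in-memory database and a stubbed fetch; the `stats` counters are illustrative additions, not part of the wrapper above):

```python
import sqlite3, time, json

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE cache(k TEXT PRIMARY KEY, payload TEXT, ts REAL)')
stats = {'hit': 0, 'miss': 0}

def cached_search(query, fetch, ttl=3600):
    row = conn.execute('SELECT payload, ts FROM cache WHERE k=?', (query,)).fetchone()
    if row and time.time() - row[1] < ttl:
        stats['hit'] += 1
        return json.loads(row[0])
    stats['miss'] += 1
    data = fetch(query)  # the live Scavio call goes here
    conn.execute('INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
                 (query, json.dumps(data), time.time()))
    conn.commit()
    return data

for q in ['best mcp', 'best mcp', 'top agents']:
    cached_search(q, lambda q: {'query': q, 'results': []})

print(f"hit rate: {stats['hit'] / (stats['hit'] + stats['miss']):.0%}")  # 33% after 3 calls
```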