Definition
Prompt caching is an LLM provider feature that reuses already-processed prompt prefixes across requests, cutting latency and cost for workloads that repeat the same system prompt or tool schema many times.
In Depth
Anthropic, OpenAI, and Google all support some variant of prompt caching as of 2026. A typical caching window is around five minutes of inactivity before eviction, which makes workload design matter: an agent that sleeps for six minutes and wakes up pays the full cost of a cache miss. Agents that call Scavio inside a loop benefit because the system prompt and tool schema stay byte-identical across calls; only the search results change.
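A minimal Python sketch of the idea, assuming nothing about any provider's real API: providers key the cache on the exact prompt prefix, so the names here (`cache_key`, `call_llm`) and the in-memory dict are purely illustrative stand-ins for that behavior.

```python
import hashlib

# Simulated prefix cache. Real provider caches also evict entries after a
# period of inactivity (often around five minutes), which this sketch omits.
cache = {}

def cache_key(system_prompt: str, tool_schema: str) -> str:
    # The cacheable prefix is the stable part of the request: system prompt
    # plus tool schema. Any byte-level change yields a different key.
    return hashlib.sha256((system_prompt + tool_schema).encode()).hexdigest()

def call_llm(system_prompt: str, tool_schema: str, user_message: str) -> bool:
    """Stand-in for a provider call; returns True on a simulated cache hit."""
    key = cache_key(system_prompt, tool_schema)
    hit = key in cache
    cache[key] = True  # the prefix is now cached for subsequent calls
    return hit

SYSTEM = "You are a search agent."
TOOLS = '{"name": "search", "parameters": {"query": "string"}}'

print(call_llm(SYSTEM, TOOLS, "first query"))     # False: cold cache
print(call_llm(SYSTEM, TOOLS, "second query"))    # True: identical prefix reused
print(call_llm(SYSTEM + " ", TOOLS, "third"))     # False: prefix changed by one byte
```

The design point the sketch illustrates: only the user message varies between calls, so it sits after the cached prefix and never invalidates it, while even a single-character edit to the system prompt or tool schema forces a miss.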
Example Usage
The agent loop kept the system prompt stable across iterations so prompt caching kicked in and cut per-call latency nearly in half.
Platforms
Prompt Caching is relevant across the following platforms, all accessible through Scavio's unified API:
- YouTube
Related Terms
Agent Harness
An agent harness is the runtime and orchestration layer around an LLM that decides when to call tools, how to manage mem...
Tool Gateway
A tool gateway is a shared service that sits in front of an agent's external tools to centralize authentication, rate li...
Sub-Agent
A sub-agent is a specialized agent spawned by a parent agent to handle a scoped task, allowing a larger workflow to be d...