Prompt cache for messages
This topic describes how prompt cache works when you use the
v1/messages, v2/messages or
v1/messages/stream and v2/messages/stream API
endpoints.Overview
Prompt caching is a GenAI feature that improves performance by storing static or repetitive sections of input. When a prompt includes previously cached content, GenAI skips reprocessing that content. This reduces token usage, lowers operational costs, and improves response times and system efficiency.
The v1/messages, v2/messages or
v1/messages/stream andr v2/messages/stream endpoints
support prompt cache through AWS Bedrock. If a prompt payload matches a previously submitted
request, the system returns a cached response instead of generating a new one. This reduces
latency and token usage.
When to Use
Use prompt cache in these situations:
- You send prompt templates with the same request payload.
- You use system-level prompts.
- You respond to frequently asked questions.
Note: Cached responses have a time to live (TTL) of five minutes. The
TTL resets each time the system returns a cached response. If the cache is not used within
five minutes, it expires automatically.
Supported models
These are the models that support prompt cache:
- Claude Sonnet 4
- Claude Sonnet 4.5
- Claude Sonnet 4.6
- Claude 3.5 Haiku
- Claude 4.5 Haiku
- Amazon Nova Pro
- Amazon Nova Micro
- Amazon Nova Lite