Prompt caching for messages
This topic describes how prompt caching works when you use the /messages or /messages/stream API endpoints.
Overview
The /messages and /messages/stream endpoints support prompt caching through Amazon Bedrock. When you mark part of a prompt as cacheable, Bedrock stores that prompt prefix in a cache. Subsequent requests that reuse the same prefix read it from the cache instead of reprocessing those tokens, which reduces latency and input token usage.
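The request schema is not shown in this topic, so the following is a minimal sketch only. It assumes the /messages endpoint accepts an Anthropic Messages-style JSON payload in which a content block can carry a cache_control marker to end the cacheable prefix; the URL, authorization header, and model ID are placeholders.

```python
import requests

# Placeholder base URL and auth header; substitute your deployment's values.
API_URL = "https://api.example.com/messages"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Assumed request shape: an Anthropic Messages-style payload in which a content
# block carries `cache_control` to mark the end of the cacheable prompt prefix.
payload = {
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",  # example Bedrock model ID
    "max_tokens": 512,
    "system": [
        {
            "type": "text",
            "text": "You are a support assistant for Example Corp. <long, stable instructions>",
            "cache_control": {"type": "ephemeral"},  # everything up to this block is cached
        }
    ],
    "messages": [
        {"role": "user", "content": "How do I reset my password?"}
    ],
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```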
When to use
Use prompt caching in these situations:
- You send requests that reuse the same prompt template.
- You include a long, stable system prompt in every request.
- You respond to frequently asked questions (see the sketch after the note below).
Note: Cache entries have a time to live (TTL) of five minutes. The TTL resets each time a request reuses the cached content. If the cache is not used within five minutes, it expires automatically.
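As a sketch of the FAQ scenario above, using the same assumed payload shape and placeholder URL, headers, and model ID as the Overview example: the system prompt is identical on every call, so after the first request the prefix can be served from the cache, and each reuse refreshes the five-minute TTL.

```python
import requests

API_URL = "https://api.example.com/messages"  # placeholder URL
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# The same long system prompt is sent with every request, so after the first
# call the prefix can be served from the cache; only the short user turn is new.
SYSTEM_BLOCKS = [
    {
        "type": "text",
        "text": "You are a support assistant. Answer using the product FAQ below.\n<faq>...</faq>",
        "cache_control": {"type": "ephemeral"},  # assumed cache marker, as in the Overview sketch
    }
]

def ask(question: str) -> dict:
    """Send one FAQ question, reusing the cached system prompt prefix."""
    payload = {
        "model": "anthropic.claude-3-5-haiku-20241022-v1:0",  # example Bedrock model ID
        "max_tokens": 256,
        "system": SYSTEM_BLOCKS,
        "messages": [{"role": "user", "content": question}],
    }
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

# Each call reuses the cached prefix and refreshes its five-minute TTL.
for q in ["How do I reset my password?", "What is the refund policy?"]:
    print(ask(q))
```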
Supported models
The following models support prompt caching:
- Claude Sonnet 4
- Claude 3.7 Sonnet
- Claude 3.5 Sonnet v2
- Claude 3.5 Haiku
- Amazon Nova Pro
- Amazon Nova Micro
- Amazon Nova Lite