Prompt cache for messages

This topic describes how prompt cache works when you use the v1/messages, v2/messages or v1/messages/stream and v2/messages/stream API endpoints.

Overview

Prompt caching is a GenAI feature that improves performance by storing static or repetitive sections of input. When a prompt includes previously cached content, GenAI skips reprocessing that content. This reduces token usage, lowers operational costs, and improves response times and system efficiency.

The v1/messages, v2/messages or v1/messages/stream andr v2/messages/stream endpoints support prompt cache through AWS Bedrock. If a prompt payload matches a previously submitted request, the system returns a cached response instead of generating a new one. This reduces latency and token usage.

When to Use

Use prompt cache in these situations:

You send prompt templates with the same request payload.
You use system-level prompts.
You respond to frequently asked questions.

Note: Cached responses have a time to live (TTL) of five minutes. The TTL resets each time the system returns a cached response. If the cache is not used within five minutes, it expires automatically.

Supported models

These are the models that support prompt cache:

Claude Sonnet 4
Claude Sonnet 4.5
Claude Sonnet 4.6
Claude 3.5 Haiku
Claude 4.5 Haiku
Amazon Nova Pro
Amazon Nova Micro
Amazon Nova Lite