Prompt caching for messages

This topic describes how prompt caching works when you use the /messages or /messages/stream API endpoints.

Overview

The /messages and /messages/stream endpoints support prompt caching through Amazon Bedrock. When you mark a stable portion of a prompt (for example, a long system prompt) as cacheable, the service stores the processed prompt prefix. Later requests that begin with the same prefix reuse the cached prefix instead of processing it again, which reduces latency and input token usage; the model still generates a new response for each request.
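
For example, the following Python sketch marks a long system prompt as cacheable in a /messages request. It assumes the endpoint accepts the Anthropic-style cache_control field, bearer-token authentication, and a Bedrock model ID; the URL, token, and model ID shown here are placeholders, so adjust them to your deployment.

    import requests

    API_URL = "https://api.example.com/messages"  # placeholder endpoint URL
    API_TOKEN = "YOUR_API_TOKEN"                  # placeholder credential

    payload = {
        "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",  # Claude 3.5 Sonnet v2 on Bedrock
        "max_tokens": 512,
        # The system prompt is the stable part of the request. Marking it with
        # cache_control asks the service to cache the processed prompt prefix.
        "system": [
            {
                "type": "text",
                "text": "You are a support assistant for ExampleCo. <long, stable instructions>",
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": "How do I reset my password?"}
        ],
    }

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json=payload,
        timeout=30,
    )
    print(response.json())

Later requests that send an identical system block can reuse the cached prefix; only the changing user message is processed in full.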

When to use

Use prompt caching in these situations:

  • You reuse the same prompt template across many requests.
  • You send long, stable system prompts.
  • You respond to frequently asked questions (see the sketch after the note below).

Note: Cache entries have a time to live (TTL) of five minutes. The TTL resets each time the cached prefix is used. If a cache entry is not used within five minutes, it expires automatically.
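
As a hedged illustration of the FAQ case and the five-minute TTL, the Python sketch below sends several questions against one cacheable system prompt and prints the cache-related token counts. The usage field names (cache_creation_input_tokens and cache_read_input_tokens) follow the Anthropic Messages API and are an assumption for this endpoint, as are the URL, token, and model ID.

    import requests

    API_URL = "https://api.example.com/messages"  # placeholder endpoint URL
    API_TOKEN = "YOUR_API_TOKEN"                  # placeholder credential

    SYSTEM_BLOCKS = [{
        "type": "text",
        "text": "You are a support assistant for ExampleCo. <long, stable FAQ instructions>",
        "cache_control": {"type": "ephemeral"},  # mark the stable prefix as cacheable
    }]

    def ask(question: str) -> dict:
        """Send one question against the shared, cacheable system prompt."""
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            json={
                "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
                "max_tokens": 512,
                "system": SYSTEM_BLOCKS,
                "messages": [{"role": "user", "content": question}],
            },
            timeout=30,
        )
        return response.json()

    for question in ["How do I reset my password?", "Where can I find my invoices?"]:
        usage = ask(question).get("usage", {})
        # The first call typically writes the prefix to the cache; later calls made
        # within the five-minute TTL read it back, and each read resets the TTL.
        print(question,
              "| cache write tokens:", usage.get("cache_creation_input_tokens"),
              "| cache read tokens:", usage.get("cache_read_input_tokens"))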

Supported models

The following models support prompt caching:

  • Claude Sonnet 4
  • Claude 3.7 Sonnet
  • Claude 3.5 Sonnet v2
  • Claude 3.5 Haiku
  • Amazon Nova Pro
  • Amazon Nova Micro
  • Amazon Nova Lite