Prompt caching for messages

This topic describes how prompt caching works when you use the /messages or /messages/stream API endpoints.

Overview

The /messages and /messages/stream endpoints support prompt caching through Amazon Bedrock. When you mark a stable portion of a prompt (for example, a long system prompt) as cacheable, the service stores the processed prompt prefix. Later requests that begin with the same prefix reuse the cached prefix instead of processing it again, which reduces latency and input token usage; the model still generates a new response for each request.
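
For example, the following Python sketch marks a long system prompt as cacheable in a /messages request. It assumes the endpoint accepts the Anthropic-style cache_control field, bearer-token authentication, and a Bedrock model ID; the URL, token, and model ID shown here are placeholders, so adjust them to your deployment.

    import requests

    API_URL = "https://api.example.com/messages"  # placeholder endpoint URL
    API_TOKEN = "YOUR_API_TOKEN"                  # placeholder credential

    payload = {
        "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",  # Claude 3.5 Sonnet v2 on Bedrock
        "max_tokens": 512,
        # The system prompt is the stable part of the request. Marking it with
        # cache_control asks the service to cache the processed prompt prefix.
        "system": [
            {
                "type": "text",
                "text": "You are a support assistant for ExampleCo. <long, stable instructions>",
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": "How do I reset my password?"}
        ],
    }

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json=payload,
        timeout=30,
    )
    print(response.json())

Later requests that send an identical system block can reuse the cached prefix; only the changing user message is processed in full.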

When to use

Use prompt caching in these situations:

  • You reuse the same prompt template across many requests.
  • You send long, stable system prompts.
  • You respond to frequently asked questions (see the sketch after the note below).

Note: Cache entries have a time to live (TTL) of five minutes. The TTL resets each time the cached prefix is used. If a cache entry is not used within five minutes, it expires automatically.
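
As a hedged illustration of the FAQ case and the five-minute TTL, the Python sketch below sends several questions against one cacheable system prompt and prints the cache-related token counts. The usage field names (cache_creation_input_tokens and cache_read_input_tokens) follow the Anthropic Messages API and are an assumption for this endpoint, as are the URL, token, and model ID.

    import requests

    API_URL = "https://api.example.com/messages"  # placeholder endpoint URL
    API_TOKEN = "YOUR_API_TOKEN"                  # placeholder credential

    SYSTEM_BLOCKS = [{
        "type": "text",
        "text": "You are a support assistant for ExampleCo. <long, stable FAQ instructions>",
        "cache_control": {"type": "ephemeral"},  # mark the stable prefix as cacheable
    }]

    def ask(question: str) -> dict:
        """Send one question against the shared, cacheable system prompt."""
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            json={
                "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
                "max_tokens": 512,
                "system": SYSTEM_BLOCKS,
                "messages": [{"role": "user", "content": question}],
            },
            timeout=30,
        )
        return response.json()

    for question in ["How do I reset my password?", "Where can I find my invoices?"]:
        usage = ask(question).get("usage", {})
        # The first call typically writes the prefix to the cache; later calls made
        # within the five-minute TTL read it back, and each read resets the TTL.
        print(question,
              "| cache write tokens:", usage.get("cache_creation_input_tokens"),
              "| cache read tokens:", usage.get("cache_read_input_tokens"))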

Supported models

The following models support prompt caching:

  • Claude Sonnet 4
  • Claude 3.7 Sonnet
  • Claude 3.5 Sonnet v2
  • Claude 3.5 Haiku
  • Amazon Nova Pro
  • Amazon Nova Micro
  • Amazon Nova Lite