Cross-region inferencing

Cross-region inferencing increases throughput and manages traffic bursts by routing requests to AWS regions that have available capacity.

The GenAI application provides dedicated profiles for regional and global regions. These profiles use resources worldwide and bypass on-demand service quotas.

Model inference availability varies by region and depends on the underlying large language model (LLM) provider.

Because supported regions and inference profiles change frequently, this documentation does not include a complete region-by-model matrix.

For current and complete information about supported regions and models, see the Amazon Bedrock documentation: Supported Regions and models for inference profiles (Amazon Bedrock)