Cross-region inferencing

Cross-region inferencing increases throughput and manages traffic bursts by routing requests to AWS regions that have available capacity.

The GenAI application provides dedicated profiles with either regional or global scope. These profiles use resources worldwide and bypass on-demand service quotas.

Review the region profiles for request re-routing:

  • us-east-1 to us-west-2
  • apac to ap-northeast-1
  • apac to ap-southeast-2
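
As a rough illustration of the list above, the sketch below stores the three re-routing pairs in a lookup table and picks a destination that reports available capacity. The names REROUTE_PROFILES, pick_destination, and has_capacity are hypothetical and exist only for this example. The order in which the two apac destinations are tried is an assumption, and the capacity check is a stand-in callback, not the actual routing logic used by AWS.

```python
# Hypothetical routing table built from the list above:
# source profile -> destination regions a request may be re-routed to.
REROUTE_PROFILES = {
    "us-east-1": ["us-west-2"],
    "apac": ["ap-northeast-1", "ap-southeast-2"],
}


def pick_destination(source, has_capacity):
    """Return the first listed destination for `source` that reports capacity.

    `has_capacity` is a caller-supplied function (an assumption for this
    sketch) that takes a region name and returns True or False.
    Returns None when the source is unknown or no destination has capacity.
    """
    for region in REROUTE_PROFILES.get(source, []):
        if has_capacity(region):
            return region
    return None


# Example: ap-northeast-1 is treated as saturated, so the request
# falls through to the next listed apac destination.
print(pick_destination("apac", lambda region: region != "ap-northeast-1"))
# prints "ap-southeast-2"
```

A plain dictionary keeps the pairs easy to read and extend. Returning None for an unknown source or a fully saturated set of destinations leaves the fallback decision to the caller.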