Chat rate limiting

The GenAI Assistant uses a four-tier rate-limiting model to ensure fair and predictable resource allocation.

The system applies limits, expressed in requests per minute (RPM), at these levels:

  • User: Limits requests from a single user.
  • Tenant: Limits requests from all users in a tenant.
  • Customer: Limits requests from all tenants that belong to a customer.
  • Environment: Limits requests across the entire environment.

When a request exceeds a rate limit, the system blocks the input for one minute and displays a message that identifies the tier that triggered the limit.
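
The check described above can be sketched as follows. This is a minimal illustration, not the assistant's actual implementation. The `TierLimiter` class, the fixed one-minute counting window, the order in which tiers are evaluated, the choice to block per user, and all identifiers are assumptions made for the example.

```python
import time
from collections import defaultdict

# Tier names in the order this sketch evaluates them (the order is an assumption).
TIERS = ("user", "tenant", "customer", "environment")
BLOCK_SECONDS = 60  # the text states that input is blocked for one minute


class TierLimiter:
    """Hypothetical fixed-window limiter covering the four tiers."""

    def __init__(self, rpm_limits, clock=time.monotonic):
        self.rpm_limits = rpm_limits      # e.g. {"user": 2, "tenant": 100}
        self.clock = clock                # injectable so the sketch can be tested
        self.counts = defaultdict(int)    # (tier, scope id, window) -> request count
        self.blocked_until = {}           # user id -> (unblock time, triggering tier)

    def check(self, user, tenant, customer, environment):
        """Return (allowed, tier); tier names the limit that triggered, if any."""
        now = self.clock()
        block = self.blocked_until.get(user)
        if block and now < block[0]:
            return False, block[1]

        window = int(now // 60)
        scope_ids = dict(zip(TIERS, (user, tenant, customer, environment)))
        keys = []
        for tier in TIERS:
            limit = self.rpm_limits.get(tier, 0)
            key = (tier, scope_ids[tier], window)
            # A limit of 0 or a missing value means the tier is not enforced.
            if limit and self.counts[key] >= limit:
                self.blocked_until[user] = (now + BLOCK_SECONDS, tier)
                return False, tier
            keys.append(key)

        # Count the request against every tier only after it has been accepted.
        for key in keys:
            self.counts[key] += 1
        return True, None
```

A caller would pass the returned tier name into the message shown to the user, for example `allowed, tier = limiter.check("u1", "t1", "c1", "prod")`. Old window counters are never evicted here; a production limiter would expire them.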

The system handles rate-limited requests as follows:

  • The system does not save rate-limited interactions to session history.
  • Rate-limited turns do not appear after a conversation refresh.

If the RPM value for a tier is 0 or is missing from the configuration, the system does not enforce a rate limit for that tier.

Note: The system stores RPM values in the Farm Database and caches them. Changes to an RPM value take effect after approximately 15 minutes.
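
The configuration rules above (a value of 0 or a missing value disables a tier, and cached values delay changes) can be illustrated with a small sketch. The source states only that values are cached and that changes apply after approximately 15 minutes. The time-to-live refresh strategy, the `RpmConfigCache` class, and the `load_from_db` callable that stands in for the Farm Database are assumptions made for the example.

```python
import time

CACHE_TTL_SECONDS = 15 * 60  # assumed TTL matching the roughly 15-minute delay


class RpmConfigCache:
    """Hypothetical read-through cache in front of the stored RPM values."""

    def __init__(self, load_from_db, clock=time.monotonic):
        # load_from_db is a placeholder callable returning {tier: rpm}.
        self.load_from_db = load_from_db
        self.clock = clock
        self.values = None
        self.loaded_at = None

    def limit_for(self, tier):
        """Return the RPM limit for a tier, or None when the tier is not enforced."""
        now = self.clock()
        if self.values is None or now - self.loaded_at >= CACHE_TTL_SECONDS:
            # Copy the stored values so later edits are not seen until the next reload.
            self.values = dict(self.load_from_db())
            self.loaded_at = now
        rpm = self.values.get(tier)
        # An RPM of 0 or a missing entry disables the limit for that tier.
        return rpm if rpm else None
```

Under these assumptions, a value edited in the database remains invisible to `limit_for` until the cached copy is older than `CACHE_TTL_SECONDS`, which mirrors the delay described in the note.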