Chat rate limiting
The GenAI Assistant uses a four-tier rate-limiting model to ensure fair and predictable
resource allocation.
The system enforces a requests-per-minute (RPM) limit at each of these levels:
- User: Limits requests from a single user.
- Tenant: Limits requests from all users in a tenant.
- Customer: Limits requests from all tenants that belong to a customer.
- Environment: Limits requests across the entire environment.
When a request exceeds a rate limit, the system blocks the input for one minute and displays a message that identifies the tier that triggered the limit.
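The tiered check described above can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes a fixed one-minute counting window, a check order from narrowest to broadest tier, and a positive RPM value for every tier. The names TieredRateLimiter, Decision, and limit_message, and the message wording, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Tiers in an assumed check order, narrowest scope first.
TIERS = ("user", "tenant", "customer", "environment")
BLOCK_SECONDS = 60  # the input is blocked for one minute


@dataclass
class Decision:
    allowed: bool
    tier: Optional[str] = None             # tier that triggered the limit, if any
    blocked_until: Optional[float] = None  # time at which input is accepted again


class TieredRateLimiter:
    def __init__(self, rpm_limits: dict[str, int]):
        self.rpm_limits = rpm_limits
        # (tier, scope id, minute index) -> requests counted in that minute
        self.counts: dict[tuple[str, str, int], int] = defaultdict(int)

    def check(self, scope_ids: dict[str, str], now: float) -> Decision:
        minute = int(now // 60)
        # Reject on the first tier whose counter has reached its limit.
        for tier in TIERS:
            if self.counts[(tier, scope_ids[tier], minute)] >= self.rpm_limits[tier]:
                return Decision(False, tier, now + BLOCK_SECONDS)
        # Only accepted requests consume quota in this sketch.
        for tier in TIERS:
            self.counts[(tier, scope_ids[tier], minute)] += 1
        return Decision(True)


def limit_message(decision: Decision) -> str:
    # Hypothetical wording; the real message text is not specified here.
    return f"Rate limit reached at the {decision.tier} level. Try again in one minute."
```

In this sketch, a user who sends more requests than the user-level limit within one minute is rejected with tier set to "user", while a different user in the same tenant is still accepted until the tenant-level counter reaches its own limit.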
The system applies these behaviors to rate-limited requests:
- The system does not save rate-limited interactions to session history.
- Rate-limited turns do not appear after a conversation refresh.
- If the RPM value for a tier is 0 or is missing from the configuration, the system does not enforce a rate limit for that tier.
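The rules above can be expressed in a few lines. The following sketch assumes the RPM configuration is available as a dictionary keyed by tier name and that session history is a simple list. Both are assumptions made for illustration, and enforced_rpm and record_turn are hypothetical helper names.

```python
from typing import Optional


def enforced_rpm(config: dict, tier: str) -> Optional[int]:
    """Return the RPM limit to enforce for a tier, or None if the tier is unlimited."""
    value = config.get(tier)
    # A missing value and a value of 0 both mean "do not enforce".
    if not value:
        return None
    return value


def record_turn(history: list, turn: str, rate_limited: bool) -> None:
    """Append a turn to session history unless the turn was rate-limited."""
    if not rate_limited:
        history.append(turn)
```

Because rate-limited turns never reach the history in this sketch, a refresh that rebuilds the conversation from that history would not show them.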
Note: The system stores RPM values in the Farm Database and caches them.
As a result, a change to an RPM value takes effect after approximately 15 minutes.
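One common way to get this kind of delay is a time-based cache that reloads its values after a fixed interval. The sketch below assumes that mechanism; how the product actually caches RPM values is not described here. The class CachedRpmConfig, the load callback that stands in for a database read, and the 15-minute constant are hypothetical.

```python
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 15 * 60  # assumed to match the approximate 15-minute delay


class CachedRpmConfig:
    def __init__(self, load: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._load = load    # hypothetical loader that reads RPM values from the database
        self._clock = clock  # injectable so the cache can be exercised without waiting
        self._values: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Reload on first use or once the cached copy is older than the TTL.
        if self._values is None or now - self._loaded_at >= CACHE_TTL_SECONDS:
            self._values = self._load()
            self._loaded_at = now
        return self._values
```

With a cache like this, a value changed in the database one minute after a load stays invisible to callers until the cached copy expires, which is consistent with the delay described in the note.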