Evaluations

This section describes automated evaluations in GenAI.

Evaluations check whether a GenAI Agent or Tool produces the expected output for a specific input. They are especially useful for testing prompt behavior and confirming that user inputs consistently produce accurate and relevant responses. An evaluation can be run repeatedly to measure how reliably a prompt returns the correct output, which helps assess the overall performance of the GenAI Agent or Tool.
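The idea of running an evaluation repeatedly can be illustrated with a minimal Python sketch. It is not the GenAI implementation or its API: `run_evaluation`, `EvaluationResult`, and `fake_agent` are hypothetical names introduced here, the agent is assumed to be any callable that maps a prompt string to an output string, and a case-insensitive exact match is assumed as the pass criterion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    """Outcome of running one evaluation several times (hypothetical type)."""
    runs: int
    passes: int

    @property
    def pass_rate(self) -> float:
        # Fraction of runs whose output matched the expected output.
        return self.passes / self.runs if self.runs else 0.0


def run_evaluation(
    agent: Callable[[str], str],
    prompt: str,
    expected: str,
    runs: int = 5,
) -> EvaluationResult:
    """Send the same prompt `runs` times and count matching outputs.

    Assumption: a match is a case-insensitive comparison after trimming
    whitespace. A real evaluation may use a different criterion.
    """
    passes = 0
    for _ in range(runs):
        output = agent(prompt)
        if output.strip().lower() == expected.strip().lower():
            passes += 1
    return EvaluationResult(runs=runs, passes=passes)


def fake_agent(prompt: str) -> str:
    # Stand-in for a real Agent or Tool call; always returns the same answer.
    return "Paris"


result = run_evaluation(fake_agent, "What is the capital of France?", "paris", runs=3)
print(f"{result.passes}/{result.runs} runs passed")  # prints "3/3 runs passed"
```

A pass rate below 1.0 across repeated runs suggests that the prompt does not return the expected output reliably and may need revision.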

To manage evaluations within your tenant, use either the Evaluations UI in the GenAI application or the Evaluation API endpoints. Both interfaces let you create, update, and review evaluations.