Displaying evaluation results
- Navigate to Infor OS > GenAI > Factory > Evaluation.
- Click the View Details… button next to the relevant evaluation.
- Select a specific Evaluation job from the list.
Review the evaluation results, which include:
- Executed by: the user who ran the evaluation job
- Executed on: when the evaluation job was run
- Type: the endpoint that the evaluation was tested against
- Status: the current progress of the evaluation job
- Steps: the test steps, each scored for answer correctness, answer similarity, answer relevance, and agent trajectory
The Steps section lists each evaluation parameter as a separate test step. Each step represents one test scenario.
Each step displays these values:
- Input: the question or prompt that the user sent to the GenAI Assistant during the test.
- Scoring indicators: four color-coded circular score indicators that show how the GenAI Assistant performed in each scoring dimension. Each indicator is scored on a scale from 1 to 5. Green indicates a passing score, and red indicates a failing score. The four scoring dimensions are:
  - Answer correctness
  - Answer similarity
  - Answer relevance
  - Agent trajectory
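To picture what each step carries, the following Python sketch models one step as an input plus four scores on the 1 to 5 scale and maps each score to a green or red indicator. It is purely illustrative: the class name StepScores, the field names, and the passing threshold of 3 are hypothetical assumptions, because this documentation does not state the cutoff between a passing and a failing score or how Infor OS stores these values.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the documentation does not state which score counts as passing.
PASS_THRESHOLD = 3

# The four scoring dimensions listed in the documentation (names are illustrative).
DIMENSIONS = (
    "answer_correctness",
    "answer_similarity",
    "answer_relevance",
    "agent_trajectory",
)


@dataclass
class StepScores:
    """Illustrative model of one test step; not an Infor OS data type."""

    input_text: str
    scores: dict[str, int]  # dimension name -> score on the 1 to 5 scale

    def __post_init__(self) -> None:
        # Reject unknown dimensions and scores outside the documented 1 to 5 range.
        for name, value in self.scores.items():
            if name not in DIMENSIONS:
                raise ValueError(f"unknown scoring dimension: {name}")
            if not 1 <= value <= 5:
                raise ValueError(f"{name} score {value} is outside the 1 to 5 scale")

    def indicator_colors(self, threshold: int = PASS_THRESHOLD) -> dict[str, str]:
        """Map each dimension to 'green' (passing) or 'red' (failing)."""
        return {
            name: "green" if value >= threshold else "red"
            for name, value in self.scores.items()
        }


step = StepScores(
    input_text="Example prompt sent to the GenAI Assistant",
    scores={
        "answer_correctness": 5,
        "answer_similarity": 4,
        "answer_relevance": 5,
        "agent_trajectory": 2,
    },
)
print(step.indicator_colors())
```

The threshold is a parameter rather than a fixed rule so that it can be adjusted once the actual cutoff used by the product is known.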
When you select a step, the Steps result details panel opens on the right. The panel provides a detailed breakdown of that test scenario:
- Average Score: the average of the scoring dimension scores for the step, displayed as a value from 1 to 5, which gives a quick indication of overall quality.
- Results: the individual scores for each selected scoring dimension.
- Input: the exact question or prompt that the user submitted to the GenAI Assistant for the test scenario.
- Ground Truth: the expected answer that is defined for the evaluation parameter.
- Model Generated Output: the response that the GenAI Assistant produced during the evaluation run.
- Agent trajectory trace: a section that contains a View evaluation trace link. The trace shows the ground truth trajectory and the actual trajectory.
- Answer correctness: an explanation from the Judge Model that describes whether the response was factually accurate and whether the response fully addressed the input question.
- Answer similarity: an explanation from the Judge Model that describes how closely the structure, content, and wording of the response matched the ground truth.
- Answer relevance: an assessment from the Judge Model that describes whether the response directly and comprehensively addressed the input question.
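- Agent trajectory: an assessment of the path that the GenAI Assistant took to produce its response, compared with the ground truth trajectory. This is the most technically detailed scoring dimension in the evaluation.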
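The Average Score in the details panel combines the per-dimension scores into one value. The following is a minimal sketch of that arithmetic, assuming a plain arithmetic mean over the dimensions that were scored and rounding to two decimals for display. Both the formula details and the helper name average_score are assumptions; the documentation does not state exactly how the product computes or rounds the value.

```python
def average_score(scores: dict[str, int]) -> float:
    """Arithmetic mean of per-dimension scores (hypothetical helper, not an Infor OS API)."""
    if not scores:
        raise ValueError("at least one scored dimension is required")
    for name, value in scores.items():
        # Each dimension is scored on the documented 1 to 5 scale.
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside the 1 to 5 scale")
    # Assumed display rounding: two decimal places.
    return round(sum(scores.values()) / len(scores), 2)


results = {
    "answer_correctness": 5,
    "answer_similarity": 4,
    "answer_relevance": 5,
    "agent_trajectory": 2,
}
print(average_score(results))  # 4.0
```

Because the mean is taken over whichever dimensions were scored, a step evaluated on only some dimensions still yields a value on the same 1 to 5 scale: the mean of values that each lie between 1 and 5 also lies between 1 and 5.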