Open
Description
This issue documents what still needs to be done:
- Change the main LLM documentation. Make the new way the recommended way. Keep the old way only at the end, highlighting that it is the way to create a judge when you need full control of the input to the model.
- Add all new examples to the examples documentation page. Remove most legacy examples, except one or two.
- Finalize the "main" score of the new LLM-as-judge metrics:
  - For direct: "llm_as_judge_score", the mean of all the instance judge scores.
  - For pairwise: "1_win_rate", the mean win_rate of the first system.
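
To make the two proposed main scores concrete, here is a minimal sketch of the aggregation, assuming each instance already carries a numeric judge score (direct) or a win rate for the first system (pairwise). The function names and sample values are hypothetical and for illustration only; they are not the library's API.

```python
from statistics import mean


def direct_main_score(instance_scores):
    """Hypothetical helper: mean of the per-instance judge scores."""
    return mean(instance_scores)


def pairwise_main_score(first_system_win_rates):
    """Hypothetical helper: mean per-instance win rate of the first system."""
    return mean(first_system_win_rates)


# Made-up per-instance values, used only to show the aggregation.
direct = {"llm_as_judge_score": direct_main_score([1.0, 0.5, 0.0])}
pairwise = {"1_win_rate": pairwise_main_score([1.0, 0.0, 1.0, 1.0])}

print(direct)
print(pairwise)
```

Both scores are plain unweighted means over instances, so every instance contributes equally regardless of how many criteria or comparisons it involved.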