Description
Expand on agent evaluation with a notebook showing how to evaluate an agent's response using LLM as Judge.
Once we have the foundation of what LLM-as-a-Judge (LLMaJ) is and how it works, we should build on it with a notebook that evaluates agent responses using an ensemble of judges for more accurate judging.
Potential points to cover
- What is LLM as Judge?
- Why use it for agent response evaluation instead of other metrics?
- Pros and cons of ensemble of judges
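As a starting point for the ensemble notebook, the aggregation step could look like the sketch below: collect a verdict from each judge and take a majority vote. All function names here are hypothetical; in the actual notebook each judge would be a prompted LLM call (e.g. different models or different rubrics) rather than the stand-in lambdas used for illustration.

```python
# Minimal sketch of ensemble-of-judges aggregation (hypothetical helper
# names; real judges would each wrap an LLM call with its own rubric).
from collections import Counter
from typing import Callable

# A judge maps (question, agent_response) to a verdict label.
Judge = Callable[[str, str], str]

def ensemble_verdict(judges: list[Judge], question: str, response: str) -> str:
    """Ask every judge for a verdict and return the majority vote."""
    verdicts = [judge(question, response) for judge in judges]
    winner, _count = Counter(verdicts).most_common(1)[0]
    return winner

# Stand-in judges for illustration only:
judges: list[Judge] = [
    lambda q, r: "pass" if r.strip() else "fail",  # judge 1: non-empty answer
    lambda q, r: "pass" if "4" in r else "fail",   # judge 2: expected token present
    lambda q, r: "fail",                           # judge 3: always skeptical
]

print(ensemble_verdict(judges, "What is 2+2?", "4"))  # → pass (2 of 3 judges)
```

A majority vote is the simplest aggregation; the notebook could also compare weighted voting or requiring unanimity, which ties directly into the "pros and cons" point above.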