Problem explanation
In eva, when using MulticlassClassificationMetrics, there is no way to know how many of the model's responses were valid, i.e., if the answer format is JSON, whether the response was parseable as JSON and therefore usable. In the worst case, only a few answers could be valid and, by chance, produce a very high accuracy score, which we would not be able to tell apart from a genuinely good result.
A metric to calculate the valid answer rate should be implemented.
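To illustrate the failure mode, here is a minimal sketch. It assumes each response should be a JSON object with a hypothetical `label` key, that unparseable or malformed responses are mapped to the missing value `-1`, and that accuracy is computed only over the valid answers. The helper `parse_label` and the sample data are illustrative, not part of eva.

```python
import json

MISSING = -1  # hypothetical sentinel for an invalid / unparseable answer


def parse_label(response: str, key: str = "label") -> int:
    """Return the integer class label from a JSON response, or MISSING if invalid."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return MISSING
    if not isinstance(data, dict):
        return MISSING
    value = data.get(key)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return MISSING
    return value


responses = ['{"label": 1}', "The answer is 1", '{"label": 0', "not json", '{"label": 2}']
targets = [1, 0, 0, 1, 2]
preds = [parse_label(r) for r in responses]  # [1, -1, -1, -1, 2]

# Accuracy restricted to valid answers looks perfect...
valid = [(p, t) for p, t in zip(preds, targets) if p != MISSING]
accuracy_on_valid = sum(p == t for p, t in valid) / len(valid)
# ...while only 2 of 5 responses were usable at all.
valid_rate = len(valid) / len(preds)
print(accuracy_on_valid, valid_rate)  # 1.0 0.4
```

Reporting the valid answer rate next to accuracy makes this case visible: an accuracy of 1.0 on 40% valid answers is very different from 1.0 on 100%.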
Discussed implementation idea
Implement a metric similar to `ValidAnswerRate` in eva_internal that counts the missing values (e.g. `-1`) and compares them against the total number of responses.
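The idea can be sketched as a small stateful metric. This is a hypothetical, dependency-free illustration that follows the `update()` / `compute()` / `reset()` pattern used by torchmetrics metrics; it is not the eva_internal `ValidAnswerRate`, and the class name and `missing_value` parameter are assumptions.

```python
from typing import Iterable


class ValidAnswerRate:
    """Fraction of model responses that are valid, i.e. not equal to ``missing_value``.

    Hypothetical sketch; not the eva_internal implementation.
    """

    def __init__(self, missing_value: int = -1) -> None:
        self.missing_value = missing_value
        self.reset()

    def reset(self) -> None:
        # Running counts accumulated across batches.
        self.num_missing = 0
        self.total = 0

    def update(self, preds: Iterable[int]) -> None:
        # Count every prediction in the batch and how many of them are missing.
        for p in preds:
            self.total += 1
            if p == self.missing_value:
                self.num_missing += 1

    def compute(self) -> float:
        # Valid answer rate = (total - missing) / total; 0.0 if nothing was seen yet.
        if self.total == 0:
            return 0.0
        return (self.total - self.num_missing) / self.total


metric = ValidAnswerRate(missing_value=-1)
metric.update([1, -1, 2])
metric.update([-1, 0])
print(metric.compute())  # 0.6
```

In a torchmetrics-based version, `num_missing` and `total` would typically be registered with `add_state(..., dist_reduce_fx="sum")` so the counts aggregate correctly across devices before `compute()` is called.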