Implement metric for valid/missing answer rate #984

@jonaserb-k

Problem explanation

In eva, when using MulticlassClassificationMetrics, there is no way to know how many of the model's responses were valid, i.e. in the expected answer format (for example, if the answer format is JSON, whether the response can be parsed as JSON). In the worst case, only a few answers are valid and, by chance, they are all correct, which yields a very high accuracy score that we would not be able to tell apart from a genuinely good result.
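To make the failure mode concrete, here is a small sketch in plain Python. It assumes that invalid responses are encoded as `-1` and are skipped when accuracy is computed; that exclusion behaviour, and the helper name `accuracy_over_valid`, are assumptions for illustration and not eva's actual implementation.

```python
MISSING = -1  # assumed sentinel for a response that could not be parsed


def accuracy_over_valid(preds, targets, missing=MISSING):
    """Accuracy computed only over responses that are not `missing`."""
    pairs = [(p, t) for p, t in zip(preds, targets) if p != missing]
    if not pairs:
        return float("nan")  # no valid response at all
    return sum(p == t for p, t in pairs) / len(pairs)


# Ten responses, only two of which could be parsed; both happen to be correct.
preds = [1, MISSING, MISSING, MISSING, 0, MISSING, MISSING, MISSING, MISSING, MISSING]
targets = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

acc = accuracy_over_valid(preds, targets)
n_valid = sum(p != MISSING for p in preds)
print(f"accuracy={acc}, valid responses={n_valid}/{len(preds)}")
```

With these inputs the accuracy looks perfect even though eight of the ten responses were unusable, which is the case the issue wants to be able to detect.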

A metric to calculate the valid answer rate should be implemented.

Discussed implementation idea

Implement a metric similar to eva_internal's ValidAnswerRate that counts the missing values (e.g. -1) and compares that count against the total number of responses.
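A minimal sketch of what such a metric could look like, written in plain Python with an `update`/`compute` interface so that it can accumulate over batches. The class below is hypothetical: it is not the eva_internal implementation, and the `missing_value` parameter and the choice to return NaN when nothing has been seen are assumptions.

```python
class ValidAnswerRate:
    """Fraction of responses that are not the missing-value sentinel.

    Hypothetical sketch; names and defaults are illustrative assumptions.
    """

    def __init__(self, missing_value=-1):
        self.missing_value = missing_value
        self.valid = 0  # responses that differ from the sentinel
        self.total = 0  # all responses seen so far

    def update(self, preds):
        """Accumulate counts from one batch of predicted labels."""
        preds = list(preds)
        self.valid += sum(p != self.missing_value for p in preds)
        self.total += len(preds)

    def compute(self):
        """Return valid / total, or NaN if no responses were recorded."""
        if self.total == 0:
            return float("nan")
        return self.valid / self.total


metric = ValidAnswerRate()
metric.update([0, 1, -1])  # first batch: 2 valid out of 3
metric.update([-1, 2])     # second batch: 1 valid out of 2
rate = metric.compute()    # 3 valid out of 5 responses
```

The missing answer rate is then simply `1 - rate`. Reporting this number next to accuracy shows how many responses the accuracy score is actually based on.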
