Open
Description
Hi,
While working on #8, I noticed that the F-score seems to be evaluated on flattened true and predicted labels. For example, given 2 samples whose lengths are 7 and 20, the current code flattens the labels to shape (27,) and computes a single score over them. I think this could overestimate the value, because longer samples contribute more labels and so dominate the result.
To illustrate, I've made a notebook using random data. You can see there that the average of the per-sample F-scores is slightly lower than the F-score computed on the flattened data.
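Here is a minimal sketch of the same effect on a hand-made toy example (not the notebook data, and not the project's actual evaluation code). The helper `f1_binary` and the label arrays are hypothetical and assume binary labels; they only mirror the 7 and 20 sample lengths mentioned above.

```python
def f1_binary(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical sample A (length 7): one hit, one miss, one false alarm.
true_a = [1, 1, 0, 0, 0, 0, 0]
pred_a = [1, 0, 1, 0, 0, 0, 0]

# Hypothetical sample B (length 20): predicted perfectly.
true_b = [1] * 10 + [0] * 10
pred_b = [1] * 10 + [0] * 10

# Flattened evaluation: concatenate both samples into shape (27,) first.
flat_f1 = f1_binary(true_a + true_b, pred_a + pred_b)

# Per-sample evaluation: score each sample, then take the unweighted mean.
avg_f1 = (f1_binary(true_a, pred_a) + f1_binary(true_b, pred_b)) / 2

print(f"flattened F1:  {flat_f1:.4f}")
print(f"per-sample F1: {avg_f1:.4f}")
```

In this toy case the long, well-predicted sample dominates the flattened score, so it comes out higher than the per-sample average. The size and direction of the gap depend on the data, so this only shows that the two numbers can differ.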
Looking forward to your thoughts on this.