Skip to content

f1-score evaluation #9

@p16i

Description

@p16i

Hi,

While working on #8, it seems to me that the evaluation of f-score is based on flatten true and pred labels. For example, given 2 samples whose lengths are 7 and 20. The current code flatten the labels to shape (27,) and compute the score. However, I think it could overestimate the value.

To illustrate, I've made a notebook using random data. You can see in there that the avg f-score is slightly lower than the f-score from the flatten data.

Looking forward to your thought on this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions