
Data with noise class #265

@gusnunes

Description


While using "RandomRBFGeneratorEvents" to cluster data, I noticed that when the stream contains noise, the calculation of Purity (for example) is wrong. This happens because, in MembershipMatrix, the "classmap" does not contain the key "-1", which is supposed to map the noise label to the last "workcluster" index. Instead, the noise label's key is the number of clusters, so noise can be mapped to any "workcluster" index.
As a result, line 52 of the F1 measure has no effect: "mm.hasNoiseClass()" always returns false, so the number of classes stays the same.
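The mismatch can be reproduced in a few lines of standalone Java. This is not MOA code: `NoiseClassMapSketch`, `buildClassMap` and `hasNoiseClass` are hypothetical stand-ins that only assume what the issue describes, namely that the class map is expected to hold noise under the key -1 and to give it the last index.

```java
import java.util.HashMap;
import java.util.Map;

public class NoiseClassMapSketch {

    // Assumed convention (per the issue): noise points carry the label -1.
    static final int NOISE_LABEL = -1;

    // Hypothetical helper: real labels get consecutive indices in order of
    // first appearance; if any point is labelled -1, noise gets the last index.
    static Map<Integer, Integer> buildClassMap(int[] labels) {
        Map<Integer, Integer> classMap = new HashMap<>();
        int nextIndex = 0;
        boolean hasNoise = false;
        for (int label : labels) {
            if (label == NOISE_LABEL) {
                hasNoise = true;
            } else if (!classMap.containsKey(label)) {
                classMap.put(label, nextIndex++);
            }
        }
        if (hasNoise) {
            classMap.put(NOISE_LABEL, nextIndex); // noise takes the last index
        }
        return classMap;
    }

    // Mirrors the check described in the issue: noise exists only if key -1 is present.
    static boolean hasNoiseClass(Map<Integer, Integer> classMap) {
        return classMap.containsKey(NOISE_LABEL);
    }

    public static void main(String[] args) {
        int numClusters = 3;

        // Noise labelled -1: detected and placed at the last index (3).
        Map<Integer, Integer> withMinusOne = buildClassMap(new int[] {0, 1, -1, 2});
        System.out.println(hasNoiseClass(withMinusOne) + " " + withMinusOne.get(NOISE_LABEL)); // prints "true 3"

        // Noise labelled numClusters (3): treated as an ordinary class and,
        // because it appears first here, mapped to index 0.
        Map<Integer, Integer> withNumClusters = buildClassMap(new int[] {numClusters, 0, 1, 2});
        System.out.println(hasNoiseClass(withNumClusters) + " " + withNumClusters.get(numClusters)); // prints "false 0"
    }
}
```

With the second labelling, any check for key -1 returns false, so code that relies on it never adjusts the number of classes.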

For example, suppose a cluster contains 2 instances of a real class and 5 noise instances.
The current implementation computes this cluster's purity as 5/7, because the noise index is not skipped in the "for" loop over "mm.getClusterClassWeight()". The same happens when a cluster contains only noise instances, which is completely wrong.
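The effect of skipping the noise column can be sketched for a single cluster. `PuritySketch` and `clusterPurity` are hypothetical, and the row of per-class weights stands in for the values "mm.getClusterClassWeight()" would return for one cluster. Whether noise points should stay in the denominator (giving 2/7 for the example above) or be dropped from it (giving 2/2) is a design choice the issue does not settle; this sketch assumes they stay in.

```java
public class PuritySketch {

    // Purity of one cluster: weight of its dominant class divided by the
    // cluster's total weight. If noiseIndex >= 0, that column still counts
    // toward the total but is never allowed to be the dominant class.
    static double clusterPurity(int[] classWeights, int noiseIndex) {
        int total = 0;
        int maxWeight = 0;
        for (int j = 0; j < classWeights.length; j++) {
            total += classWeights[j];
            if (j == noiseIndex) {
                continue; // noise is not a real class
            }
            maxWeight = Math.max(maxWeight, classWeights[j]);
        }
        return total == 0 ? 0.0 : (double) maxWeight / total;
    }

    public static void main(String[] args) {
        int noise = 1;            // column 1 holds noise in this toy example
        int[] mixed = {2, 5};     // 2 instances of a real class, 5 noise instances
        int[] noiseOnly = {0, 5}; // only noise instances

        System.out.println(clusterPurity(mixed, -1));        // 5/7: noise wins
        System.out.println(clusterPurity(mixed, noise));     // 2/7: noise skipped
        System.out.println(clusterPurity(noiseOnly, -1));    // 1.0: pure noise looks perfectly pure
        System.out.println(clusterPurity(noiseOnly, noise)); // 0.0: no real class present
    }
}
```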

Metadata


Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
