Description
Here I have in mind those metrics that compare a clustering model with independent ground truth (as opposed to "internal" measures of quality, such as the Calinski-Harabasz index). The following look like good candidates:
- Rand index
- Hubert & Arabie Adjusted Rand index
- Mirkin's index
- Hubert's index
- variation of information
- V-measure
- mutual information
The Clustering.jl package already has implementations, which assume the clusters are labelled with integers. The first four are combined into one function that returns a tuple instead of a single measurement, which deviates from the StatisticalMeasures.jl idiom. Here, these could either be separate measures, or a single measure with a field selecting the desired variation.
Given that the definitions of these measures are pretty simple, I think it's more trouble than it's worth to write and maintain interfaces for the existing code, which would also require making Clustering.jl a (conditional) dependency. I therefore propose new implementations here. The vanilla Rand index would make a great start.
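To give a sense of how little code is involved, here is a rough sketch of the vanilla Rand index, computed by counting agreeing pairs from a contingency table. The name `rand_index` and its signature are hypothetical and not part of StatisticalMeasures.jl or Clustering.jl. The sketch accepts arbitrary labels (not only integers), assumes there are no `missing` values, and does not include the measure wrapper or traits.

```julia
# Hypothetical helper, not an existing API: plain Rand index of two labelings.
# Assumes no `missing` labels; labels may be of any type usable as Dict keys.
function rand_index(yhat::AbstractVector, y::AbstractVector)
    n = length(y)
    length(yhat) == n ||
        throw(DimensionMismatch("label vectors must have equal length"))
    n > 1 || throw(ArgumentError("need at least two observations"))

    # Contingency counts keyed by (predicted, reference) label pairs,
    # plus the marginal counts for each labeling.
    joint = Dict{Any,Int}()
    rows = Dict{Any,Int}()
    cols = Dict{Any,Int}()
    for (a, b) in zip(yhat, y)
        joint[(a, b)] = get(joint, (a, b), 0) + 1
        rows[a] = get(rows, a, 0) + 1
        cols[b] = get(cols, b, 0) + 1
    end

    npairs(k) = k * (k - 1) ÷ 2   # number of unordered pairs among k items

    total = npairs(n)
    same_both = sum(npairs, values(joint))  # pairs grouped together in both
    same_yhat = sum(npairs, values(rows))   # pairs grouped together in yhat
    same_y = sum(npairs, values(cols))      # pairs grouped together in y

    # Agreements = pairs together in both + pairs separated in both, where
    # pairs separated in both = total - same_yhat - same_y + same_both.
    agreements = total + 2 * same_both - same_yhat - same_y
    return agreements / total
end
```

For example, `rand_index([1, 1, 2, 2], ["a", "a", "b", "b"])` gives `1.0`, since the two labelings induce the same partition, while `rand_index([1, 1, 2, 2], [1, 2, 1, 2])` gives `1/3` (only two of the six pairs agree). Mirkin's and Hubert's indices could be derived from the same pair counts.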
Here's what traits would look like for these measures:
consumes_multiple_observations = true
kind_of_proxy = LearnAPI.LabelAmbiguous()
observation_scitype = Union{Missing, ScientificTypesBase.Finite}
orientation = StatisticalMeasuresBase.Score() # all except Mirkin's index and variation of information
orientation = StatisticalMeasuresBase.Loss() # Mirkin's index (a disagreement measure) and variation of information
human_name = ... <string>
For others not mentioned above, the fallbacks suffice.