When a bin has samples of the same probability #19

@forrestbao

Hi,

I just came across an interesting corner case: some bins have samples of the same probability.

The code below reproduces the problem.

import calibration as cal

# Two-class probabilities for 12 samples (each row sums to 1).
model_probs = [[0.5507, 0.4493],
               [0.8764, 0.1236],
               [0.1822, 0.8178],
               [0.3814, 0.6186],
               [0.9725, 0.0275],
               [0.281,  0.719 ],
               [0.8817, 0.1183],
               [0.8193, 0.1807],
               [0.4806, 0.5194],
               [0.9415, 0.0585],
               [0.4648, 0.5352],
               [0.9561, 0.0439]]
labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

# Request 4 bins per class, using all 12 samples for calibration.
calibrator = cal.PlattBinnerMarginalCalibrator(len(labels), num_bins=4)
calibrator.train_calibration(model_probs, labels)
print(calibrator._bins)  # inspect the learned bin edges

The shape of the first row in calibrator._bins is (3,) instead of (4,) as expected.

We looked into the cause and found that the samples in the last two bins have identical probabilities.
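To illustrate how tied probabilities could shrink the number of bins, here is a minimal sketch. It assumes bin edges are taken as midpoints between adjacent equal-sized chunks of the sorted probabilities and that duplicate edges are then merged. The helper `equal_mass_bin_edges` is hypothetical and written only for this illustration; we have not confirmed that the library computes its bins this way.

```python
def equal_mass_bin_edges(probs, num_bins):
    """Hypothetical helper: upper edges of roughly equal-mass bins."""
    sorted_probs = sorted(probs)
    num_bins = min(num_bins, len(sorted_probs))
    # Split the sorted values into num_bins contiguous chunks of near-equal size.
    size, extra = divmod(len(sorted_probs), num_bins)
    chunks, start = [], 0
    for i in range(num_bins):
        end = start + size + (1 if i < extra else 0)
        chunks.append(sorted_probs[start:end])
        start = end
    # Edge between two chunks: midpoint of the last value of one chunk
    # and the first value of the next. The final edge is fixed at 1.0.
    edges = [(chunks[i][-1] + chunks[i + 1][0]) / 2.0
             for i in range(num_bins - 1)]
    edges.append(1.0)
    # Assumption: duplicate edges are merged, which is where a bin is lost.
    return sorted(set(edges))


# Distinct values: all 4 requested bins survive.
distinct = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(len(equal_mass_bin_edges(distinct, 4)))

# Tied values at the top: the last two edges coincide and are merged.
tied = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 1.0, 1.0, 1.0]
print(len(equal_mass_bin_edges(tied, 4)))
```

Under these assumptions, the tied input yields 3 edges instead of 4, which matches the shape we observed.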

We are wondering whether, in such a case, an error should be raised or a small amount of noise should be added to the probabilities.
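The two options could look roughly like the sketch below. Both `require_bin_count` and `jitter_probs` are hypothetical names that are not part of the library, and the noise scale `eps` is an arbitrary choice for illustration. The jitter first shrinks each value into `[eps, 1 - eps]` so that the added noise cannot push it outside `[0, 1]`.

```python
import random


def require_bin_count(edges, num_bins):
    """Hypothetical check: fail loudly if some bins were merged."""
    if len(edges) != num_bins:
        raise ValueError(
            f"expected {num_bins} bins but got {len(edges)}; "
            "some samples probably share the same probability"
        )


def jitter_probs(probs, eps=1e-6, seed=0):
    """Hypothetical tie-breaker: perturb each probability by at most 2 * eps."""
    rng = random.Random(seed)
    jittered = []
    for p in probs:
        # Map p from [0, 1] into [eps, 1 - eps], then add noise in [-eps, eps],
        # so the result always stays inside [0, 1].
        shrunk = p * (1.0 - 2.0 * eps) + eps
        jittered.append(shrunk + rng.uniform(-eps, eps))
    return jittered


# Option 1: raise an error when fewer bins than requested were produced.
try:
    require_bin_count([0.35, 0.65, 1.0], num_bins=4)
except ValueError as err:
    print(err)

# Option 2: break ties with a small amount of noise before binning.
tied = [1.0, 1.0, 1.0, 1.0]
print(jitter_probs(tied))
```

Raising an error keeps the input untouched and makes the problem visible, while the jitter silently changes the data by a tiny amount, so which one is preferable probably depends on how strict the caller wants to be.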
