
Conversation


@GaetandeCast GaetandeCast commented Nov 27, 2025

Closes #2112.

With this PR, for binary classification, calling `report.metrics.confusion_matrix(threshold=True)` computes and stores the confusion matrices for every threshold of the classifier's decision function. They can then be plotted with `.plot(threshold_value=x)` and accessed via `.frame()`.

This makes use of the new scikit-learn function `confusion_matrix_at_thresholds`, available in 1.8 and back-ported for earlier versions.

The storage structure extends what we converged on in #2165: a long-format dataframe with one cell of one matrix per row. The columns are the raw count, the three possible normalized values, the threshold value, and the true and predicted labels.

The default threshold value is 0.5. We could add an "auto" option that selects the "best" threshold if we find a satisfactory universal metric to define what "best" means (balanced accuracy may not be desirable, for instance, as argued here).
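To make the long-format storage concrete, here is a minimal sketch of the dataframe described above, built with plain numpy/pandas. This is an illustration only, not skore's actual implementation; the column names (`threshold`, `true_label`, `predicted_label`, `count`, `normalized_*`) are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Toy binary problem: true labels and decision scores.
y_true = np.array([0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7])

# One row per confusion-matrix cell, per distinct threshold.
rows = []
for threshold in np.unique(y_score):
    y_pred = (y_score >= threshold).astype(int)
    for true_label in (0, 1):
        for pred_label in (0, 1):
            count = int(np.sum((y_true == true_label) & (y_pred == pred_label)))
            rows.append(
                {
                    "threshold": threshold,
                    "true_label": true_label,
                    "predicted_label": pred_label,
                    "count": count,
                }
            )

df = pd.DataFrame(rows)

# The three normalizations: by true label (rows of the matrix),
# by predicted label (columns), and by the total number of samples.
# Cells whose group sums to zero yield NaN.
df["normalized_true"] = df["count"] / df.groupby(
    ["threshold", "true_label"]
)["count"].transform("sum")
df["normalized_pred"] = df["count"] / df.groupby(
    ["threshold", "predicted_label"]
)["count"].transform("sum")
df["normalized_all"] = df["count"] / len(y_true)

print(df[df["threshold"] == 0.4])
```

Filtering on a single `threshold` value recovers one full confusion matrix in long form, which is what `.plot(threshold_value=x)` would render.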

@GaetandeCast GaetandeCast force-pushed the decision_threshold_to_confusion_matrix branch from 969d097 to d0e5f1f on November 27, 2025 09:12

github-actions bot commented Nov 27, 2025

Coverage

Coverage Report for skore/
| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| **skore/src/skore** | | | | |
| __init__.py | 24 | 0 | 100% | |
| _config.py | 31 | 0 | 100% | |
| exceptions.py | 4 | 4 | 0% | 4, 15, 19, 23 |
| **skore/src/skore/_sklearn** | | | | |
| __init__.py | 6 | 0 | 100% | |
| _base.py | 199 | 14 | 92% | 46, 59, 128, 131, 184, 187–188, 190–193, 226, 229–230 |
| find_ml_task.py | 61 | 0 | 100% | |
| types.py | 29 | 1 | 96% | 30 |
| **skore/src/skore/_sklearn/_comparison** | | | | |
| __init__.py | 7 | 0 | 100% | |
| feature_importance_accessor.py | 35 | 2 | 94% | 93, 117 |
| metrics_accessor.py | 179 | 3 | 98% | 169, 249, 1212 |
| report.py | 108 | 0 | 100% | |
| utils.py | 57 | 0 | 100% | |
| **skore/src/skore/_sklearn/_cross_validation** | | | | |
| __init__.py | 9 | 0 | 100% | |
| data_accessor.py | 45 | 3 | 93% | 134, 137, 140 |
| feature_importance_accessor.py | 24 | 0 | 100% | |
| metrics_accessor.py | 183 | 1 | 99% | 242 |
| report.py | 136 | 1 | 99% | 490 |
| **skore/src/skore/_sklearn/_estimator** | | | | |
| __init__.py | 9 | 0 | 100% | |
| data_accessor.py | 66 | 1 | 98% | 82 |
| feature_importance_accessor.py | 168 | 2 | 98% | 258–259 |
| metrics_accessor.py | 386 | 6 | 98% | 329, 398, 402, 417, 452, 2137 |
| report.py | 166 | 2 | 98% | 449–450 |
| **skore/src/skore/_sklearn/_plot** | | | | |
| __init__.py | 3 | 0 | 100% | |
| base.py | 106 | 6 | 94% | 61–62, 247–249, 253 |
| utils.py | 77 | 0 | 100% | |
| **skore/src/skore/_sklearn/_plot/data** | | | | |
| __init__.py | 2 | 0 | 100% | |
| table_report.py | 185 | 1 | 99% | 706 |
| **skore/src/skore/_sklearn/_plot/metrics** | | | | |
| __init__.py | 6 | 0 | 100% | |
| confusion_matrix.py | 92 | 0 | 100% | |
| feature_importance_coefficients_display.py | 71 | 21 | 70% | 116–119, 121, 140, 146–152, 155, 161–165, 170–171 |
| metrics_summary_display.py | 8 | 0 | 100% | |
| precision_recall_curve.py | 301 | 6 | 98% | 242, 535, 635, 639, 699, 831 |
| prediction_error.py | 233 | 5 | 97% | 179, 186, 423, 506, 706 |
| roc_curve.py | 314 | 9 | 97% | 263, 455, 578, 583, 684, 689, 693, 762, 902 |
| **skore/src/skore/_sklearn/train_test_split** | | | | |
| __init__.py | 0 | 0 | 100% | |
| train_test_split.py | 58 | 0 | 100% | |
| **skore/src/skore/_sklearn/train_test_split/warning** | | | | |
| __init__.py | 8 | 0 | 100% | |
| high_class_imbalance_too_few_examples_warning.py | 19 | 1 | 94% | 83 |
| high_class_imbalance_warning.py | 20 | 0 | 100% | |
| random_state_unset_warning.py | 10 | 0 | 100% | |
| shuffle_true_warning.py | 9 | 0 | 100% | |
| stratify_is_set_warning.py | 10 | 0 | 100% | |
| time_based_column_warning.py | 21 | 0 | 100% | |
| train_test_split_warning.py | 3 | 0 | 100% | |
| **skore/src/skore/_utils** | | | | |
| __init__.py | 6 | 2 | 66% | 8, 13 |
| _accessor.py | 90 | 3 | 96% | 34, 146, 190 |
| _cache.py | 23 | 0 | 100% | |
| _environment.py | 27 | 1 | 96% | 40 |
| _fixes.py | 8 | 0 | 100% | |
| _index.py | 5 | 0 | 100% | |
| _logger.py | 22 | 4 | 81% | 15–17, 19 |
| _measure_time.py | 10 | 0 | 100% | |
| _parallel.py | 38 | 3 | 92% | 23, 33, 124 |
| _patch.py | 21 | 12 | 42% | 30, 35–39, 42–43, 46–47, 58, 60 |
| _progress_bar.py | 46 | 0 | 100% | |
| _repr_html.py | 8 | 0 | 100% | |
| _show_versions.py | 38 | 0 | 100% | |
| _testing.py | 56 | 0 | 100% | |
| **skore/src/skore/project** | | | | |
| __init__.py | 2 | 0 | 100% | |
| project.py | 48 | 0 | 100% | |
| summary.py | 75 | 1 | 98% | 120 |
| widget.py | 187 | 0 | 100% | |
| **TOTAL** | 4198 | 115 | 97% | |

| Tests | Skipped | Failures | Errors | Time |
|------:|--------:|---------:|-------:|-----:|
| 1145 | 5 💤 | 0 ❌ | 0 🔥 | 4m 1s ⏱️ |


github-actions bot commented Nov 27, 2025

Documentation preview @ 383cb64

@GaetandeCast GaetandeCast force-pushed the decision_threshold_to_confusion_matrix branch from 7007077 to c12cdac on November 28, 2025 16:35
@GaetandeCast GaetandeCast marked this pull request as ready for review December 1, 2025 15:23
@glemaitre glemaitre self-requested a review December 1, 2025 19:17
@glemaitre glemaitre (Member) left a comment

This is a first pass on the API design. I'm skipping the tests for now since these changes will affect them; we can iterate on the API first and take care of the tests afterwards.

@GaetandeCast
Contributor Author

Hi @glemaitre, thanks for the review. I have implemented the requested changes. I will wait until we are satisfied with the API before updating the tests and the doc example, so until then the CI will be red.


GaetandeCast commented Dec 3, 2025

@glemaitre I made the changes suggested orally:

  • added an asterisk on the positive label and a legend explaining it
  • moved the decision threshold to a new line in the title
  • mention "Decision threshold: 0.5" in the title when no threshold was explicitly chosen
  • described the usage and utility of the threshold in the docstrings of plot() and frame()

Here is how display.plot() looks now:

(screenshot of the resulting confusion-matrix display)

Generated with:

```python
import matplotlib.pyplot as plt

# `report` is a skore report fitted beforehand (not shown here)
cm_display = report.metrics.confusion_matrix(pos_label="disallowed")
cm_display.plot(threshold_value=0.3)
plt.show()
```



Development

Successfully merging this pull request may close these issues.

enh(skore): Add a decision_threshold parameter to the confusion_matrix display

2 participants