enh(skore): Implement option data_source="both" in ROC curve Display for the ComparisonReport #2155

@MarieSacksick

Part of #1874.

As a reminder, here is the current behaviour:

# %%
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from skore import ComparisonReport, EstimatorReport, train_test_split

# Load a binary classification dataset and split it once, so that both
# reports share the same train and test sets.
X, y = load_breast_cancer(return_X_y=True)
split_data = train_test_split(X=X, y=y, random_state=0, as_dict=True)

# One report per estimator, then a comparison report over both.
classifier = LogisticRegression(max_iter=10_000)
report_1 = EstimatorReport(classifier, **split_data)
report_2 = EstimatorReport(RandomForestClassifier(), **split_data)
comp = ComparisonReport([report_1, report_2])
# %%
display = comp.metrics.precision_recall()
display.frame()

outputs:

[Image]

and:

[Image]

Frame

For the frame, let's add an extra column data_source, and concatenate the train and test data into a single long-format dataframe.
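To make the target shape concrete, here is a minimal sketch of such a long-format frame, built with plain scikit-learn and pandas rather than skore. The column names (estimator_name, data_source, threshold, fpr, tpr) are assumptions chosen for illustration, not the columns that display.frame() returns, and the sketch uses the ROC curve from the issue title.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = {
    "LogisticRegression": LogisticRegression(max_iter=10_000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}
splits = {"train": (X_train, y_train), "test": (X_test, y_test)}

frames = []
for name, estimator in estimators.items():
    estimator.fit(X_train, y_train)
    for data_source, (X_part, y_part) in splits.items():
        # Score of the positive class, used to build the ROC curve.
        scores = estimator.predict_proba(X_part)[:, 1]
        fpr, tpr, thresholds = roc_curve(y_part, scores)
        # Hypothetical column layout: one row per threshold, tagged with
        # the estimator and the data source it was computed on.
        frames.append(
            pd.DataFrame(
                {
                    "estimator_name": name,
                    "data_source": data_source,
                    "threshold": thresholds,
                    "fpr": fpr,
                    "tpr": tpr,
                }
            )
        )

# A single long frame: train and test rows are stacked, not put side by side.
long_frame = pd.concat(frames, ignore_index=True)
```

With this layout, selecting one curve is a simple filter, for example `long_frame.query("data_source == 'train'")`.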

Plot

For the plot, two constraints apply. First, we want to avoid having too many lines on the same plot: the example here compares only two models, but there could easily be more. Second, we want to avoid comparing the train curve of one model with the test curve of another, which would not make sense. So let's use the subplotting option, once it is implemented in #1445, to have one subplot with all the train curves and one subplot with all the test curves.
Why not have one subplot per model, with train and test curves together? It is not necessarily a bad idea, and we could consider adding an option to offer both layouts. Yet, to start simple and iterate: one subplot per model is mostly useful when investigating a single model, which is the job of the EstimatorReport rather than the ComparisonReport.
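The proposed layout, one subplot per data source with one curve per model, could look roughly like the sketch below. It uses matplotlib and scikit-learn directly and does not rely on the subplotting option from #1445, whose API is not described here. The titles, labels and figure size are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = {
    "LogisticRegression": LogisticRegression(max_iter=10_000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}
for estimator in estimators.values():
    estimator.fit(X_train, y_train)

splits = {"train": (X_train, y_train), "test": (X_test, y_test)}

# One subplot per data source; every model is drawn in each subplot, so
# train curves are only ever compared with train curves (same for test).
fig, axes = plt.subplots(ncols=2, figsize=(10, 4), sharex=True, sharey=True)
for ax, (data_source, (X_part, y_part)) in zip(axes, splits.items()):
    for name, estimator in estimators.items():
        scores = estimator.predict_proba(X_part)[:, 1]
        fpr, tpr, _ = roc_curve(y_part, scores)
        ax.plot(fpr, tpr, label=name)
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance level")
    ax.set(
        title=f"ROC curve ({data_source} set)",
        xlabel="False positive rate",
        ylabel="True positive rate",
    )
    ax.legend()
```

The number of subplots stays at two however many models are compared; only the number of lines per subplot grows.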

Labels

feature 🎁 (Feature or enhancement), ready for dev 💻 (Issue specified enough and ready to be implemented)