Cannot plot PR curve with labels of type np.str_ #2183

@auguste-probabl

Description

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skore import train_test_split, ComparisonReport, CrossValidationReport
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
y = iris.target_names[y]
split_data = train_test_split(
    X=X, y=y, random_state=0, as_dict=True, shuffle=True
)
report_1 = CrossValidationReport(LogisticRegression(), X, y)
report_2 = CrossValidationReport(LogisticRegression(max_iter=500), X, y)
report = ComparisonReport([report_1, report_2])
display = report.metrics.precision_recall()
display.plot()

The code above raises a confusing error: UndefinedVariableError: name 'np' is not defined

The offending line is

query = f"label == {label!r} & estimator_name == '{estimator_name}'"

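The issue is that the labels are np.str_ instances, and with NumPy 2 the repr of a NumPy scalar includes its type. The !r in the f-string therefore produces "label == np.str_('setosa') & ..." when it should produce "label == 'setosa' & ...", and Pandas then tries to resolve np as a name inside the query. We use !r so that plain Python strings are rendered with surrounding quotes: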

f"label == {label}" -> "label == setosa"
while
f"label == {label!r}" -> "label == 'setosa'".

We need the surrounding quotes for Pandas queries to work.
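
One possible direction, sketched below, is to convert string labels to a built-in str before applying !r. This is only an illustration, not the fix adopted in skore: build_query is a hypothetical helper, and NumpyLikeStr is a stand-in for np.str_ (a str subclass with a type-prefixed repr) so that the snippet runs without NumPy installed.

```python
class NumpyLikeStr(str):
    """Stand-in for np.str_ (assumption): a str subclass whose repr
    carries a type prefix instead of being a plain quoted literal."""

    def __repr__(self):
        return f"np.str_({str.__repr__(self)})"


def build_query(label, estimator_name):
    """Hypothetical helper that builds the Pandas query string."""
    # str() on an instance of a str subclass returns a built-in str,
    # so !r then emits a plain quoted literal such as 'setosa'.
    # Non-string labels (e.g. ints) are left untouched.
    if isinstance(label, str):
        label = str(label)
    return f"label == {label!r} & estimator_name == '{estimator_name}'"


label = NumpyLikeStr("setosa")
print(f"label == {label!r}")                     # the problematic form
print(build_query(label, "LogisticRegression"))  # the quoted form
```

The isinstance check keeps integer labels unquoted, so a query such as "label == 0" still compares against a number rather than a string. NumPy integer labels have a type-prefixed repr as well and would need separate handling.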

Environment

Commit b2ed6df
