feat: Design of `EstimatorReport` #997

glemaitre · 2024-12-20T21:10:16Z

closes #834

Investigate an API for an `EstimatorReport`.

TODO

Notes

This PR builds upon:

feat: Use friendly verbose and colorish #962, to reuse the `skore.console`
fix: Use estimator whenever possible to detect the ML task #998, to be able to detect clusterers in a consistent manner.

examples/model_evaluation/plot_estimator_report.py

skore/src/skore/sklearn/_estimator/base.py

skore/src/skore/sklearn/_estimator/report.py

skore/src/skore/sklearn/_estimator/metrics_accessor.py

skore/pyproject.toml

thomass-dev · 2025-01-09T10:48:07Z

skore/src/skore/sklearn/__init__.py

@@ -1,9 +1,11 @@
 """Enhance `sklearn` functions."""

+from skore.sklearn._estimator import EstimatorReport


It's disturbing that you want to expose something from a private/protected module.
Shouldn't `skore.sklearn.estimator` be exposed too, by removing the `_`?

Basically, I want the user to be able to do

`skore.EstimatorReport`

or

`skore.sklearn.EstimatorReport`

but I don't want to expose it at a lower level. In scikit-learn (and other packages), whenever you don't want people to import from a module, you prefix it with `_`, even if it is a folder.

For instance, I would probably do the same for `cross_validation`.

However, it is something that we can discuss later.
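The convention described above can be illustrated with a toy package built on the fly. This is a minimal sketch: the package name `toypkg` and its layout are hypothetical and only mirror the `skore/sklearn/_estimator/` structure discussed here. The class is defined in a private (`_`-prefixed) subpackage and re-exported at two public levels, so users never need to import from the private path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout mirroring skore/sklearn/_estimator/: the implementation
# lives in a private ("_"-prefixed) subpackage, and the public packages only
# re-export the class.
FILES = {
    "toypkg/__init__.py": (
        "from toypkg.sklearn import EstimatorReport\n"
        "__all__ = ['EstimatorReport']\n"
    ),
    "toypkg/sklearn/__init__.py": (
        "from toypkg.sklearn._estimator import EstimatorReport\n"
        "__all__ = ['EstimatorReport']\n"
    ),
    "toypkg/sklearn/_estimator/__init__.py": (
        "from toypkg.sklearn._estimator.report import EstimatorReport\n"
        "__all__ = ['EstimatorReport']\n"
    ),
    "toypkg/sklearn/_estimator/report.py": (
        "class EstimatorReport:\n"
        "    '''Toy stand-in for the real report class.'''\n"
    ),
}


def build_package(root: Path) -> None:
    """Write every module of the toy package under `root`."""
    for relative_path, source in FILES.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


root = Path(tempfile.mkdtemp())
build_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

toypkg = importlib.import_module("toypkg")
toypkg_sklearn = importlib.import_module("toypkg.sklearn")

# Both public entry points expose the very same class object...
print(toypkg.EstimatorReport is toypkg_sklearn.EstimatorReport)  # True
# ...while the module that defines it stays private.
print(toypkg.EstimatorReport.__module__)  # toypkg.sklearn._estimator.report
```

The `_` prefix does not prevent importing the private module; it only signals that its path is not part of the public API and may change.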

skore/tests/conftest.py

thomass-dev · 2025-01-09T11:03:13Z

skore/tests/conftest.py

+    """Setup and teardown fixture for matplotlib.
+
+    This fixture checks if we can import matplotlib. If not, the tests will be
+    skipped. Otherwise, we close the figures before and after running the


For my information, why close the figures before the test, and not just after?

I don't have a definitive answer since I did not write this fixture in scikit-learn. My guess is that a test might fail and never reach the teardown, so closing the figures at the start of the next test guarantees a clean start. However, I'm unsure.
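The docstring under review closely resembles scikit-learn's `pyplot` fixture in its `conftest.py`. A minimal sketch of that kind of fixture follows; the test function is hypothetical and only shows what the "close before" step buys: each test can assume no figure is open when it starts.

```python
import pytest


@pytest.fixture
def pyplot():
    """Setup and teardown fixture for matplotlib.

    Skips the test when matplotlib is not installed; otherwise closes all
    figures before and after the test.
    """
    pyplot = pytest.importorskip("matplotlib.pyplot")
    # Close before: a previous test may have failed before reaching its
    # teardown and left figures open.
    pyplot.close("all")
    yield pyplot
    # Close after: do not leak figures into the next test.
    pyplot.close("all")


def test_plot_creates_one_figure(pyplot):
    # Hypothetical test relying on starting with zero open figures.
    pyplot.figure()
    assert len(pyplot.get_fignums()) == 1
```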

skore/src/skore/utils/_accessor.py

skore/tests/unit/sklearn/plot/__init__.py

thomass-dev · 2025-01-09T13:41:46Z

skore/src/skore/sklearn/_estimator/report.py

+            "estimator[/bold cyan]"
+        )
+
+    def _create_help_tree(self):


Could you add a representation of the reporter's attributes to the help?
For instance, it would let users know that the reporter contains the fitted estimator.

I added a final branch listing all the getters and init attributes.
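One way to build such an attributes branch is to introspect the class: collect the public properties (the getters) and the `__init__` parameters, then render them as the last branch of the tree. The sketch below is plain Python with a hypothetical `ToyReport` class and a hypothetical `create_help_tree` function; the PR renders its help with `rich` markup (see `[/bold cyan]` in the diff), which is not used here.

```python
import inspect


class ToyReport:
    """Hypothetical stand-in for EstimatorReport."""

    def __init__(self, estimator, *, X_test=None, y_test=None):
        self.estimator = estimator
        self.X_test = X_test
        self.y_test = y_test

    @property
    def estimator_name(self):
        # A read-only getter derived from an init attribute.
        return type(self.estimator).__name__


def _getters(cls):
    """Public properties defined on the class."""
    return {
        name
        for name, value in inspect.getmembers(cls)
        if isinstance(value, property) and not name.startswith("_")
    }


def _init_attributes(cls):
    """Parameters accepted by __init__, excluding `self`."""
    return set(inspect.signature(cls.__init__).parameters) - {"self"}


def create_help_tree(report):
    """Render a text tree whose final branch lists the report's attributes."""
    cls = type(report)
    attributes = sorted(_getters(cls) | _init_attributes(cls))
    lines = [cls.__name__, "└── Attributes"]
    for index, name in enumerate(attributes):
        connector = "└──" if index == len(attributes) - 1 else "├──"
        lines.append(f"    {connector} .{name}")
    return "\n".join(lines)


print(create_help_tree(ToyReport(estimator=object())))
```

Printed, this lists `.X_test`, `.estimator`, `.estimator_name`, and `.y_test` under the `Attributes` branch, so the fitted estimator is discoverable from the help alone.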

rouk1 · 2025-01-09T14:34:17Z

skore/src/skore/sklearn/_estimator/report.py

+            )
+        )
+        # trigger the computation
+        list(


Maybe we could have a list of indeterminate progress indicators instead of one progress bar that "jumps".

Happy to see what we can do to improve the current state.
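The `# trigger the computation` / `list(` lines in the diff rely on generators being lazy: nothing runs until the generator is consumed, and a progress wrapper advances once per finished task, which is why a single bar "jumps" when tasks take uneven time. A minimal plain-Python sketch; `track` is a hypothetical stand-in for a progress-bar wrapper such as `rich.progress.track`, and the task names are made up.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def track(
    iterable: Iterable, total: int, on_advance: Callable[[int, int], None]
) -> Iterator:
    """Yield items unchanged, reporting progress after each one."""
    for done, item in enumerate(iterable, start=1):
        on_advance(done, total)
        yield item


def compute_and_cache(
    cache: Dict[str, object], tasks: List[Tuple[str, Callable[[], object]]]
) -> Iterator[str]:
    """Lazily fill `cache`, computing one task per iteration."""
    for key, func in tasks:
        if key not in cache:
            cache[key] = func()
        yield key


cache: Dict[str, object] = {}
tasks = [
    ("predict", lambda: [0, 1, 1]),
    ("predict_proba", lambda: [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]),
]
progress: List[str] = []
generator = track(
    compute_and_cache(cache, tasks),
    total=len(tasks),
    on_advance=lambda done, total: progress.append(f"{done}/{total}"),
)
print(cache)  # {} : creating the generator does not run anything

# trigger the computation
list(generator)
print(sorted(cache))  # ['predict', 'predict_proba']
print(progress)  # ['1/2', '2/2']
```

With one indeterminate indicator per task, as suggested, each task would show as running until it completes instead of contributing a fixed step to a shared bar.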

skore/src/skore/sklearn/_estimator/metrics_accessor.py

skore/src/skore/sklearn/_estimator/metrics_accessor.pyi

thomass-dev · 2025-01-09T15:34:59Z

skore/src/skore/sklearn/_estimator/metrics_accessor.pyi

@@ -0,0 +1,168 @@
+from typing import Any, Callable, Literal, Optional, Union


To-do: check if removing the stub files breaks the auto-completion or not, and check if a work-around exists (ping @augustebaum).

Co-authored-by: Sylvain Combettes <[email protected]>

github-actions · 2025-01-09T17:38:30Z

Documentation preview @ 82f6332

github-actions · 2025-01-09T18:53:05Z

Coverage Report for backend

File	Stmts	Miss	Cover	Missing
venv/lib/python3.12/site-packages/skore
__init__.py	12	0	100%
__main__.py	8	1	80%	19
exceptions.py	3	0	100%
venv/lib/python3.12/site-packages/skore/cli
__init__.py	5	0	100%
cli.py	33	3	85%	104, 111, 117
color_format.py	43	3	90%	35–>40, 41–43
launch_dashboard.py	26	15	39%	36–57
quickstart_command.py	14	7	50%	37–51
venv/lib/python3.12/site-packages/skore/item
__init__.py	21	0	100%
cross_validation_item.py	137	10	93%	27–42, 370
item.py	41	13	68%	85, 88, 92–112
item_repository.py	42	2	93%	12–13
media_item.py	70	4	94%	15–18
numpy_array_item.py	25	1	93%	15
pandas_dataframe_item.py	34	1	95%	15
pandas_series_item.py	34	1	95%	15
polars_dataframe_item.py	32	1	94%	15
polars_series_item.py	27	1	94%	15
primitive_item.py	27	2	92%	13–15
sklearn_base_estimator_item.py	33	1	95%	15
skrub_table_report_item.py	10	1	86%	11
venv/lib/python3.12/site-packages/skore/persistence
__init__.py	0	0	100%
abstract_storage.py	22	1	95%	130
disk_cache_storage.py	33	1	95%	44
in_memory_storage.py	20	0	100%
venv/lib/python3.12/site-packages/skore/project
__init__.py	3	0	100%
create.py	52	8	88%	116–122, 132–133, 140–141
load.py	23	3	89%	43–45
open.py	14	0	100%
project.py	64	4	91%	135, 149, 183, 187
venv/lib/python3.12/site-packages/skore/sklearn
__init__.py	4	0	100%
find_ml_task.py	35	1	95%	41–>49, 50
types.py	2	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
__init__.py	10	0	100%
base.py	76	2	98%	87–88
metrics_accessor.py	198	2	98%	131, 266
report.py	165	1	97%	145–>151, 147–>149, 150, 153–>155, 159–>163, 408–>413
utils.py	11	11	0%	1–19
venv/lib/python3.12/site-packages/skore/sklearn/_plot
__init__.py	4	0	100%
precision_recall_curve.py	126	2	97%	200–>203, 313–314
prediction_error.py	75	0	99%	289–>297
roc_curve.py	95	3	94%	156, 167–>170, 223–224
utils.py	77	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation
__init__.py	2	0	100%
cross_validation_helpers.py	47	4	90%	104–>136, 123–126
cross_validation_reporter.py	35	1	95%	177
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation/plots
__init__.py	0	0	100%
compare_scores_plot.py	29	1	92%	10, 45–>48
timing_plot.py	29	1	94%	10
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	34	2	94%	15–16
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	3	78%	16–18, 80
high_class_imbalance_warning.py	18	2	88%	16–18
random_state_unset_warning.py	11	1	87%	15
shuffle_true_warning.py	9	0	91%	44–>exit
stratify_is_set_warning.py	11	1	87%	15
time_based_column_warning.py	22	1	89%	17, 69–>exit
train_test_split_warning.py	5	1	80%	21
venv/lib/python3.12/site-packages/skore/ui
__init__.py	0	0	100%
app.py	25	5	71%	24, 53–58
dependencies.py	7	1	86%	12
project_routes.py	50	0	100%
venv/lib/python3.12/site-packages/skore/utils
__init__.py	0	0	100%
_accessor.py	7	0	100%
_logger.py	21	4	84%	14–18
_show_versions.py	31	0	100%
venv/lib/python3.12/site-packages/skore/view
__init__.py	0	0	100%
view.py	5	0	100%
view_repository.py	16	2	83%	8–9
TOTAL	2225	136	93%

Tests	Skipped	Failures	Errors	Time
349	0 💤	0 ❌	0 🔥	44.190s ⏱️

glemaitre · 2025-01-09T19:15:11Z

OK. It should be good to go and we should be able to iterate.

sylvaincom

Many thanks for this very useful PR @glemaitre and the whole team for reviewing it! Let's iterate on sub-issues if needed

closes probabl-ai#834

Investigate an API for an `EstimatorReport`.

#### TODO

- [x] Metrics
  - [x] handle string metrics as specified in the accessor
  - [x] handle callable metrics
  - [x] handle scikit-learn scorers
  - [x] use the cache as efficiently as possible
  - [x] add testing for all of those features
  - [x] allow passing a new validation set to functions instead of using the internal validation set
  - [x] add a proper help and rich `__repr__`
- [x] Plots
  - [x] add the ROC curve display
  - [x] add the precision-recall curve display
  - [x] add the prediction error display for regressors
  - [x] add proper testing for those displays
  - [x] add a proper `__repr__` for those displays
- [x] Documentation
  - [x] (done for the checked part) add an example to showcase all the different features
  - [x] find a way to show the accessors' documentation on the `EstimatorReport` page. It could be a bit tricky because they are only defined once the instance is created.
    - We need to have a look at the `series.rst` page from pandas to see how they document this sort of pattern.
  - [x] check the autocompletion: when typing `report.metrics.->tab`, it should provide autocompletion. **edit**: having a stub file actually works. I prefer this to type hints directly in the file.
- Open questions
  - [x] we use hashing to retrieve the external set.
    - Use the caching for the external validation set? To make it work, we need to compute the hash of potentially big arrays. This might be more costly than making the model predict.

#### Notes

This PR builds upon:

- probabl-ai#962 to reuse the `skore.console`
- probabl-ai#998 to be able to detect clusterers in a consistent manner.
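Two checklist items, accepting metrics given as strings or callables and reusing cached predictions for an external set identified by a hash, can be sketched together. This is a toy, standard-library-only illustration: `ToyReport`, `METRICS`, and `data_hash` are hypothetical; the PR's commits mention `joblib` hashing, whereas `hashlib` over a pickle is used here as a stand-in; and scikit-learn scorers, which are called as `scorer(estimator, X, y)`, are left out.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, List, Tuple, Union


def accuracy(y_true: List[Any], y_pred: List[Any]) -> float:
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry backing string metric names.
METRICS: Dict[str, Callable[..., float]] = {"accuracy": accuracy}


def data_hash(X: Any) -> str:
    """Fingerprint a dataset; the cost grows with the size of X."""
    payload = pickle.dumps(X, protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(payload).hexdigest()


class ToyReport:
    def __init__(self, predict: Callable[[Any], List[Any]]):
        self._predict = predict
        self._cache: Dict[Tuple[str, str], List[Any]] = {}
        self.n_predict_calls = 0

    def _get_predictions(self, X: Any) -> List[Any]:
        # Identical data hashes to the same key, so predictions are reused.
        key = ("predict", data_hash(X))
        if key not in self._cache:
            self.n_predict_calls += 1
            self._cache[key] = self._predict(X)
        return self._cache[key]

    def metric(
        self, metric: Union[str, Callable[..., float]], X: Any, y: List[Any], **kwargs: Any
    ) -> float:
        if isinstance(metric, str):
            metric_function = METRICS[metric]
        elif callable(metric):
            metric_function = metric
        else:
            raise TypeError(f"Unsupported metric: {metric!r}")
        return metric_function(y, self._get_predictions(X), **kwargs)


def error_rate(y_true, y_pred):
    return 1.0 - accuracy(y_true, y_pred)


report = ToyReport(predict=lambda X: [int(x > 0) for x in X])
X_external, y_external = [-2, -1, 1, 2], [0, 1, 1, 1]
print(report.metric("accuracy", X_external, y_external))  # 0.75
print(report.metric(error_rate, X_external, y_external))  # 0.25
print(report.n_predict_calls)  # 1: the second call hit the cache
```

Hashing has to read all of `X`, so for large arrays it can cost as much as predicting again; that is the trade-off raised in the open question above.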

glemaitre and others added 10 commits December 15, 2024 23:58

feat: Use friendly verbose and colorish

fd94974

limit size

1a3d4a6

tweak bold effect

fe075a4

iter

505f2b4

test: complete tests for new arg in cli

c563d45

iter

801795c

use context manager as a more explicit way to configurate the logger

376099c

TST add a couple of quick test for the logger context manager

d427f2d

Merge remote-tracking branch 'glemaitre/is/959' into model_report

ce273b6

feat: EstimatorReport

9cdeacd

glemaitre marked this pull request as draft December 20, 2024 21:10

glemaitre changed the title from "feat: design of ModelReport" to "feat: Design of ModelReport" Dec 20, 2024

glemaitre added 18 commits December 21, 2024 15:11

iter

56b1821

fix: Use estimator whenever possible to detect the ML task

acb51e0

iter

92b4e6c

tests

2982798

tests

66a3fd3

DOC add some docstring

4b7124b

EXA add an example to present the feature

7b7c9c8

iter

8de79ae

add metrics

cf0c865

allow to pass a new set of data to the metrics

b1cd767

iter

97577b4

TST add test for individual metrics

a9d67b4

TST add test for the default scoring in

597212b

TST add test for passing scoring kwargs in report_metrics

19fd6d8

TST add check that we properly hit the cache with arbitrary keywords

eeba764

FEA add support for an arbitrary metric

f845db6

improve example

9f219f2

check name add add test with joblib hash

eaeb072

add sphinx_autosummary_accessors as a dependence

5a44fc7

sylvaincom mentioned this pull request Jan 9, 2025

Enhance skore.cross_validate: specify which metric is actually used behind test_score #578

Closed

sylvaincom reviewed Jan 9, 2025

thomass-dev reviewed Jan 9, 2025

rouk1 reviewed Jan 9, 2025

skore/src/skore/sklearn/_estimator/metrics_accessor.py (outdated, resolved)

thomass-dev reviewed Jan 9, 2025

skore/src/skore/sklearn/_estimator/metrics_accessor.pyi (resolved)

thomass-dev reviewed Jan 9, 2025

glemaitre and others added 4 commits January 9, 2025 18:33

Update examples/model_evaluation/plot_estimator_report.py

ad4f408

Co-authored-by: Sylvain Combettes <[email protected]>

Update skore/src/skore/sklearn/_estimator/base.py

dad7211

Co-authored-by: Sylvain Combettes <[email protected]>

Update skore/src/skore/sklearn/_estimator/report.py

8cc9f4e

Co-authored-by: Sylvain Combettes <[email protected]>

rewrap

4b8be36

glemaitre added 5 commits January 9, 2025 19:01

add legend in the help

ad20a1c

make matplotlib and pandas a dependency

682801c

remove unecessary __init__.py

d519bd7

vendor the accessor

65a0f5d

iter

ebb7ab9

add attributes

cb5f210

glemaitre added 3 commits January 9, 2025 23:00

check that we support X_y without passing original dataset

b8d4610

compute brier score for both labels

866be82

simplify the brier score

82f6332

sylvaincom mentioned this pull request Jan 9, 2025

feat: Use HTML to present warnings of train_test_split when in notebooks #1060

Closed

sylvaincom self-requested a review January 9, 2025 23:37

sylvaincom approved these changes Jan 9, 2025

thomass-dev merged commit 1a4151a into probabl-ai:main Jan 10, 2025
18 checks passed

7 participants
