Contributor

@MarieSacksick MarieSacksick commented Aug 4, 2025

Closes #1964

Along the way, I split the tests of the data accessor for the estimator report into report-specific tests and common ones.

Contributor

github-actions bot commented Aug 4, 2025

Coverage

Coverage Report for skore/
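| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| `skore/src/skore` | | | | |
| `__init__.py` | 23 | 0 | 100% | |
| `_config.py` | 31 | 0 | 100% | |
| `exceptions.py` | 4 | 4 | 0% | 4, 15, 19, 23 |
| `skore/src/skore/_sklearn` | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `_base.py` | 198 | 14 | 92% | 45, 58, 127, 130, 183, 186–187, 189–192, 225, 228–229 |
| `find_ml_task.py` | 61 | 0 | 100% | |
| `types.py` | 27 | 1 | 96% | 28 |
| `skore/src/skore/_sklearn/_comparison` | | | | |
| `__init__.py` | 7 | 0 | 100% | |
| `feature_importance_accessor.py` | 39 | 2 | 94% | 92, 111 |
| `metrics_accessor.py` | 178 | 3 | 98% | 173, 253, 1215 |
| `report.py` | 106 | 0 | 100% | |
| `utils.py` | 54 | 0 | 100% | |
| `skore/src/skore/_sklearn/_cross_validation` | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 45 | 3 | 93% | 128, 131, 134 |
| `feature_importance_accessor.py` | 24 | 0 | 100% | |
| `metrics_accessor.py` | 182 | 1 | 99% | 244 |
| `report.py` | 135 | 1 | 99% | 487 |
| `skore/src/skore/_sklearn/_estimator` | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 58 | 0 | 100% | |
| `feature_importance_accessor.py` | 144 | 2 | 98% | 223–224 |
| `metrics_accessor.py` | 356 | 8 | 97% | 200, 202, 209, 300, 369, 373, 388, 423 |
| `report.py` | 167 | 2 | 98% | 448–449 |
| `skore/src/skore/_sklearn/_plot` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `base.py` | 7 | 0 | 100% | |
| `style.py` | 29 | 0 | 100% | |
| `utils.py` | 141 | 7 | 95% | 59, 83–85, 89, 344–345 |
| `skore/src/skore/_sklearn/_plot/data` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `table_report.py` | 183 | 1 | 99% | 682 |
| `skore/src/skore/_sklearn/_plot/metrics` | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `confusion_matrix.py` | 70 | 4 | 94% | 91, 99, 121, 229 |
| `feature_importance_display.py` | 67 | 21 | 68% | 92, 115–116, 118, 136–140, 142–149, 152–154, 156 |
| `metrics_summary_display.py` | 9 | 0 | 100% | |
| `precision_recall_curve.py` | 278 | 5 | 98% | 459, 559, 563, 623, 743 |
| `prediction_error.py` | 225 | 5 | 97% | 181, 188, 424, 507, 687 |
| `roc_curve.py` | 290 | 8 | 97% | 389, 512, 517, 618, 623, 627, 696, 818 |
| `skore/src/skore/_sklearn/train_test_split` | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `train_test_split.py` | 58 | 0 | 100% | |
| `skore/src/skore/_sklearn/train_test_split/warning` | | | | |
| `__init__.py` | 8 | 0 | 100% | |
| `high_class_imbalance_too_few_examples_warning.py` | 19 | 1 | 94% | 83 |
| `high_class_imbalance_warning.py` | 20 | 0 | 100% | |
| `random_state_unset_warning.py` | 10 | 0 | 100% | |
| `shuffle_true_warning.py` | 9 | 0 | 100% | |
| `stratify_is_set_warning.py` | 10 | 0 | 100% | |
| `time_based_column_warning.py` | 21 | 0 | 100% | |
| `train_test_split_warning.py` | 3 | 0 | 100% | |
| `skore/src/skore/_utils` | | | | |
| `__init__.py` | 6 | 2 | 66% | 8, 13 |
| `_accessor.py` | 90 | 3 | 96% | 34, 146, 190 |
| `_environment.py` | 27 | 0 | 100% | |
| `_fixes.py` | 8 | 0 | 100% | |
| `_index.py` | 5 | 0 | 100% | |
| `_logger.py` | 22 | 4 | 81% | 15–17, 19 |
| `_measure_time.py` | 10 | 0 | 100% | |
| `_parallel.py` | 38 | 3 | 92% | 23, 33, 124 |
| `_patch.py` | 13 | 5 | 61% | 21, 23–24, 35, 37 |
| `_progress_bar.py` | 46 | 0 | 100% | |
| `_repr_html.py` | 8 | 0 | 100% | |
| `_show_versions.py` | 38 | 0 | 100% | |
| `_testing.py` | 55 | 0 | 100% | |
| `skore/src/skore/project` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `project.py` | 48 | 0 | 100% | |
| `summary.py` | 74 | 0 | 100% | |
| `widget.py` | 165 | 6 | 96% | 436, 439–441, 525–526 |
| TOTAL | 3986 | 116 | 97% | |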

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1060 | 5 💤 | 0 ❌ | 0 🔥 | 4m 24s ⏱️ |

Contributor

github-actions bot commented Aug 4, 2025

Documentation preview @ 19504ba

@MarieSacksick MarieSacksick changed the base branch from main to fix_clustering August 6, 2025 11:43
@MarieSacksick MarieSacksick marked this pull request as ready for review August 6, 2025 12:00
Base automatically changed from fix_clustering to main August 7, 2025 11:04
@thomass-dev thomass-dev requested a review from glemaitre August 7, 2025 20:07
Member

@glemaitre glemaitre left a comment

It looks really good. We still need the following:

  • Similarly to skore/tests/unit/reports/estimator/data/test_accessor.py, we need to create skore/tests/unit/reports/cross-validation/data/test_accessor.py with the same type of tests.
  • The documentation is missing from the rst file cross_validation_report.rst (we have the same issue for estimator_report.rst, and I'll make a quick PR to solve this problem).
  • In terms of documentation, we should scan the user guide to see where it makes sense to add this information, and we should use it somewhere in an example. I recall that this was complex with the EstimatorReport because our examples used the CrossValidationReport, so it should be quite easy to find an example where the data analysis fits.
  • As commented, I'll make a couple of PRs that isolate some fixes or refactorings unrelated to this PR itself, to keep the history cleaner for our future selves using git blame.

Comment on lines 76 to 79
# pd.Series(np.ones(100)),
# pd.DataFrame(np.ones((100, 1)), columns=["Target"]),
Member

I think that these are legitimate cases and we need to test them as well. If there is a bug (I suspect that there is one, because I stumbled on it in #1990), then we need to make a PR to fix it first. So here, I think that we should pass one Series that does not have a name, as it is now, and another where we set the name.
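
To make the request above concrete, here is a minimal sketch of the target variants it asks to cover: an unnamed Series, a named Series, and a single-column DataFrame. The helpers `make_target_cases` and `target_name` are hypothetical and only illustrate how the three inputs differ. They are not part of skore or of the test file under review.

```python
import numpy as np
import pandas as pd


def make_target_cases(n_samples=100):
    """Return (label, target) pairs for the target types discussed above."""
    return [
        # Series without a name, as in the currently commented-out case.
        ("unnamed_series", pd.Series(np.ones(n_samples))),
        # Same data, but with the name set explicitly.
        ("named_series", pd.Series(np.ones(n_samples), name="Target")),
        # Single-column DataFrame target.
        (
            "single_column_frame",
            pd.DataFrame(np.ones((n_samples, 1)), columns=["Target"]),
        ),
    ]


def target_name(y):
    """Hypothetical helper: the name a consumer would see for the target."""
    if isinstance(y, pd.DataFrame):
        return y.columns[0]
    return y.name  # None when the Series was created without a name


cases = make_target_cases()
names = {label: target_name(y) for label, y in cases}
```

In a real test module, the list returned by `make_target_cases` could feed `pytest.mark.parametrize`, so that the unnamed case, the one suspected of triggering the bug, is exercised alongside the named ones.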

Contributor

See #2013

Contributor Author

Actually, it may have been another error. I fixed it, but also opened an issue on the skrub side to be sure: skrub-data/skrub#1593.

github-merge-queue bot pushed a commit that referenced this pull request Sep 1, 2025
This PR simplifies #1965 and
allows us to get a clean history for the `find_estimators` functionality.

For the sake of completeness, we remove this utility after dropping the
feature discussed in #1811
because it was not obvious how to add every component of a pipeline to
the parallel coordinate plot.
@MarieSacksick MarieSacksick marked this pull request as draft September 5, 2025 16:19
@MarieSacksick MarieSacksick marked this pull request as ready for review September 9, 2025 09:48
Contributor

github-actions bot commented Sep 9, 2025

Caution

Some commits in the pull request are not signed, or GitHub is not able to verify the signature.
Please sign all your commits; you can find more information here.
Please note that when you activate commit signing, you'll need to retroactively sign your previous commits.

Member

@glemaitre glemaitre left a comment

LGTM. I will check the documentation rendering before merging, but otherwise it looks good.

@glemaitre glemaitre merged commit 84fc6d3 into main Sep 9, 2025
36 checks passed
@glemaitre glemaitre deleted the data_accessor_cv branch September 9, 2025 15:51
Development

Successfully merging this pull request may close these issues.

feat(data): Extend data accessor to the CrossValidationReport

4 participants