Improve detection of scikit-learn parity regressions #6553

csadorf · 2025-04-18T15:32:42Z

This PR enhances the infrastructure for monitoring scikit-learn parity in cuML's accelerated estimators, enabling better tracking and management of differences between scikit-learn and cuML implementations.

Key Changes

- Moved scikit-learn test tooling from ci/ to python/cuml/cuml/accel/tests/scikit-learn/ for better organization
- Added xfail-list.yaml to track known differences from scikit-learn
- Added tooling to the summarize-results script to maintain the xfail list
- Added parallel test execution support via pytest-xdist for faster test runs

Infrastructure Capabilities

The new infrastructure enables continuous monitoring of scikit-learn parity by:

- Running scikit-learn tests with cuML acceleration
- Tracking pass rates and comparing against thresholds
- Automatically detecting new failures (regressions)
- Managing known differences via xfail lists
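
To illustrate the monitoring steps listed above, here is a minimal sketch of how pass-rate tracking and regression detection could fit together. It is not the PR's summarize-results script: the names `ParityReport` and `summarize`, the outcome strings, and the percentage-based threshold are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParityReport:
    """Hypothetical summary of one scikit-learn test run under cuML acceleration."""

    pass_rate: float              # percentage of tests that passed
    meets_threshold: bool         # True if pass_rate >= threshold
    regressions: list[str]        # failed tests that are NOT on the xfail list
    unexpected_passes: list[str]  # tests on the xfail list that passed


def summarize(
    outcomes: dict[str, str], xfail_ids: set[str], threshold: float
) -> ParityReport:
    """Summarize a run; `outcomes` maps a test ID to "passed" or "failed"."""
    total = len(outcomes)
    passed = sum(1 for outcome in outcomes.values() if outcome == "passed")
    pass_rate = 100.0 * passed / total if total else 0.0

    # A failure that is not a known difference counts as a new regression.
    regressions = sorted(
        test_id
        for test_id, outcome in outcomes.items()
        if outcome == "failed" and test_id not in xfail_ids
    )
    # A known difference that now passes is a candidate for removal from the list.
    unexpected_passes = sorted(
        test_id
        for test_id, outcome in outcomes.items()
        if outcome == "passed" and test_id in xfail_ids
    )
    return ParityReport(pass_rate, pass_rate >= threshold, regressions, unexpected_passes)


if __name__ == "__main__":
    # Made-up test IDs, purely for illustration.
    report = summarize(
        {"test_a": "passed", "test_b": "failed", "test_c": "failed"},
        xfail_ids={"test_b"},
        threshold=30.0,
    )
    print(report)
```

In this sketch, known differences still count against the pass rate, while only failures outside the xfail list are reported as regressions. Whether the actual tooling makes the same choice is not stated in this PR description.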

Related Changes

Closes Scikit-learn Parity Regression Detection Infrastructure #6570

copy-pr-bot · 2025-04-18T15:32:45Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

csadorf · 2025-04-18T15:34:15Z

/ok to test 5759457

csadorf · 2025-04-18T16:20:22Z

/ok to test 8d56c54

csadorf · 2025-04-18T18:10:30Z

/ok to test 1772262

…sai#6553) This change modifies the KMeans test to set n_init to 2, enhancing the stability of the k-means|| initialization process. Additional context can be found in issue rapidsai#5530.

#6555) This change modifies the KMeans test to set n_init to 2, enhancing the stability of the k-means|| initialization process. Additional context can be found in issue #5530.

Closes #5530.

Authors:
- Simon Adorf (https://github.com/csadorf)

Approvers:
- Divye Gala (https://github.com/divyegala)
- Jim Crist-Harif (https://github.com/jcrist)

URL: #6555

csadorf · 2025-04-21T14:11:57Z

/ok to test fd2d0b2

csadorf · 2025-04-22T17:16:43Z

/ok to test a80327d

csadorf · 2025-04-22T17:19:34Z

/ok to test d5ed917

csadorf · 2025-04-22T18:22:58Z

/ok to test e00dc0b

csadorf · 2025-04-22T19:48:38Z

/ok to test 86819cf

- Move most of the related tooling from ci/ to python/cuml/cuml/accel/tests/scikit-learn/
- Add xfail-list.yaml to track known differences from scikit-learn
- Add --format=xfail_list to automatically generate xfail lists from failures

The infrastructure enables continuous monitoring of scikit-learn parity by:

- Running scikit-learn tests with cuML acceleration
- Tracking pass rates and comparing against thresholds
- Automatically detecting new failures
- Managing known differences via xfail lists

Import pytest lazily within the pytest_collection_modifyitems function to avoid requiring it for normal cuML usage, ensuring pytest is only needed during test execution.
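
The lazy import described in the commit message above could look roughly like the following sketch. It is not the contents of pytest_plugin.py: the `KNOWN_XFAILS` mapping and its single entry are hypothetical stand-ins for whatever the plugin loads from xfail-list.yaml. Only the hook name `pytest_collection_modifyitems`, `item.nodeid`, `item.add_marker`, and `pytest.mark.xfail` are real pytest APIs.

```python
# Hypothetical stand-in for the entries that would come from xfail-list.yaml.
# Maps a pytest node ID to the reason the test is expected to fail.
KNOWN_XFAILS = {
    "tests/test_example.py::test_known_difference": "hypothetical known difference",
}


def pytest_collection_modifyitems(config, items):
    """Mark collected tests that appear in the known-difference list as xfail."""
    # Imported inside the hook rather than at module level, so that importing
    # this module does not require pytest to be installed.
    import pytest

    for item in items:
        reason = KNOWN_XFAILS.get(item.nodeid)
        if reason is not None:
            item.add_marker(pytest.mark.xfail(reason=reason))
```

Because pytest only calls this hook during test collection, the `import pytest` statement runs only in a test session, which matches the stated goal of not requiring pytest for normal use.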

jcrist

Two tiny nits, but otherwise this looks great! Thanks for working on this.

python/cuml/cuml/accel/pytest_plugin.py

jameslamb

Approving for packaging-codeowners / ci-codeowners.

Full disclosure, I only skimmed summarize-results.py, since it looks pretty complex but also isn't user-facing.

csadorf · 2025-04-25T19:32:49Z

/merge

github-actions bot added the Cython / Python and ci labels Apr 18, 2025

csadorf added the improvement and non-breaking labels Apr 18, 2025

csadorf self-assigned this Apr 18, 2025

csadorf force-pushed the ci/accel-skl-regression-testing branch from 1772262 to fd2d0b2 April 19, 2025 17:36

csadorf force-pushed the ci/accel-skl-regression-testing branch from fd2d0b2 to a80327d April 22, 2025 17:16

csadorf linked an issue Apr 22, 2025 that may be closed by this pull request

Scikit-learn Parity Regression Detection Infrastructure #6570

Closed

csadorf mentioned this pull request Apr 22, 2025

Support weights="distance" for KNeighbors* in cuml.accel #6554

Merged

csadorf force-pushed the ci/accel-skl-regression-testing branch from e00dc0b to 86819cf April 22, 2025 19:48

csadorf added 3 commits April 22, 2025 15:38

Run scikit-learn test suite in parallel.

81b02c2

Update xfail list.

58f24eb

csadorf force-pushed the ci/accel-skl-regression-testing branch from 86819cf to 58f24eb April 22, 2025 20:38

csadorf changed the title ~~Add scikit-learn parity regression detection infrastructure~~ Improve detection of scikit-learn parity regressions Apr 22, 2025

Remove obsolete scripts.

d014a21

csadorf marked this pull request as ready for review April 22, 2025 20:57

csadorf requested review from a team as code owners April 22, 2025 20:57

csadorf requested review from jameslamb, cjnolet and viclafargue April 22, 2025 20:57

jcrist approved these changes Apr 24, 2025

python/cuml/cuml/accel/pytest_plugin.py (resolved)

python/cuml/cuml/accel/pytest_plugin.py (resolved)

csadorf added 2 commits April 24, 2025 16:11

Add pyyaml to test dependencies.

c3a0c07

Use more specific xfail reason.

feae07e

csadorf requested a review from a team as a code owner April 24, 2025 21:13

github-actions bot added the conda label Apr 24, 2025

csadorf added 2 commits April 25, 2025 09:01

Merge branch 'branch-25.06' into ci/accel-skl-regression-testing

b9bdf32

Update xfail list.

54397a5

jameslamb approved these changes Apr 25, 2025

jcrist mentioned this pull request Apr 25, 2025

Add timeout to scikit-learn test suite #6591

Open

rapids-bot bot merged commit f3a16f4 into rapidsai:branch-25.06 Apr 25, 2025
77 of 78 checks passed

csadorf deleted the ci/accel-skl-regression-testing branch April 25, 2025 21:28
