Skip to content

Add scikit-learn as a dependency #6438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 17 commits into
base: branch-25.06
Choose a base branch
from

Conversation

betatim
Copy link
Member

@betatim betatim commented Mar 14, 2025

This PR both adds scikit-learn as a dependency of cuml and sets up the CI so that we test with scikit-learn 1.5.x as the "oldest" dependency.

For now we use scikit-learn>=1.5 as the dependency for cuml itself. We might change the version based on #6437.

Copy link

copy-pr-bot bot commented Mar 14, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@betatim
Copy link
Member Author

betatim commented Mar 14, 2025

/ok to test

@github-actions github-actions bot added conda conda issue Cython / Python Cython or Python issue labels Mar 14, 2025
@betatim
Copy link
Member Author

betatim commented Mar 14, 2025

/ok to test

@github-actions github-actions bot added the ci label Mar 14, 2025
@betatim
Copy link
Member Author

betatim commented Mar 14, 2025

/ok to test

@csadorf csadorf linked an issue Mar 14, 2025 that may be closed by this pull request
@betatim
Copy link
Member Author

betatim commented Mar 17, 2025

@bdice or @jameslamb for the conda based jobs the setup using oldest-deps in dependencies.yaml works. In the oldest deps job 1.5 is installed, in the other jobs 1.6.1 (the current newest version) is installed.

However in the wheels jobs we always get 1.6.1. Even in the oldest-deps job. Is there a way to have the oldest version installed? Currently the wheel based test doesn't install any dependencies directly, only via installing the cuml wheel. Is there a pattern in another repo that I can re-use that uses rapids-dependency-file-generator to install the dependencies in an explicit step (as is done for the conda jobs) and then install the wheel?

@betatim
Copy link
Member Author

betatim commented Mar 17, 2025

Needs #6444 to be resolved

@csadorf csadorf added the DO NOT MERGE Hold off on merging; see PR for details label Mar 17, 2025
@csadorf
Copy link
Contributor

csadorf commented Mar 17, 2025

In addition to keeping this PR in draft status, I've applied the "DO NOT MERGE" label since this PR should not be merged into the current default branch (25.04).

@jameslamb
Copy link
Member

Thanks for @. I personally think it'd be worth removing the try-catching of scikit-learn imports within this PR:

def has_sklearn():
try:
import sklearn # NOQA
return True
except ImportError:
return False

To be more confident that this change and the proposed floor is working the way you expect it to. But up to you all if you think that makes the PR too large.

This PR both adds scikit-learn as a dependency of cuml

The conda recipes need to be updated as well.

For cuml, scikit-learn needs to be added to the run: dependencies here:

If @jcrist 's PR (#6431) is merged before this one, then you should also remove scikit-learn from run_constrained: in the cuml recipe on this branch.

For cuml-cpu, the pin needs to be updated from an == to a >=, as you've done elsewhere in this PR.

- scikit-learn==1.5.*

However in the wheels jobs we always get 1.6.1. Even in the oldest-deps job. Is there a way to have the oldest version installed?

In ci/test_wheel.sh, you should generate a constraints.txt file and include it in this pip install call:

cuml/ci/test_wheel.sh

Lines 12 to 14 in 9c17b09

rapids-pip-retry install \
./dist/libcuml*.whl \
"$(echo ./dist/cuml*.whl)[test]"

Here's an example from cuDF: https://github.com/rapidsai/cudf/blob/5638caacc01d8bc00803e3309310482ff58f73c8/ci/test_wheel_cudf.sh#L15-L24

@betatim
Copy link
Member Author

betatim commented Mar 18, 2025

/ok to test

@betatim
Copy link
Member Author

betatim commented Mar 19, 2025

Based on #6444 (comment) we now instantiate the classes and pass them to the constructor in SVC. This fixes a bug where we passed the wrong type to the constructor which was discovered by using scikit-learn 1.6. We don't really care about how exactly the class is configured, so we can pass more or less any constructor arguments (as far as I understand from Victor).

@betatim
Copy link
Member Author

betatim commented Mar 20, 2025

/ok to test

rapids-bot bot pushed a commit that referenced this pull request Mar 20, 2025
…6471)

The `estimator` attribute is used in the scikit-learn tags machinery to figure out what tags the meta estimator has. We pass a default constructed instance as it isn't actually used.

This was found as part of #6438 but can be fixed standalone. The problem isn't with the new scikit-learn version, but we discovered it with it.

cc @viclafargue

Authors:
  - Tim Head (https://github.com/betatim)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6471
@betatim betatim changed the base branch from branch-25.04 to branch-25.06 April 7, 2025 09:08
@betatim betatim changed the base branch from branch-25.06 to branch-25.04 April 7, 2025 09:12
@betatim betatim changed the base branch from branch-25.04 to branch-25.06 April 7, 2025 09:13
betatim added 7 commits April 7, 2025 11:17
This also makes it so that we test scikit-learn 1.5.x as "oldest"
dependency.
This means that the version of scikit-learn that is installed via the
wheel install is constrained to the oldest version cuml supports.
We can't pass a class, we need to pass an instance so that the tags
infrastructure works.
@betatim betatim force-pushed the add-sklearn-as-dependency branch from 6635720 to cd22e27 Compare April 7, 2025 11:15
Copy link

copy-pr-bot bot commented Apr 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@betatim
Copy link
Member Author

betatim commented Apr 7, 2025

/ok to test

@betatim
Copy link
Member Author

betatim commented Apr 7, 2025

/ok to test

@betatim
Copy link
Member Author

betatim commented Apr 7, 2025

/ok to test

1 similar comment
@betatim
Copy link
Member Author

betatim commented Apr 8, 2025

/ok to test

With the minimum version of dask that we require numpy 1.23 can't be
installed because it is too old for dask
@betatim
Copy link
Member Author

betatim commented Apr 8, 2025

/ok to test

betatim added 3 commits April 8, 2025 16:25
Ridge.solver_ was only added in scikit-learn v1.5, so we need to
conditionally set it in the accelerator.
@betatim
Copy link
Member Author

betatim commented Apr 9, 2025

/ok to test

Different versions of scikit-learn have different default values for the
constructor. Try and handle those cases gracefully
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci conda conda issue Cython / Python Cython or Python issue DO NOT MERGE Hold off on merging; see PR for details
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support scikit-learn version 1.6.* Make sklearn a required dependency
3 participants