Prepare for n_init=auto in KMeans #6142

betatim · 2024-11-22T14:22:53Z

This adds a warning for the upcoming switch of the default from n_init=1 to 'auto'. It also adds the possibility to pass 'auto' explicitly, which helps with scikit-learn compatibility.
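The deprecation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in kmeans.pyx: `SketchKMeans` and `_resolve_n_init` are hypothetical names, the `"warn"` sentinel follows the cross-referenced note further down this thread, and the message text is the one quoted in the test failures. How cuML resolves `"auto"` into a number of runs is deliberately left out.

```python
import warnings


class SketchKMeans:
    """Hypothetical stand-in for an estimator; not cuML's KMeans."""

    def __init__(self, n_clusters=8, n_init="warn"):
        # "warn" is a sentinel meaning "the user did not choose a value".
        self.n_clusters = n_clusters
        self.n_init = n_init

    def _resolve_n_init(self):
        # Hypothetical helper: only the sentinel triggers the warning.
        if self.n_init == "warn":
            warnings.warn(
                "The default value of `n_init` will change from 1 to 'auto' "
                "in 25.04. Set the value of `n_init` explicitly to suppress "
                "this warning.",
                FutureWarning,
            )
            return 1  # keep the old behaviour during the deprecation cycle
        # Explicit values (an int or "auto") are passed through unchanged.
        return self.n_init
```

Using a sentinel rather than `n_init=1` as the default lets the estimator tell "the user relied on the default" apart from "the user asked for 1", so only the former sees the warning.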

In which version should we switch to the new default?

wphicks · 2024-11-22T15:41:33Z

I'd recommend we switch in 25.04 to get compatibility with the scikit-learn change as soon as possible.

betatim · 2024-11-22T16:44:21Z

Works for me. There is a PR for a forward merge to 25.02, which makes me think that there won't be a 25.04? Is that right? 25.05 in that case?

dantegd

Just one comment about versions: I think being quick with the change is a good idea, as @wphicks suggests.

python/cuml/cuml/cluster/kmeans.pyx

betatim · 2024-12-06T14:12:56Z

python/cuml/cuml/experimental/accel/estimator_proxy.py

+            sklearn_args = inspect.signature(self._cpu_model_class)
+            sklearn_args = sklearn_args.bind(*args, **kwargs)
+            sklearn_args.apply_defaults()


This means we use the constructor arguments and their default values of the scikit-learn class, combine them with what the user passed in and then feed it to the hyperparameter translator. This makes a difference for cases where the default values in scikit-learn and cuml are different and the user does not explicitly pass that argument.

# The scikit-learn class
class SkTimMeans:
    def __init__(self, foo='bar'):
        ...

# The cuml class
class CuTimMeans:
    def __init__(self, foo='baz'):
        ...

# User code
est = SkTimMeans()

foo should be set to 'bar' in est because that is the default value of the scikit-learn class.
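A small, self-contained sketch of what the three added lines do, using only the standard library. The two toy classes are the ones from the comment above; `bound_arguments` is a hypothetical helper written for this illustration and is not part of estimator_proxy.py.

```python
import inspect


class SkTimMeans:
    def __init__(self, foo="bar"):
        self.foo = foo


class CuTimMeans:
    def __init__(self, foo="baz"):
        self.foo = foo


def bound_arguments(cls, *args, **kwargs):
    """Return (explicitly passed arguments, arguments with defaults filled in)."""
    signature = inspect.signature(cls)      # signature of cls.__init__, minus self
    bound = signature.bind(*args, **kwargs)
    explicit = dict(bound.arguments)        # only what the caller passed
    bound.apply_defaults()                  # fill in cls's own default values
    return explicit, dict(bound.arguments)


explicit, with_defaults = bound_arguments(SkTimMeans)
print(explicit)        # {}
print(with_defaults)   # {'foo': 'bar'}
```

Binding against `SkTimMeans` rather than `CuTimMeans` is what makes `foo` come out as `'bar'`: the defaults are taken from whichever class's signature is inspected.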

It also fixes the many deprecation warnings in the accelerator tests we were getting for KMeans due to the main change of this PR.

This change is breaking quite a few tests and is orthogonal to the change in n_init. It would be better to revert it and work on it in a follow-up PR, since supporting n_init='auto' in KMeans is the objective of this PR.

It wasn't totally unrelated. It solves the problem that without this the user would get a warning about the deprecation, even though for a scikit-learn user the default is "auto" already (based on the scikit-learn docs).

Switched to using the cuml class's defaults to see if this creates less breakage.

If we don't use the signature of the constructor, then we need to build a way for the hyper-parameter translator to translate arguments that the user didn't pass (I think this is the fundamental issue).

With this approach we get failures like this:

sklearn.utils._param_validation.InvalidParameterError: The 'init' parameter of KMeans must be a str among {'random', 'k-means++'}, a callable or an array-like. Got 'scalable-k-means++' instead.

because the cuml default values are different from the scikit-learn ones :-/
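The mismatch behind that error can be reproduced in miniature. The sketch below is a simplified assumption, not scikit-learn's or cuML's code: `validate_init` and `forward_to_cpu` are hypothetical helpers, the allowed strings are copied from the error message above, and the two default values are the ones named in this thread and in scikit-learn's KMeans signature.

```python
# Allowed string values, as listed in the error message quoted above.
SKLEARN_INIT_CHOICES = {"random", "k-means++"}

SKLEARN_DEFAULTS = {"init": "k-means++"}
CUML_DEFAULTS = {"init": "scalable-k-means++"}


def validate_init(init):
    """Hypothetical, simplified stand-in for the CPU-side parameter check."""
    if isinstance(init, str) and init not in SKLEARN_INIT_CHOICES:
        raise ValueError(
            "The 'init' parameter of KMeans must be a str among "
            f"{sorted(SKLEARN_INIT_CHOICES)}. Got {init!r} instead."
        )
    return init


def forward_to_cpu(user_kwargs, defaults):
    """Fill in defaults, then hand the result to the CPU-side validator."""
    merged = {**defaults, **user_kwargs}
    return validate_init(merged["init"])


print(forward_to_cpu({}, SKLEARN_DEFAULTS))  # k-means++
try:
    forward_to_cpu({}, CUML_DEFAULTS)
except ValueError as exc:
    print(exc)
```

Whichever set of defaults is used to fill in the unpassed arguments, the values have to be valid for the library that eventually receives them, which is why a translation step for unpassed arguments is needed.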

dantegd

lgtm, just requested small changes about the version

python/cuml/cuml/cluster/kmeans.pyx

dantegd · 2025-02-09T20:01:28Z

python/cuml/cuml/experimental/accel/estimator_proxy.py

+            sklearn_args = inspect.signature(self._cpu_model_class)
+            sklearn_args = sklearn_args.bind(*args, **kwargs)
+            sklearn_args.apply_defaults()


This change is breaking quite a few tests and is orthogonal to the change in n_init. It would be better to revert it and work on it in a follow-up PR, since supporting n_init='auto' in KMeans is the objective of this PR.

betatim · 2025-02-19T07:44:19Z

Fixed up the last failing test, hopefully.

csadorf · 2025-02-19T15:49:34Z

The tests are failing with a FutureWarning:

FutureWarning: The default value of `n_init` will change from 1 to 'auto' in 25.04. Set the value of `n_init` explicitly to suppress this warning.

betatim · 2025-02-20T13:45:32Z

There are two failures here:

FAILED test_public_methods_attributes.py::test_UniversalBase_estimators[KMeans] - FutureWarning: The default value of `n_init` will change from 1 to 'auto' in 25.04. Set the value of `n_init` explicitly to suppress this warning.
FAILED test_sklearn_import_export.py::test_kmeans - FutureWarning: The default value of `n_init` will change from 1 to 'auto' in 25.04. Set the value of `n_init` explicitly to suppress this warning.

The one in test_public_methods_attributes.py should be gone now that the deprecation warning is ignored. The second one didn't really make sense; after a bit of investigating I found #6342.

We can silence the warning with a decorator on test_kmeans but I think all that does is hide a bug :-/
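For readers unfamiliar with why a FutureWarning shows up as a test failure: the failures above suggest the test run escalates warnings to errors. The standard-library sketch below illustrates that mechanism and what a targeted filter changes. `fit_with_default_n_init` is a hypothetical stand-in for any call that relies on the default; it is not a cuML function.

```python
import warnings

MESSAGE = (
    "The default value of `n_init` will change from 1 to 'auto' in 25.04. "
    "Set the value of `n_init` explicitly to suppress this warning."
)


def fit_with_default_n_init():
    """Hypothetical stand-in for code that relies on the default n_init."""
    warnings.warn(MESSAGE, FutureWarning)
    return "fitted"


# 1. With warnings escalated to errors, the call raises instead of returning.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        fit_with_default_n_init()
        escalated = False
    except FutureWarning:
        escalated = True

# 2. A targeted "ignore" filter added afterwards takes precedence,
#    so the same call now succeeds.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    warnings.filterwarnings(
        "ignore",
        message="The default value of `n_init`",  # matched at start of message
        category=FutureWarning,
    )
    result = fit_with_default_n_init()

print(escalated, result)  # True fitted
```

In pytest the same filter is usually written as a `pytest.mark.filterwarnings` decorator on the test. As the comment above points out, this only hides the warning; it does not explain why a round-tripped model hits the default path in the first place.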

I think the review comments are outdated now

csadorf · 2025-02-24T21:27:27Z

/merge

PRs being backported:
- [x] #6234
- [x] #6306
- [x] #6320
- [x] #6319
- [x] #6327
- [x] #6333
- [x] #6142
- [x] #6223
- [x] #6235
- [x] #6317
- [x] #6331
- [x] #6326
- [x] #6332
- [x] #6347
- [x] #6348
- [x] #6337
- [x] #6355
- [x] #6354
- [x] #6322
- [x] #6353
- [x] #6359
- [x] #6364
- [x] #6363
- [x] [FIL BATCH_TREE_REORG fix for SM90, 100 and 120](a3e419a)

---------

Co-authored-by: William Hicks

Relevant PRs: [cuml 6235](rapidsai/cuml#6235), [cuml 6142](rapidsai/cuml#6142), for sklearn compatibility.
- n_init defaults to "warn" as a future warning for the user (still ends up defaulting to 1), since the default will change to "auto" in 25.04 to match sklearn.
- LR output shape changed from (n_classes, n_rows) to (n_rows, n_classes) to match sklearn.

Resolves #860

---------

Signed-off-by: Rishi Chandra

csadorf · 2025-03-13T20:46:31Z

@betatim Can you create a follow-up to remove this now obsolete code segment relating to this PR?

cuml/python/cuml/cuml/tests/test_sklearn_import_export.py

Lines 106 to 110 in 3a8ea8c

    # This failure will be fixed by
    # https://github.com/rapidsai/cuml/pull/6142
    # otherwise the predict with default n_init like this
    # roundtrip will fail later.
    pytest.xfail(reason="auto is not supported by cuML n_init yet")

Follow up to #6142 (comment)

This cleans up some leftovers from the deprecation cycle for `KMeans`'s `init="auto"`

Authors:
- Tim Head (https://github.com/betatim)
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Jim Crist-Harif (https://github.com/jcrist)

URL: #6445

betatim requested a review from a team as a code owner November 22, 2024 14:22

betatim requested review from cjnolet and wphicks November 22, 2024 14:22

github-actions bot added the Cython / Python Cython or Python issue label Nov 22, 2024

dantegd reviewed Nov 22, 2024

python/cuml/cuml/cluster/kmeans.pyx (outdated)

dantegd assigned betatim Nov 22, 2024

betatim commented Dec 6, 2024

betatim changed the base branch from branch-24.12 to branch-25.02 December 19, 2024 08:31

dantegd requested changes Jan 28, 2025

python/cuml/cuml/cluster/kmeans.pyx (outdated)

python/cuml/cuml/cluster/kmeans.pyx (outdated)

python/cuml/cuml/cluster/kmeans.pyx (outdated)

dantegd previously requested changes Feb 9, 2025

betatim mentioned this pull request Feb 11, 2025

Use the default values of the scikit-learn constructor arguments #6309

Closed

betatim changed the base branch from branch-25.02 to branch-25.04 February 13, 2025 12:28

betatim and others added 15 commits February 13, 2025 04:28

Prepare for n_init=auto in KMeans

0c7e37d

This adds a warning for the upcoming switch from n_init=1 to 'auto'. This adds the possibility to use 'auto', which helps being compatible with scikit-learn.

Fix KMeans tests for upcoming deprecation

baef681

Set 25.04 as change over version

afe373c

Use 25.05

db3ab71

Make the switch in 25.02

77973ba

Update dask tests

8e5081e

Fix up tests that raised warnings

640f9e5

Ping

d6f77fa

Deal with deprecations

ace14f5

Use sklearn default values for constructor arguments

bc7e7b1

This way the hyper-parameter translator gets to see all arguments and we avoid deprecation warnings due to mismatches between the sklearn and cuml defaults

Ping

4e5a02f

Move default args change to a new PR

55b7996

Style fixes

7edb32e

Update deprecation version

2f2142d

Co-authored-by: Dante Gama Dessavre

Use CUML default arguments

7eaf3a1

betatim force-pushed the kmeans-auto-2 branch from 6866248 to 7eaf3a1 Compare February 13, 2025 12:30

betatim added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 19, 2025

betatim added 2 commits February 20, 2025 04:30

Ignore deprecation warning

b625c73

Merge branch 'branch-25.04' into kmeans-auto-2

fe57a73

betatim mentioned this pull request Feb 20, 2025

Improve model round-tripping #6342

Closed

dantegd mentioned this pull request Feb 20, 2025

Backport release 25.04 PRs for patch release version 25.02.01 #6329

Merged

24 tasks

betatim added 2 commits February 24, 2025 08:29

Merge branch 'branch-25.04' into kmeans-auto-2

8877167

Merge branch 'branch-25.04' into kmeans-auto-2

a5d866c

csadorf assigned csadorf and unassigned betatim Feb 24, 2025

csadorf added 2 commits February 24, 2025 13:22

Add warnings filters for TSNE and KMeans default parameter changes in tests

b36c3fd

Fixup KMeans doc-string.

fecaa1a

csadorf approved these changes Feb 24, 2025

dantegd approved these changes Feb 25, 2025

rapids-bot bot merged commit a49b24a into rapidsai:branch-25.04 Feb 25, 2025
64 of 65 checks passed

dantegd added a commit to dantegd/cuml that referenced this pull request Feb 25, 2025

Backport PR rapidsai#6142

5fb577a

betatim deleted the kmeans-auto-2 branch February 25, 2025 12:42

dantegd added a commit to dantegd/cuml that referenced this pull request Feb 26, 2025

Backport PR rapidsai#6142

2e85c91

rishic3 mentioned this pull request Mar 6, 2025

Compatibility fixes for cuml 25.02.1 patch NVIDIA/spark-rapids-ml#863

Merged

jcrist mentioned this pull request Mar 12, 2025

Finish porting KMeans to default to n_init="auto" #6428

Closed

betatim mentioned this pull request Mar 17, 2025

Clean up after removing KMeans deprecation warning #6445

Merged
