-
Notifications
You must be signed in to change notification settings - Fork 569
Dask 2025.4.0 compatibility #6576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.06
Are you sure you want to change the base?
Conversation
model_delayed = self._get_internal_model() | ||
if not isinstance(model_delayed, (Future, Delayed)): | ||
model_delayed = dask.delayed( | ||
model_delayed, pure=True, traverse=False | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a somewhat inefficient implementation. Can we instead modify _get_internal_model()
to always return a Future or Delayed?
sizes = [ | ||
(worker_address, result) | ||
for worker_address, result in zip( | ||
worker_addresses, results, strict=True | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizes = [ | |
(worker_address, result) | |
for worker_address, result in zip( | |
worker_addresses, results, strict=True | |
) | |
] | |
sizes = list(zip( | |
worker_addresses, results, strict=True | |
)) |
if not isinstance(part, (Delayed, Future)): | ||
part = dask.delayed(part) | ||
return da.from_delayed( | ||
dask.delayed(part), shape=shape, meta=cp.zeros((1)), dtype=dtype | ||
part, shape=shape, meta=cp.zeros((0,) * len(shape), dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is called once within the entire code base. Maybe we can ensure that it returns a Delayed or Future?
Several failures are from functions like cuml/python/cuml/cuml/dask/common/func.py Line 82 in e02797c
cuml/python/cuml/cuml/tests/dask/test_dask_tfidf.py::test_tfidf_transformer[False-True-True-l1-base_case] )
|
The initial dask implementation is relatively old so it's possible that some if its implementation paradigms are outdated. Maybe @cjnolet would be able to comment on this? (it kind of seems like cuml assumes distributed is available, at least in the tests I'm looking at currently,
Also, to be clear: we should implement any non-trivial improvements in a follow-up since this PR is needed to resolve immediate incompatibilites! |
FYI, I don't expect this PR to have any effect on CI one way or the other. My hope is that the changes here are compatible both the current version of dask used in CI (2025.2.0) and the one we're targeting next (2025.4.0). If there are dask issues causing CI failures currently I'd be happy to look. But glancing through the output at https://github.com/rapidsai/cuml/actions/runs/14624947486/job/41035364221?pr=6576, I don't see the actual test failure. |
This fixes several failures with newer versions in Dask observed at rapidsai/dask-upstream-testing#44. Most came down to unnecessarily wrapping
Future
objects inDelayed
.