Raise exception if dask cluster is incompatible with Dataset.to_ddf return type #339

oliverholworthy
Member

Raise exception if dask cluster is incompatible with Dataset.to_ddf return type.

This was motivated by a test in Merlin Systems that used the Distributed helper class with the argument cluster_type="cpu" together with Dataset(df).to_ddf(), which returned a dask_cudf.DataFrame. This type of DataFrame is incompatible with the cluster type distributed.LocalCluster, which doesn't handle GPU-backed dataframes. Running a dask computation in this scenario can produce unexpected errors related to unmanaged memory.

This PR aims to reduce the risk of this scenario (for the case where to_ddf is called in a context where the global dask client is defined). A longer-term fix is to add a check in dask_cudf or distributed that detects this scenario and raises an error to alert the user.
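
For readers who want a feel for the kind of guard described above, here is a minimal sketch. It is not the code from this PR. The function names (`is_gpu_collection`, `is_gpu_cluster`, `check_cluster_compatibility`) are hypothetical, and detecting GPU-backed objects by the top-level package of their type (`dask_cudf`, `dask_cuda`) is an assumption made so the sketch runs without a GPU or those libraries installed.

```python
def _top_level_package(obj) -> str:
    """Return the top-level package name of the object's type."""
    return type(obj).__module__.split(".")[0]


def is_gpu_collection(ddf) -> bool:
    # Assumption: GPU-backed dask collections are defined in the dask_cudf package.
    return _top_level_package(ddf) == "dask_cudf"


def is_gpu_cluster(cluster) -> bool:
    # Assumption: GPU-capable clusters (e.g. LocalCUDACluster) are defined in dask_cuda.
    return _top_level_package(cluster) == "dask_cuda"


def check_cluster_compatibility(ddf, cluster) -> None:
    """Raise if a GPU-backed collection is paired with a non-GPU cluster.

    `cluster` may be None when no global dask client is defined,
    in which case nothing is checked.
    """
    if cluster is None:
        return
    if is_gpu_collection(ddf) and not is_gpu_cluster(cluster):
        raise RuntimeError(
            f"Collection of type {type(ddf).__name__} is GPU-backed, but the "
            f"active cluster is {type(cluster).__name__}, which does not "
            "handle GPU-backed dataframes. Use a GPU cluster or request a "
            "CPU-backed collection."
        )
```

In a setting like the one described, such a check would run on the collection that to_ddf is about to return, with the cluster taken from the global dask client when one is defined. Raising early gives a clear message instead of an obscure memory-related failure later in the computation.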

@oliverholworthy oliverholworthy added the enhancement New feature or request label May 31, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.06 milestone May 31, 2023
@oliverholworthy oliverholworthy self-assigned this May 31, 2023
@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-339
