Handle empty aggregations in multi-partition cudf.polars group_by #18277
I think the better way to handle this is to ensure that the groupby node we make always has aggs. A grouped aggregation with only keys is just a
Thanks, that does seem like a nicer spot for it. That turns up two separate issues:
I believe something down in |
It took some wandering, but the This does still leave the
because now the experimental executor is receiving a |
Yeah, I've mostly been stuck trying to expand #17941 to handle For the case of |
Yeah, that's the right thing to do.
I'm seeing some test failures locally that I'll need to look into.
Most likely related to changing the |
## Description

Adds support for `df.unique(...)`. Possibly replaces parts of #18277

## Checklist

- [ ] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.

@TomAugspurger @rjzamora what do we want to do with this PR? How much should still exist after #18576? If none, does #18276 still need additional work to be resolved?
I suspect the changes to |
Lawrence left a TODO about rewrite_groupby using distinct. I'll push that here quick (probably as a force push, sorry; the previous reviews are obsolete now, so I think it's OK).
aef7f3f to 1d4073c
Seems correct to me - Thanks!
Thanks Tom.
/merge
Description
This fixes a bug where a group_by with no aggregations raised a ValueError.
The fix uses `Distinct`, which is equivalent to a groupby with no aggregations. `Distinct` was previously not supported by the multi-partition executor, so that's implemented here as well.

Closes #18276
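To illustrate why a group_by with no aggregations can be rewritten as a distinct, and how a distinct can be evaluated across partitions, here is a minimal pure-Python sketch. It does not use cudf or polars, and it is not the code from this PR: the helper names (`distinct`, `group_by_no_aggs`, `distinct_partitioned`) and the two-stage "dedupe each partition, then dedupe the combined result" strategy are assumptions chosen for illustration only.

```python
from itertools import chain


def distinct(rows, keys):
    """Return the unique key tuples in first-seen order (hypothetical helper)."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[k] for k in keys), None)
    return list(seen)


def group_by_no_aggs(rows, keys):
    """Group rows by key and emit only the keys, since there are no aggregations."""
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)
    # With no aggregation requests, the only output columns are the keys.
    return list(groups)


def distinct_partitioned(partitions, keys):
    """Two-stage distinct: dedupe each partition, then dedupe the combined result."""
    # Stage 1: shrink every partition independently.
    partial = [distinct(part, keys) for part in partitions]
    # Stage 2: the same key may survive in several partitions, so dedupe again.
    merged = {}
    for key in chain.from_iterable(partial):
        merged.setdefault(key, None)
    return list(merged)


partitions = [
    [{"a": 1, "b": "x"}, {"a": 1, "b": "x"}, {"a": 2, "b": "y"}],
    [{"a": 2, "b": "y"}, {"a": 3, "b": "z"}],
]
all_rows = [row for part in partitions for row in part]

single = group_by_no_aggs(all_rows, ["a", "b"])
multi = distinct_partitioned(partitions, ["a", "b"])
print(single == multi)
```

The second stage is what makes the multi-partition case different from the single-partition one: the key `(2, "y")` appears in both partitions, so a per-partition distinct alone would return it twice.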
Checklist