
[FEA] Unify distinct_count column/table APIs. #10183

Open
@bdice

Description


Is your feature request related to a problem? Please describe.
While reviewing #10030, I found that the column and table algorithms for distinct_count use completely different flags for null and NaN handling. The column API takes null_policy (include/exclude nulls) and nan_policy (NaN is/isn't treated as null). The table API takes null_equality (nulls compare equal/unequal).

This also applies to unordered_distinct_count, introduced in #10030.
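To make the mismatch concrete, here is a minimal host-only C++ sketch. It is not libcudf code. It models each flag set on a single column of doubles, with `std::nullopt` standing in for a null. The enums only mirror the libcudf flag names. The functions `column_style_count`, `table_style_count`, and `worked_example` are hypothetical. The sketch also assumes that all NaNs compare equal to one another when they are not folded into null; this is an assumption, not documented libcudf behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-ins mirroring the libcudf flag names (not the real declarations).
enum class null_policy { EXCLUDE, INCLUDE };
enum class nan_policy { NAN_IS_NULL, NAN_IS_VALID };
enum class null_equality { EQUAL, UNEQUAL };

using column = std::vector<std::optional<double>>;  // std::nullopt models a null

// Column-style semantics: nulls are either dropped or counted once,
// and NaN may optionally be treated as null.
std::size_t column_style_count(column const& col, null_policy np, nan_policy nanp)
{
  std::set<double> values;  // only non-null, non-NaN values (NaN would break ordering)
  bool saw_null = false;
  bool saw_nan  = false;
  for (auto const& v : col) {
    if (!v.has_value() || (std::isnan(*v) && nanp == nan_policy::NAN_IS_NULL)) {
      saw_null = true;
    } else if (std::isnan(*v)) {
      saw_nan = true;  // assumption: all valid NaNs count as one distinct value
    } else {
      values.insert(*v);
    }
  }
  return values.size() + (saw_nan ? 1 : 0) + (saw_null && np == null_policy::INCLUDE ? 1 : 0);
}

// Table-style semantics on one column: nulls are always counted, either as a
// single shared value (EQUAL) or as one distinct value per null (UNEQUAL).
std::size_t table_style_count(column const& col, null_equality ne)
{
  std::set<double> values;
  bool saw_nan      = false;
  std::size_t nulls = 0;
  for (auto const& v : col) {
    if (!v.has_value()) {
      ++nulls;
    } else if (std::isnan(*v)) {
      saw_nan = true;  // assumption: NaNs compare equal to each other
    } else {
      values.insert(*v);
    }
  }
  std::size_t const null_groups = ne == null_equality::EQUAL ? (nulls > 0 ? 1 : 0) : nulls;
  return values.size() + (saw_nan ? 1 : 0) + null_groups;
}

// Worked example on the column {1, 1, NaN, null, null, 2}.
void worked_example()
{
  column const col{1.0, 1.0, std::nan(""), std::nullopt, std::nullopt, 2.0};
  // Column-style flags.
  assert(column_style_count(col, null_policy::EXCLUDE, nan_policy::NAN_IS_VALID) == 3);
  assert(column_style_count(col, null_policy::INCLUDE, nan_policy::NAN_IS_VALID) == 4);
  assert(column_style_count(col, null_policy::INCLUDE, nan_policy::NAN_IS_NULL) == 3);
  assert(column_style_count(col, null_policy::EXCLUDE, nan_policy::NAN_IS_NULL) == 2);
  // Table-style flag.
  assert(table_style_count(col, null_equality::EQUAL) == 4);
  assert(table_style_count(col, null_equality::UNEQUAL) == 5);
}
```

For this column, the table-style result of 5 under null_equality UNEQUAL cannot be produced by any column-style flag combination. Likewise, the column-style results of 2 and 3, which come from excluding nulls or folding NaN into null, cannot be produced by the table-style flag.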

Describe the solution you'd like
The column and table distinct count APIs should accept the same flags, which probably means making all three flags available to both APIs. This would also let the column API become a thin pass-through to the table API, called on a table containing only that column, instead of maintaining two separate implementations (table and column).
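Here is a minimal sketch of the proposed shape, again in host-only C++ rather than the libcudf API. It defines a single `distinct_count` over the rows of a table that takes all three flags. A column overload then wraps the column in a one-column table and forwards to it. The names `distinct_count`, `table`, and `worked_example` are hypothetical here. The issue does not specify how null_policy EXCLUDE should apply to a multi-column table, so the sketch assumes it drops any row containing a null. It also assumes that NaNs compare equal to each other under nan_policy NAN_IS_VALID.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins mirroring the libcudf flag names (not the real declarations).
enum class null_policy { EXCLUDE, INCLUDE };
enum class nan_policy { NAN_IS_NULL, NAN_IS_VALID };
enum class null_equality { EQUAL, UNEQUAL };

using column = std::vector<std::optional<double>>;  // std::nullopt models a null
using table  = std::vector<column>;                  // columns assumed to have equal length

// The single implementation: count distinct rows of a table under all three flags.
std::size_t distinct_count(table const& tbl, null_policy nulls, nan_policy nans, null_equality null_eq)
{
  if (tbl.empty()) { return 0; }
  enum kind : int { NULL_ELEM = 0, NAN_ELEM = 1, VALUE = 2 };
  std::set<std::vector<std::pair<int, double>>> seen;  // normalized row keys
  std::size_t unequal_null_rows = 0;                    // rows that never match any other row
  std::size_t const num_rows    = tbl.front().size();
  for (std::size_t r = 0; r < num_rows; ++r) {
    std::vector<std::pair<int, double>> key;
    bool has_null = false;
    for (auto const& col : tbl) {
      auto const& v      = col[r];
      bool const is_null = !v.has_value() || (std::isnan(*v) && nans == nan_policy::NAN_IS_NULL);
      if (is_null) {
        has_null = true;
        key.emplace_back(NULL_ELEM, 0.0);
      } else if (std::isnan(*v)) {
        key.emplace_back(NAN_ELEM, 0.0);  // assumption: all NaNs compare equal
      } else {
        key.emplace_back(VALUE, *v);
      }
    }
    if (has_null && nulls == null_policy::EXCLUDE) { continue; }  // assumption: drop null rows
    if (has_null && null_eq == null_equality::UNEQUAL) {
      ++unequal_null_rows;  // a null-containing row is distinct from every other row
      continue;
    }
    seen.insert(std::move(key));
  }
  return seen.size() + unequal_null_rows;
}

// Column API as a pass-through: a table made of just that column.
std::size_t distinct_count(column const& col, null_policy nulls, nan_policy nans, null_equality null_eq)
{
  return distinct_count(table{col}, nulls, nans, null_eq);
}

void worked_example()
{
  column const col{1.0, 1.0, std::nan(""), std::nullopt, std::nullopt, 2.0};
  // The pass-through reproduces the column-style results...
  assert(distinct_count(col, null_policy::EXCLUDE, nan_policy::NAN_IS_VALID, null_equality::EQUAL) == 3);
  assert(distinct_count(col, null_policy::INCLUDE, nan_policy::NAN_IS_NULL, null_equality::EQUAL) == 3);
  // ...and also gains the table-style "nulls unequal" behavior.
  assert(distinct_count(col, null_policy::INCLUDE, nan_policy::NAN_IS_VALID, null_equality::UNEQUAL) == 5);

  // Rows: (1, 2), (1, 2), (null, 3), (null, 3).
  table const tbl{column{1.0, 1.0, std::nullopt, std::nullopt}, column{2.0, 2.0, 3.0, 3.0}};
  assert(distinct_count(tbl, null_policy::INCLUDE, nan_policy::NAN_IS_VALID, null_equality::EQUAL) == 2);
  assert(distinct_count(tbl, null_policy::INCLUDE, nan_policy::NAN_IS_VALID, null_equality::UNEQUAL) == 3);
}
```

One design wrinkle shows up immediately. With null_policy EXCLUDE, the null_equality flag has no effect, because null-containing rows are dropped before any comparison. A unified API would probably want to document that interaction.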

Labels: 0 - Backlog, feature request, libcudf
