Commit d14cd0c
mattgarameta-codesync[bot] authored and committed
fix(cudf): Enable GPU execution for count(*), count(column), and count(NULL) (facebookincubator#16522)
Summary:
Enable the cuDF GPU code-path for all count aggregation variants: count(*), count(column), count(constant), and count(NULL). Previously, global count(*) fell back to CPU because its zero-column intermediate representation loses row counts in cuDF.

- Support zero-column global count(*) on GPU by preserving row counts through FilterProject and CudfConversion.
- Classify count inputs (column, count-all, null-constant) to handle each variant correctly in both groupby and global reduce paths.
- Handle count(NULL) on GPU, returning 0 for all groups/globally.
- Non-count constant aggregates (e.g. sum(1)) fall back to CPU by design.

Tests:
- Re-enable previously disabled countStarGlobal test.
- Add parameterized tests for all count variants across single, partial+final, and partial+intermediate+final steps, for both global and group-by, with and without nulls.
- Add selection tests verifying GPU/CPU routing for zero-column count(*), count(NULL), and sum(1).

Fixes facebookincubator#16492

Pull Request resolved: facebookincubator#16522
Reviewed By: peterenescu
Differential Revision: D98979967
Pulled By: bikramSingh91
fbshipit-source-id: 2efa92c86fb30d25fcd785ff05f443d57e9a4c1a
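The summary distinguishes three count input kinds and runs each through single, partial+final, and partial+intermediate+final steps. The following is a minimal CPU-side sketch of those semantics in plain C++, not Velox or cuDF code. The names `CountInputKind`, `NullableColumn`, `partialCount`, and `mergeCounts` are hypothetical, and treating a non-null constant such as count(1) as count-all is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the three count input kinds named in the summary.
enum class CountInputKind {
  kColumn,        // count(col): counts the non-null values of col
  kCountAll,      // count(*), and (assumed) count(<non-null constant>)
  kNullConstant,  // count(NULL): always 0
};

// A nullable int64 column modeled as a vector of optionals (illustrative only).
using NullableColumn = std::vector<std::optional<int64_t>>;

// Partial step: count one input batch according to its input kind.
// `numRows` is passed separately because a zero-column batch (count(*))
// has no column whose length could supply the row count.
int64_t partialCount(
    CountInputKind kind,
    int64_t numRows,
    const NullableColumn* column) {
  switch (kind) {
    case CountInputKind::kCountAll:
      return numRows;
    case CountInputKind::kNullConstant:
      return 0;
    case CountInputKind::kColumn: {
      assert(
          column != nullptr &&
          static_cast<int64_t>(column->size()) == numRows);
      int64_t nonNull = 0;
      for (const auto& value : *column) {
        if (value.has_value()) {
          ++nonNull;
        }
      }
      return nonNull;
    }
  }
  return 0;  // Unreachable for valid enum values.
}

// Intermediate and final steps: partial counts are merged by summation.
int64_t mergeCounts(const std::vector<int64_t>& partials) {
  int64_t total = 0;
  for (int64_t partial : partials) {
    total += partial;
  }
  return total;
}
```

Because every step after the first only sums partial counts, splitting the input across more steps should not change the result, which is the property the multi-step parameterized tests appear to target.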
1 parent 084f222 commit d14cd0c

7 files changed: +562 -88 lines

velox/experimental/cudf/exec/CudfConversion.cpp

Lines changed: 13 additions & 0 deletions
@@ -248,6 +248,19 @@ RowVectorPtr CudfToVelox::getOutput() {
     return nullptr;
   }
 
+  if (outputType_->size() == 0) {
+    // cuDF zero-column tables do not have a row count, so we sum the sizes
+    // of all CudfVectors in the inputs_, to maintain the logical count.
+    // This is necessary to ensure correct behavior for e.g. `count` operators.
+    vector_size_t totalSize = 0;
+    while (!inputs_.empty()) {
+      totalSize += inputs_.front()->size();
+      inputs_.pop_front();
+    }
+    finished_ = noMoreInput_ && inputs_.empty();
+    return BaseVector::create<RowVector>(outputType_, totalSize, pool());
+  }
+
   // Drain veloxBuffer_ (populated on a previous call) before consuming
   // more GPU inputs.
   if (!veloxBuffer_) {
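The zero-column branch above relies on each queued CudfVector carrying its own logical size, independent of the empty cuDF table (in this commit, CudfVector is constructed with an explicit `size`). A minimal sketch of that idea, assuming a hypothetical `Batch` type that stores the row count next to its columns; `Batch` and `drainZeroColumnBatches` are illustrative names, not part of the Velox API.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU batch. A zero-column table cannot report
// a row count, so the logical count is stored beside the columns.
struct Batch {
  std::vector<std::vector<int64_t>> columns;  // Empty for count(*) inputs.
  int64_t logicalRows = 0;                    // Used when columns is empty.

  int64_t size() const {
    return columns.empty() ? logicalRows
                           : static_cast<int64_t>(columns.front().size());
  }
};

// Models the zero-column branch: drain every queued batch and return the
// summed row count, which becomes the size of one zero-column output.
int64_t drainZeroColumnBatches(std::deque<std::shared_ptr<Batch>>& inputs) {
  int64_t totalSize = 0;
  while (!inputs.empty()) {
    assert(inputs.front()->columns.empty());
    totalSize += inputs.front()->size();
    inputs.pop_front();
  }
  return totalSize;
}
```

In the actual diff, the summed count becomes the size of a childless RowVector, so a downstream count(*) sees the correct number of rows even though no column data was transferred from the GPU.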

velox/experimental/cudf/exec/CudfFilterProject.cpp

Lines changed: 6 additions & 2 deletions
@@ -246,15 +246,19 @@ RowVectorPtr CudfFilterProject::getOutput() {
   VELOX_CHECK_NOT_NULL(cudfInput);
   auto stream = cudfInput->stream();
   auto inputTableColumns = cudfInput->release()->release();
+  auto outputSize = input_->size();
 
   if (hasFilter_) {
     filter(inputTableColumns, stream);
   }
+  if (!inputTableColumns.empty()) {
+    outputSize = inputTableColumns.front()->size();
+  }
   auto outputColumns = project(inputTableColumns, stream);
 
   auto outputTable = std::make_unique<cudf::table>(std::move(outputColumns));
   auto const numColumns = outputTable->num_columns();
-  auto const size = outputTable->num_rows();
+  auto const size = numColumns > 0 ? outputTable->num_rows() : outputSize;
   if (CudfConfig::getInstance().debugEnabled) {
     VLOG(1) << "cudfProject Output: " << size << " rows, " << numColumns
             << " columns";
@@ -263,7 +267,7 @@ RowVectorPtr CudfFilterProject::getOutput() {
   auto cudfOutput = std::make_shared<CudfVector>(
       input_->pool(), outputType_, size, std::move(outputTable), stream);
   input_.reset();
-  if (numColumns == 0 or size == 0) {
+  if (size == 0) {
     return nullptr;
   }
   return cudfOutput;
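The FilterProject change can be modeled as two small decisions: which row count to report for the projected batch, and whether to forward it. The sketch below uses hypothetical helpers `outputRowCount` and `shouldEmit`, with plain integers standing in for cuDF tables; it illustrates the logic of the hunks above and is not the actual implementation.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the row-count choice in CudfFilterProject::getOutput().
// inputRows:        batch size before filtering (input_->size()).
// filteredRows:     first column's size after filtering; absent when the
//                   input table has no columns.
// projectedColumns: number of columns produced by the projection.
// projectedRows:    row count of the projected table when it has columns.
int64_t outputRowCount(
    int64_t inputRows,
    std::optional<int64_t> filteredRows,
    int64_t projectedColumns,
    int64_t projectedRows) {
  // Start from the input size; prefer the post-filter size when a column
  // exists to measure it.
  int64_t outputSize = filteredRows.value_or(inputRows);
  // A projected table with columns knows its own row count; a zero-column
  // table does not, so fall back to the size tracked above.
  return projectedColumns > 0 ? projectedRows : outputSize;
}

// After the change only an empty batch is dropped; a zero-column batch that
// still has rows is forwarded downstream.
bool shouldEmit(int64_t size) {
  return size != 0;
}
```

Dropping the `numColumns == 0` condition is what lets a zero-column batch with rows reach the count aggregation instead of being discarded as if it were empty.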
