@weixiuli
Contributor

@weixiuli weixiuli commented Dec 12, 2023

What changes were proposed in this pull request?

Currently, the Velox BloomFilterAggregate checks each input row and throws an exception if the row contains a null value. To be consistent with Spark's behavior, it should ignore null values instead.

Spark's BloomFilterAggregate ignores null values: https://github.com/apache/spark/blob/6cdca10f148433664b3e2be6f655b0ddba817537/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L180-L188

override def update(buffer: BloomFilter, inputRow: InternalRow): BloomFilter = {
  val value = child.eval(inputRow)
  // Ignore null values.
  if (value == null) {
    return buffer
  }
  updater.update(buffer, value)
  buffer
}
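
To illustrate the intended behavior on the native side, here is a minimal, self-contained C++ sketch of an accumulator that skips null inputs instead of throwing. It is not the Velox implementation and uses no Velox APIs: `ToyBloomFilter`, `addBatch`, the two hash positions, and the use of `std::optional` to model nullable rows are all hypothetical names and choices made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical toy Bloom filter for illustration only; this is neither the
// Velox nor the Spark implementation.
class ToyBloomFilter {
 public:
  explicit ToyBloomFilter(std::size_t numBits) : bits_(numBits, false) {
    assert(numBits > 0);
  }

  // Sets the two bit positions derived from the value.
  void insert(int64_t value) {
    bits_[indexA(value)] = true;
    bits_[indexB(value)] = true;
  }

  // True if the value may have been inserted; false means definitely not.
  bool mayContain(int64_t value) const {
    return bits_[indexA(value)] && bits_[indexB(value)];
  }

  // True if no bit has been set yet.
  bool empty() const {
    for (bool bit : bits_) {
      if (bit) {
        return false;
      }
    }
    return true;
  }

 private:
  std::size_t indexA(int64_t value) const {
    return std::hash<int64_t>{}(value) % bits_.size();
  }

  // Second position: hash of a mixed copy of the value (unsigned wraparound).
  std::size_t indexB(int64_t value) const {
    const uint64_t mixed =
        static_cast<uint64_t>(value) * 0x9E3779B97F4A7C15ULL + 1;
    return std::hash<uint64_t>{}(mixed) % bits_.size();
  }

  std::vector<bool> bits_;
};

// Adds a batch of nullable rows to the filter. Null rows (std::nullopt) are
// skipped, mirroring the `if (value == null) return buffer` branch in the
// Spark snippet above. Returns the number of non-null rows inserted.
std::size_t addBatch(
    ToyBloomFilter& filter,
    const std::vector<std::optional<int64_t>>& rows) {
  std::size_t inserted = 0;
  for (const auto& row : rows) {
    if (!row.has_value()) {
      continue; // Ignore null values rather than raising an error.
    }
    filter.insert(*row);
    ++inserted;
  }
  return inserted;
}
```

Calling `addBatch` on a batch such as `{1, null, 42}` inserts only the two non-null values, and a batch made up entirely of nulls leaves the filter untouched.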

Fixes #4021.

How was this patch tested?

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@weixiuli changed the title from "Support the null values in bloom_filter Spark aggregate" to "[VL]Support the null values in bloom_filter Spark aggregate" on Dec 12, 2023
@weixiuli changed the title from "[VL]Support the null values in bloom_filter Spark aggregate" to "[VL] Support the null values in bloom_filter Spark aggregate" on Dec 12, 2023
@PHILO-HE changed the title from "[VL] Support the null values in bloom_filter Spark aggregate" to "[GLUTEN-4021][VL] Support the null values in bloom_filter Spark aggregate" on Dec 14, 2023
@github-actions

#4021

@github-actions

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions bot added the stale label Feb 13, 2024
@github-actions

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Feb 24, 2024