[FEA] A new kernel to support nulls in the "searchSpace" argument of the ColumnView.contains method #3934

@firestarman

This is related to NVIDIA/spark-rapids#13708. The root cause is that our GpuInSet does not support nulls in the "list" argument. A short-term fix in the plugin is at NVIDIA/spark-rapids#13735.

Inside GpuInSet, ColumnView.contains is called to perform the actual existence check, but this method may behave differently from the CPU implementation when the list contains nulls.

For example, given the value 1 and the list [2, 3, null], the CPU outputs null for the value 1, while the GPU produces false. This behavior is Spark-specific, so we probably need a new kernel in JNI for ColumnView.contains, instead of asking for a change in the cuDF kernel.
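The null on the CPU side follows from SQL's three-valued logic: `1 IN (2, 3, NULL)` behaves like `1 = 2 OR 1 = 3 OR 1 = NULL`, and `false OR NULL` is NULL. Below is a minimal sketch of that reasoning in plain Scala, with `None` standing in for NULL. The names `sqlEq`, `sqlOr` and `sqlIn` are hypothetical and are not part of Spark, cuDF or the plugin; NaN handling is left out.

```scala
object ThreeValuedIn {
  // None models SQL NULL; Some(b) models a known boolean.
  type SqlBool = Option[Boolean]

  // SQL equality: the result is NULL if either side is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // Three-valued OR: true wins over NULL, NULL wins over false.
  def sqlOr(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // value IN (list), expanded as a chain of ORed equalities.
  def sqlIn(value: Option[Int], list: Seq[Option[Int]]): SqlBool =
    list.map(sqlEq(value, _)).foldLeft(Option(false))(sqlOr)
}
```

With this model, `ThreeValuedIn.sqlIn(Some(1), Seq(Some(2), Some(3), None))` evaluates to `None`, whereas a two-valued membership test would report `false`.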

Here is the CPU implementation, from InSet.eval:

      val value = child.eval(input)
      if (value == null) {
        null
      } else if (set.contains(value)) {
        true
      } else if (isNaN(value)) {
        hasNaN
      } else if (hasNull) {
        null
      } else {
        false
      }
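To make the gap concrete, here is a rough, self-contained Scala model of the semantics a null-aware kernel would need to reproduce. `cpuInSet` follows the branch order of InSet.eval above for Double values, `plainContains` is an assumed stand-in for a null-unaware existence check (not the actual cuDF behavior), and `withNullAwareness` is a hypothetical fix-up that turns a "false" into null when the list contains a null. All three names are illustrative only, and the fix-up does not attempt to cover NaN.

```scala
object InSetSketch {
  // Reference model following the branch order of InSet.eval; None models NULL.
  def cpuInSet(value: Option[Double], list: Seq[Option[Double]]): Option[Boolean] = {
    val hasNull = list.contains(None)
    val nonNull = list.flatten
    val hasNaN  = nonNull.exists(_.isNaN)
    val set     = nonNull.toSet
    value match {
      case None                       => None
      case Some(v) if set.contains(v) => Some(true)
      case Some(v) if v.isNaN         => Some(hasNaN)
      case Some(_) if hasNull         => None
      case Some(_)                    => Some(false)
    }
  }

  // Assumed stand-in for a null-unaware check: nulls in the list never match,
  // and a null input value stays null.
  def plainContains(value: Option[Double], list: Seq[Option[Double]]): Option[Boolean] =
    value.map(v => list.flatten.contains(v))

  // Hypothetical fix-up: a "false" is only trustworthy when the list has no null.
  def withNullAwareness(found: Option[Boolean], hasNull: Boolean): Option[Boolean] =
    found match {
      case Some(false) if hasNull => None
      case other                  => other
    }
}
```

For the value 1.0 and the list [2.0, 3.0, null], `plainContains` yields `Some(false)`, and applying `withNullAwareness` with `hasNull = true` yields `None`, which agrees with `cpuInSet`. A real kernel could fold this step into the lookup itself rather than post-processing the result.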
