[MapFromEntries] Native JNI implementation with Spark null-struct semantics#4475
wjxiz1992 wants to merge 13 commits into NVIDIA:main
Conversation
GpuMapFromEntries in spark-rapids currently orchestrates ~10 cuDF
column operations from Scala to handle the case where an input array
contains null struct entries (the whole struct is null, not just a null
key inside a valid struct). CPU map_from_entries returns null for the
entire row in that case, but the GPU was throwing "Cannot use null as
map key."
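The distinction can be sketched on the host. Below is an illustrative model of the Spark CPU semantics using `std::optional` for nullability — not the actual Spark or cuDF code; `Entry`, `Row`, and the function names are hypothetical:

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// One map entry; the key itself may be null inside an otherwise valid struct.
struct Entry {
  std::optional<std::string> key;
  int value;
};
// A row is a nullable list of nullable struct entries.
using Row = std::optional<std::vector<std::optional<Entry>>>;

// Spark CPU behaviour: any null struct entry nulls the whole output row (and
// suppresses the null-key check for that row); a null key inside a valid
// struct of a surviving row throws.
bool output_row_is_null(Row const& row, bool throw_on_null_key) {
  if (!row) return true;  // outer null row stays null
  for (auto const& entry : *row) {
    if (!entry) return true;  // null struct entry -> whole row becomes null
  }
  if (throw_on_null_key) {
    for (auto const& entry : *row) {
      if (!entry->key) throw std::runtime_error("Cannot use null as map key.");
    }
  }
  return false;
}

// Helper for exercising the throwing branch.
bool throws_null_key(Row const& row) {
  try { output_row_is_null(row, /*throw_on_null_key=*/true); return false; }
  catch (std::runtime_error const&) { return true; }
}
```

Note the ordering: a row containing both a null struct entry and a null key is masked to null without throwing, which is the mixed edge case the GPU path was getting wrong.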
Move that logic into a native JNI function so that:
- The fix lives in C++ with access to full cuDF internals
- Intermediate operations can be expressed in fewer kernel launches
- spark-rapids Scala code reduces to a single JNI call + dedupByPolicy
Implementation (map_utils.cu):
Fast path (no null struct entries anywhere): single global null_count
check — original perf, zero new allocations.
Slow path (>=1 row has a null struct entry):
1. contains_nulls -> per-row has_null_entry (1 kernel)
2. is_null(keys) AND NOT is_null(structs) (2 kernels, fusible)
3. segmented_reduce(max) on flat bool with list offsets -> per-row
row_has_null_key (1 segmented-reduce kernel)
4. reduce(any, !has_null_entry AND row_has_null_key) -> throw check
(1 kernel + 1 GPU-CPU sync -- the only sync in the slow path)
5. copy_if_else(null_scalar, input, has_null_entry) -> masked output
(1 kernel)
vs. the current Scala orchestration: ~10 kernel launches, 2 GPU-CPU
syncs, ~8 intermediate column allocations.
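Step 3's `segmented_reduce(max)` over BOOL8 is just a per-row "any" with the list offsets defining the segments. A host sketch of the shape (illustrative only; `segmented_any` is a made-up name, not a cuDF API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Max over BOOL8 values within each [offsets[row], offsets[row+1]) segment is
// equivalent to a logical "any" per row; empty segments reduce to false.
// offsets has num_rows + 1 entries, as in a cudf LIST column.
std::vector<bool> segmented_any(std::vector<bool> const& flat,
                                std::vector<int> const& offsets) {
  std::vector<bool> per_row;
  for (std::size_t row = 0; row + 1 < offsets.size(); ++row) {
    bool any = false;
    for (int i = offsets[row]; i < offsets[row + 1]; ++i) { any = any || flat[i]; }
    per_row.push_back(any);
  }
  return per_row;
}
```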
New files:
src/main/cpp/src/map_utils.hpp -- C++ API declaration
src/main/cpp/src/map_utils.cu -- implementation
src/main/cpp/src/MapUtilsJni.cpp -- JNI wrapper
src/main/java/.../jni/MapUtils.java -- Java API
src/test/java/.../jni/MapUtilsTest.java -- 7 unit tests covering:
no-null fast path, null key throws/no-throw, null struct masking,
mixed null-struct + null-key edge case, cross-row isolation,
empty list, outer null row.
Relates to NVIDIA/spark-rapids#14128 / NVIDIA/spark-rapids#14612.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
Pull request overview
Adds a new native (JNI) implementation of Spark’s map_from_entries null-struct semantics to reduce cuDF ops/kernel launches and simplify the Scala-side implementation in spark-rapids.
Changes:
- Introduces the MapUtils.mapFromEntries Java API and JNI wrapper, backed by a new C++/CUDA implementation.
- Implements Spark-specific behavior: rows with any null struct entry are masked to null; null keys in valid structs optionally throw.
- Adds JUnit coverage for fast/slow paths and edge cases (null struct entries, null keys, empty lists, outer null rows).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/main/cpp/src/map_utils.hpp | Declares the native C++ API for Spark semantics handling. |
| src/main/cpp/src/map_utils.cu | Implements the null-struct masking + null-key validation logic on GPU. |
| src/main/cpp/src/MapUtilsJni.cpp | Exposes the native implementation to Java via JNI. |
| src/main/java/com/nvidia/spark/rapids/jni/MapUtils.java | Adds the public Java entrypoint that returns a new ColumnVector. |
| src/test/java/com/nvidia/spark/rapids/jni/MapUtilsTest.java | Adds unit tests covering semantics and key edge cases. |
| src/main/cpp/CMakeLists.txt | Wires the new JNI/CUDA sources into the native build. |
```java
try (ColumnVector input = ColumnVector.fromLists(LIST_TYPE, row0, row1)) {
  assertThrows(Exception.class, () -> MapUtils.mapFromEntries(input, true).close());
}
```
These assertions use assertThrows(Exception.class, ...), which will pass for unrelated failures and makes it harder to verify the intended Spark-semantic error. Prefer asserting the specific exception type used elsewhere in this repo for cuDF/JNI errors (e.g., CudfException) and, if stable, also assert the message contains "Cannot use null as map key.".
Fixed — both assertThrows calls now use CudfException.class (latest commit).
```cpp
CUDF_EXPECTS(input.type().id() == cudf::type_id::LIST,
             "map_from_entries: input must be a LIST column");

if (input.size() == 0) { return cudf::make_empty_column(input.type()); }

auto const lists_cv = cudf::lists_column_view(input);
auto const structs  = lists_cv.child();
CUDF_EXPECTS(structs.type().id() == cudf::type_id::STRUCT,
             "map_from_entries: list child must be a STRUCT column");
CUDF_EXPECTS(structs.num_children() >= 1,
             "map_from_entries: struct must have at least one child column (KEY)");
```
The early return for input.size() == 0 happens before validating that the list child is a STRUCT with at least one child. This means an empty LIST<non-STRUCT> column would incorrectly be accepted, despite the API contract/documentation saying non-LIST(STRUCT(...)) inputs should throw. Move the lists_column_view/STRUCT checks above the empty-size return (or keep the return but after validation).
The LIST type check (CUDF_EXPECTS(input.type().id() == cudf::type_id::LIST)) fires before the early return, so a non-LIST input is rejected. The STRUCT/arity checks are intentionally after the early return: a zero-row column has no entries to validate structure for, and cudf::empty_like preserves the full column type faithfully.
```cpp
// mask[i] = true  → null_scalar (row had a null struct entry → output null row)
// mask[i] = false → input[i]    (row was fine → keep original)
// mask[i] = null  → input[i]    (outer null row stays null via input[i])
auto null_scalar = cudf::make_default_constructed_scalar(input.type(), stream, mr);
```
make_default_constructed_scalar creates a valid scalar (e.g., for LIST it will be an empty list), so copy_if_else will replace rows with null-struct entries with a valid default value rather than a null outer row. To actually mask rows to null per Spark semantics, construct an invalid scalar (or explicitly mark this scalar invalid) before passing it to copy_if_else.
Suggested change:

```cpp
auto null_scalar = cudf::make_default_constructed_scalar(input.type(), stream, mr);
null_scalar->set_valid_async(false, stream);
```
This code path no longer exists. The current slow-path implementation does not use make_default_constructed_scalar or copy_if_else at all — it builds the null mask directly via bools_to_mask + bitmask_and + purge_nonempty_nulls.
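The combination the reply describes — a freshly built entry mask ANDed with the input's existing null mask — can be pictured bit-by-bit on the host. A sketch under the assumption of cudf's LSB-first 32-bit validity words; the function names here are illustrative, not the `cudf::bitmask_and` signature:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit i of a packed validity mask, LSB-first within 32-bit words.
bool bit_is_set(std::vector<std::uint32_t> const& mask, int bit) {
  return (mask[bit / 32] >> (bit % 32)) & 1u;
}

// Offset-aware AND: result bit i = a[i] & b[begin_bits + i], i.e. mask b is
// consumed starting at a sliced column's offset while mask a starts at 0.
std::vector<bool> bitmask_and_with_offset(std::vector<std::uint32_t> const& a,
                                          std::vector<std::uint32_t> const& b,
                                          int begin_bits, int size) {
  std::vector<bool> out;
  for (int i = 0; i < size; ++i)
    out.push_back(bit_is_set(a, i) && bit_is_set(b, begin_bits + i));
  return out;
}
```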
```cpp
// Step 1: Per-row flag — does row i contain any null struct entry?
// contains_nulls returns BOOL8, size = input.size().
// A null outer row itself yields null in has_null_entry; copy_if_else handles that correctly.
auto has_null_entry = cudf::lists::contains_nulls(lists_cv, stream, mr);

// Fast path: no null struct entries anywhere — simple global null-key check.
auto any_null_entry_scalar = cudf::reduce(
  *has_null_entry,
  *cudf::make_any_aggregation<cudf::reduce_aggregation>(),
  cudf::data_type{cudf::type_id::BOOL8},
  stream,
  mr);
bool const any_null_entry =
  any_null_entry_scalar->is_valid(stream) &&
  static_cast<cudf::numeric_scalar<bool>*>(any_null_entry_scalar.get())->value(stream);

if (!any_null_entry) {
```
The function always computes lists::contains_nulls and then reduces it to find any_null_entry, which adds allocations/kernels (and a host read) even when there are no null struct entries. Since the fast-path condition is simply “no null struct entries anywhere”, you can check structs.null_count(stream) == 0 first and return immediately, only computing contains_nulls/row masking when structs.null_count > 0. This matches the stated goal of keeping the fast path cheap.
The structs.null_count() == 0 short-circuit is a valid micro-optimisation. For now the current design (contains_nulls + reduce(any)) is one extra kernel on the fast path and readable. Happy to add the null_count guard if preferred — please let us know.
```java
try (ColumnVector input = ColumnVector.fromLists(LIST_TYPE, row0)) {
  assertThrows(Exception.class, () -> MapUtils.mapFromEntries(input, true).close());
}
```
These assertions use assertThrows(Exception.class, ...), which will pass for unrelated failures and makes it harder to verify the intended Spark-semantic error. Prefer asserting the specific exception type used elsewhere in this repo for cuDF/JNI errors (e.g., CudfException) and, if stable, also assert the message contains "Cannot use null as map key.".
Fixed — both assertThrows calls now use CudfException.class (latest commit).
Greptile Summary

Confidence Score: 5/5 — safe to merge. All remaining findings are P2 style suggestions with no correctness impact; no P0 or P1 issues found. The slice-correctness logic (offsets_begin, bitmask_and with begin_bits, sliced_offsets), null semantics (outer-null vs null-struct-entry vs null-key distinction), and memory ownership are all correct. The single P2 comment is a Javadoc specificity nit. No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[map_from_entries\ninput: LIST STRUCT KEY VALUE] --> B{input.size == 0?}
    B -- yes --> C[return empty_like input]
    B -- no --> D[contains_nulls → has_null_entry\nper-row BOOL8]
    D --> E[reduce_any has_null_entry\n→ any_null_entry bool]
    E -- false\nFast Path --> F{throw_on_null_key AND\nkeys.nullable AND\nkeys.null_count > 0?}
    F -- no --> G[return column copy of input]
    F -- yes --> H[is_null keys → key_is_null\nis_valid input → input_is_valid]
    H --> I[segmented_reduce MAX\nover offsets_span\n→ row_has_null_key]
    I --> J[LOGICAL_AND\nrow_has_null_key x input_is_valid\n→ row_throw]
    J --> K[reduce_any row_throw]
    K -- true --> L[throw logic_error\nCannot use null as map key]
    K -- false --> G
    E -- true\nSlow Path --> M[NOT has_null_entry\n→ no_null_entry]
    M --> N{throw_on_null_key AND\nkeys have nulls?}
    N -- yes --> O[is_null keys → key_is_null\nsegmented_reduce + LOGICAL_AND no_null_entry\n→ reduce_any → throw check]
    O -- throws --> L
    N -- no / no throw --> P[bools_to_mask no_null_entry\n→ entry_mask, entry_nc]
    O -- no throw --> P
    P --> Q[bitmask_and\nentry_mask + input null mask\nbegin_bits = input.offset\n→ combined_mask]
    Q --> R[Build result_view\noffset=0, sliced_offsets child]
    R --> S[purge_nonempty_nulls\n→ final column]
```
Reviews (11). Last reviewed commit: "test(map_utils): add Java slow-path and ..."
```cpp
auto null_key_in_valid = cudf::binary_operation(
  *key_is_null,
  *struct_is_valid,
  cudf::binary_operator::BITWISE_AND,
  cudf::data_type{cudf::type_id::BOOL8},
  stream,
  mr);
```
BITWISE_AND used for boolean logic
cudf::binary_operator::LOGICAL_AND would express the intent more clearly for boolean columns. While both are equivalent for strictly {0, 1} BOOL8 values produced by is_null(), using the semantic operator avoids any ambiguity about intent and is the conventional choice for bool algebra in cuDF.
Suggested change:

```cpp
auto null_key_in_valid = cudf::binary_operation(
  *key_is_null,
  *struct_is_valid,
  cudf::binary_operator::LOGICAL_AND,
  cudf::data_type{cudf::type_id::BOOL8},
  stream,
  mr);
```
Same applies to the should_throw computation at line 124–130.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
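The difference is easy to pin down outside cuDF. A plain C++ illustration of why the two operators agree only on strict {0, 1} bytes (the helper names are made up for this sketch):

```cpp
#include <cassert>
#include <cstdint>

// BITWISE_AND of the raw bytes vs. LOGICAL_AND of their truth values.
// For bytes restricted to {0, 1} the results coincide; for any other
// non-zero byte pattern they diverge.
inline std::uint8_t bitwise_and_u8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a & b);
}
inline std::uint8_t logical_and_u8(std::uint8_t a, std::uint8_t b) {
  return (a != 0 && b != 0) ? 1 : 0;
}
```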
Fixed — changed to cudf::binary_operator::LOGICAL_AND throughout.
```cpp
if (!any_null_entry) {
  // All struct entries are valid. Any null key in the flat key column is a real null key.
  auto const keys = structs.child(0);
  if (throw_on_null_key && keys.null_count(stream) > 0) {
    throw cudf::logic_error("Cannot use null as map key.");
  }
  return std::make_unique<cudf::column>(input, stream, mr);
```
Fast-path null_count() spans all child elements including those of outer-null rows
keys.null_count() operates on the raw structs.child(0) — the full, unsliced keys column. For outer-null list rows, cuDF's Java API always creates empty child segments (offsets[i] == offsets[i+1]), so in practice those rows contribute 0 keys. However, if the input were produced from a non-standard path where a null-validity outer row still has non-empty child data, the count would include keys for those logically-null rows and could produce a spurious throw.
The slow path is immune because no_null_entry is NULL for outer-null rows, making should_throw NULL and therefore skipped by reduce(any). Applying the same guard here would make the fast path fully consistent. In the current JNI context the issue does not trigger, but it is worth noting for robustness.
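The invariant being relied on — outer-null list rows have empty child segments — can be stated as a host-side check (an illustrative helper, not cudf code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// If every null row satisfies offsets[i] == offsets[i+1], then no child
// element belongs to a logically-null row, so a global null_count over the
// child column cannot be inflated by those rows.
bool null_rows_have_empty_segments(std::vector<int> const& offsets,
                                   std::vector<bool> const& row_valid) {
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    if (!row_valid[i] && offsets[i] != offsets[i + 1]) return false;
  }
  return true;
}
```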
Addressed — added a comment at map_utils.cu:107-110 explaining that outer-null LIST rows satisfy cudf's invariant of empty child segments (offsets[i] == offsets[i+1]), so they contribute zero keys and cannot inflate null_count().
The keys.null_count() > 0 check on line 144 is only a cheap early-exit filter — the actual throw decision uses segmented_reduce over offsets_span (built from offsets_begin(), slice-correct). A null key outside the visible slice would pass the null_count() guard but be excluded by the segmented reduce, so reduce_any never fires on it. No spurious throw possible. The comment block above the check documents this explicitly.
```cpp
auto const offsets_col  = lists_cv.offsets();
auto const offsets_span = cudf::device_span<cudf::size_type const>(
  offsets_col.data<cudf::size_type>(), offsets_col.size());
```
offsets_col.size() assumes non-sliced input
lists_column_view::offsets() returns a column_view adjusted for the list column's offset, so offsets_col.data<cudf::size_type>() points to the first valid offset and offsets_col.size() equals input.size() + 1. This is correct for segmented_reduce for a non-sliced column (the typical JNI case). A brief comment noting this assumption would help future readers; a sliced input would still work correctly since the absolute offsets index into the full null_key_in_valid, but it is an implicit precondition.
Addressed — the code now uses lists_cv.offsets_begin() (offset-aware) to build offsets_span, covering exactly the visible rows of a sliced input. A comment at map_utils.cu:97-100 explains this. The output-path offsets are also explicitly sliced with raw_offsets.offset() + input.offset().
Addressed — offsets_span is now built from lists_cv.offsets_begin() with an explicit input.size() + 1 length (map_utils.cu:130-131). offsets_begin() already accounts for input.offset(), so the slice-aware precondition is enforced rather than implicit, and the sliced-input regression tests added in e1edb87 exercise the path.
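The slice-awareness both replies describe amounts to simple pointer arithmetic: for a column sliced to rows [offset, offset + size), the visible offsets are the size + 1 entries starting at raw_offsets + offset, which is the shape offsets_begin() yields. A host sketch (visible_offsets is a made-up name):

```cpp
#include <cassert>
#include <vector>

// The visible offsets window of a sliced LIST column: size + 1 entries
// beginning at the slice offset. The absolute values still index into the
// full flat child column, which is exactly what segmented_reduce needs.
std::vector<int> visible_offsets(std::vector<int> const& raw_offsets,
                                 int offset, int size) {
  return std::vector<int>(raw_offsets.begin() + offset,
                          raw_offsets.begin() + offset + size + 1);
}
```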
Replace copy_if_else(null_scalar, ...) with a direct null-mask approach:

- cudf::make_default_constructed_scalar does not support LIST type, causing a CUDF failure when trying to create a null LIST scalar.
- Build the outer null mask from NOT(has_null_entry) via bools_to_mask, then AND with the input's existing null mask using bitmask_and(table_view).
- Call purge_nonempty_nulls to zero out child data for null rows, satisfying cudf's invariant that null rows in nested columns have empty offset spans.

Also adds <cudf/null_mask.hpp> and <cudf/table/table_view.hpp>; removes the now-unnecessary <cudf/scalar/scalar_factories.hpp> include.

Signed-off-by: allxu <allxu@nvidia.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
- Remove copy_if_else reference in Step 1 comment; describe the actual bools_to_mask null-as-false behavior instead.
- Add outerNullRowAndNullStructEntryRowBothNull test that exercises the bitmask_and path when input already has an outer-null row AND a separate row with a null struct entry — both must be null in output.

Signed-off-by: allxu <allxu@nvidia.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
pre-commit.ci autofix
- Extract reduce_any() helper; deduplicate two identical reduce+scalar patterns
- Add is_valid(input) AND-guard in fast-path so outer-null rows cannot trigger null-key throw via segmented_reduce
- Use offset-aware bitmask_and raw overload to correctly handle sliced inputs; result_view uses offset=0 with sliced_offsets child to stay aligned
- Add CUDF_EXPECTS invariant asserting entry_nc > 0 on slow path
- Add [[nodiscard]] to map_from_entries declaration
- Centralize error string as constexpr kNullKeyError
- Add tests: allOuterNullRowsRemainNullNoThrowPolicy, singleOuterNullRowRemainNull (is_valid guard on reduce scalar), outerNullRowAndNullStructEntryRowBothNull (bitmask_and with pre-existing outer-null mask)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
… note

- Replace BITWISE_AND with LOGICAL_AND for boolean operations (3 sites)
- assertThrows: use CudfException.class instead of Exception.class
- Add comment explaining why keys.null_count() is safe for outer-null rows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
build
…early return

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```cpp
CUDF_EXPECTS(input.type().id() == cudf::type_id::LIST,
             "map_from_entries: input must be a LIST column");

if (input.size() == 0) { return cudf::empty_like(input); }
```
This zero-row early return happens before the nested child validation below. That means an empty LIST<non-STRUCT> or empty LIST<STRUCT<wrong arity>> input would be accepted, even though the header docs say this API throws on non-LIST(STRUCT(KEY,...)) inputs. If that contract matters, I think the STRUCT/arity checks need to move above the early return; otherwise the docs should be loosened to match the implementation.
A related automated finding (Medium): map_from_entries() still returns success for zero-row inputs before it verifies the child shape. That means an empty LIST<non-STRUCT> or wrong-arity LIST<STRUCT<...>> will be accepted and returned unchanged instead of failing the documented contract. Because MapUtils.mapFromEntries() is a public Java API and forwards any ColumnView without prevalidation, this is user-reachable for empty inputs.
Fixed in b9eae67 — the structs.type().id() == STRUCT and num_children() == 2 CUDF_EXPECTS are now above the input.size() == 0 early return, so empty LIST<non-STRUCT> and empty LIST<STRUCT<wrong arity>> inputs fail with the documented error rather than passing through.
Fixed in b9eae67 — the STRUCT/arity CUDF_EXPECTS are now enforced before the zero-row early return, so the public MapUtils.mapFromEntries Java API cannot silently accept an empty LIST<non-STRUCT> or wrong-arity input.
```java
}

@Test
void outerNullRowAndNullStructEntryRowBothNull() {
```
Given that the native implementation now has slice-aware offset/null-mask handling, can we add at least one sliced-input regression here too? The current tests all build whole columns with ColumnVector.fromLists(...), so they do not exercise the new input.offset() / rebuilt-offsets path. A case where a null key or null struct entry exists outside the visible slice would help guard against future regressions in that logic.
Added in e1edb87 — four sliced-input regression tests (slicedInputSkipsNullKeyOutsideSlice, slicedInputThrowsOnNullKeyInsideSlice, slicedInputSkipsNullStructOutsideSlice, slicedInputWithOuterNullAndNullStructInsideSlice) use subVector(...) so input.offset() != 0. They cover both the fast path (offsets_begin()) and the slow path (bitmask_and with begin_bits = input.offset()), including a null key / null struct entry outside the visible slice to verify the slice is respected. Local run: Tests run: 18, Failures: 0, Errors: 0, Skipped: 0.
…hared null-key check

Move the STRUCT type / num_children validation above the zero-row early return so empty LIST<non-STRUCT> and LIST<STRUCT<wrong arity>> inputs fail with CUDF_EXPECTS uniformly, matching the header contract.

Extract any_null_key_in_guarded_row() to share the segmented_reduce(MAX) + AND(guard) + reduce_any(...) pipeline between the fast path (guard = is_valid(input)) and the slow path (guard = no_null_entry).

Also drops the redundant key_is_null AND struct_is_valid pre-mask on the slow path: for rows with null struct entries the no_null_entry guard already short-circuits any contribution, so the extra binary_op kernel was unnecessary.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Adds four tests covering input.offset() != 0 cases via subVector:

- slicedInputSkipsNullKeyOutsideSlice — fast path, null key outside the visible offsets_begin() range must not throw
- slicedInputThrowsOnNullKeyInsideSlice — fast path, null key inside the visible range still throws
- slicedInputSkipsNullStructOutsideSlice — slow/fast gating respects contains_nulls on the sliced lists_cv
- slicedInputWithOuterNullAndNullStructInsideSlice — slow path bitmask_and with begin_bits = input.offset(), combining an outer-null row and a null-struct-entry row inside the slice

Total MapUtilsTest cases: 18 (all passing locally).

Signed-off-by: Allen Xu <allxu@nvidia.com>
Thanks — addressing both:

Refactor — done in b9eae67; extracted any_null_key_in_guarded_row() as the shared helper.

Performance in spark-rapids — will run the benchmark on the spark-rapids side via NVIDIA/spark-rapids#14612 once a spark-rapids-jni snapshot that includes these changes is published, and post the numbers on that PR. Not blocking this merge.
FYI, to test locally with jni and plugin please refer to https://github.com/NVIDIA/spark-rapids-jni/blob/main/CONTRIBUTING.md#local-testing-of-cross-repo-contributions-cudf-spark-rapids-jni-and-spark-rapids

In my humble opinion it's a better practice to identify performance issues earlier, so that we won't have to file another follow-up performance-improvement PR a couple of days later after finding them on the spark-rapids side.
Ran a before/after microbenchmark comparing pre-refactor (at 0db1485) vs post-refactor (at b9eae67). Same machine, same inputs.

Observations:

Net: the refactor is not perf-neutral — it's a clear win on the slow path while leaving the fast path unchanged. The spark-rapids end-to-end perf test (per earlier comment) will still run separately on NVIDIA/spark-rapids#14612 once this PR merges and a snapshot is published.
Force-pushed from a4e9ab9 to e1edb87.
End-to-end CPU vs GPU benchmark in spark-rapids. Setup:

GPU: RTX PRO 6000 Blackwell, CUDA 13.1. Each scenario: 1 warm-up + 3 measurements, median reported. Timing covers the full

† With 16 entries × 10% null-struct rate, virtually every input row contains ≥1 null struct entry, so the entire map is null for all rows — CPU short-circuits this early while GPU runs the full slow-path pipeline. Not a representative Spark workload.

Takeaways:

These numbers validate that the JNI path is not a regression at any scale and is a meaningful win (1.3–1.9×) once the row count grows past ~5M or entries-per-row grows past ~8.
@res-life @thirtiseven this is the e2e benchmark on spark-rapids, please re-evaluate this JNI implementation. GPU lost in some scenarios.
Compliance: - Rename src/map_utils.cu → src/map_utils.cpp (file has no device/Thrust code; per convention, .cu is reserved for __device__/CUDF_KERNEL code). - map_utils.cpp: add direct include of <cudf/aggregation.hpp> for make_any_aggregation / make_max_aggregation (previously transitively sourced from <cudf/reduction.hpp>). - map_utils.hpp: add direct include of <memory> for std::unique_ptr. - map_utils.hpp: reorder Doxygen tags to @throws → @param → @return to match the project/cudf convention (matches list_slice.hpp). - Rename file-local constant kNullKeyError → null_key_error to match the project snake_case style (no k-prefix Hungarian notation). Correctness: - Slow-path null-key check is now gated on keys.nullable() && keys.null_count() > 0, mirroring the fast-path guard. Before the fix every slow-path call with throw_on_null_key=true paid for an is_null BOOL8 alloc + segmented_reduce + binary_op + reduce_any even when the key column had no null mask — a common case in Spark map pipelines. - Add SAFETY comment explaining why keys.null_count() is safe to consult on structs.child(0) without guarding against UNKNOWN_NULL_COUNT. Documentation: - map_utils.hpp: add explicit contract line stating sliced inputs are supported (only rows [input.offset(), input.offset()+input.size()) are inspected). The existing sliced-input regression tests already pin this behaviour; the doc line protects the invariant. Signed-off-by: Allen Xu <allxu@nvidia.com>
Adds src/main/cpp/tests/map_utils.cpp with cudf::test::BaseFixture coverage for branches that are awkward to express from Java: - Input-validation contract branches that match the @throws doc on map_utils.hpp: NonListInputThrows — non-LIST input rejected ListOfNonStructThrows — LIST<INT32> child rejected StructWithWrongArityThrows — STRUCT with 1 child rejected EmptyNonListInputStillThrows — zero-row non-LIST still rejected (pins the ordering of validation vs. empty-size early return) - Non-INT32 key types: StringKeyNullThrows — LIST<STRUCT<string,int>> with null key StringKeyNonNullSucceeds — all-valid strings pass through Registered via ConfigureTest(MAP_UTILS map_utils.cpp). Follows the project convention that every feature with a .cu/.hpp/Jni.cpp triad gets a matching cudf::test::BaseFixture-based test (cf. list_slice.cpp, map_zip_with_utils.cpp). Signed-off-by: Allen Xu <allxu@nvidia.com>
- slowPathNullKeyNoThrowWhenPolicyAllows: exercises the false side of the slow-path `if (throw_on_null_key)` branch. Every existing slow-path test passed throwOnNullKey=true; a mutation that inverted the guard would silently pass without this case. - slowPathAcrossMultipleBitmaskWords: 70-row column with null-struct entries at rows 0, 33, 65 — crosses the 32-row warp boundary and the 64-row bitmask-word boundary to guard bit-alignment regressions in bools_to_mask + bitmask_and + purge_nonempty_nulls. - Drop stale `map_utils.cu:44` line-number reference in a comment; the symbol reference alone is enough and survives future refactors. Total MapUtilsTest cases: 20 (all passing locally). Signed-off-by: Allen Xu <allxu@nvidia.com>
About the perf of the fast path: a Spark MAP<K,V> and a cuDF LIST<STRUCT<K,V>> share the same physical layout — identical offsets, child struct, null mask. When every row is valid, the input IS the output. The plugin just needs to:

```scala
// In GpuMapFromEntries.doColumnar:
val inputBase = input.getBase
val result = MapUtils.mapFromEntries(inputBase, throwOnNullKey)
if (result == null) {
  // Fast path — input unchanged, reinterpret LIST<STRUCT> as MAP.
  inputBase.incRefCount()
} else {
  result
}
```

```cpp
// use a byte to indicate:
constexpr std::uint8_t STATE_NULL     = 0;  // output row is null
constexpr std::uint8_t STATE_VALID    = 1;  // output row is valid
constexpr std::uint8_t STATE_NULL_KEY = 2;  // throw-worthy (valid struct + null key + policy on)

// For each row i:
// - Outer LIST row is null → STATE_NULL, size 0
// - Otherwise scan entries in [offsets[i], offsets[i+1]):
//   * any null struct entry → STATE_NULL, size 0
//   * else null key (throw policy) → STATE_NULL_KEY, size = end-start
//   * else → STATE_VALID, size = end-start
// ---------------------------------------------------------------------------
// implement kernel to output a byte array and a size array.
```

If cub::DeviceReduce::Max over the states is <= 1, return a null column and let spark-rapids incRefCount the input. For the slow path, we can file a follow-up issue, or fix in this PR, from AI:
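The per-row classification the proposal describes can be sketched on the host. This is illustrative only — the real version would be a CUDA kernel with one thread or block per row; the vectors modeling entry and key validity are assumptions of this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t STATE_NULL     = 0;  // output row is null
constexpr std::uint8_t STATE_VALID    = 1;  // output row is valid
constexpr std::uint8_t STATE_NULL_KEY = 2;  // throw-worthy null key

// Classify one row over its [begin, end) slice of the flat child column.
// A null struct entry dominates: the row becomes null and its keys are
// never inspected, matching the Spark semantics discussed above.
std::uint8_t classify_row(bool row_is_null, int begin, int end,
                          std::vector<bool> const& entry_valid,
                          std::vector<bool> const& key_valid,
                          bool throw_on_null_key) {
  if (row_is_null) return STATE_NULL;
  for (int i = begin; i < end; ++i)
    if (!entry_valid[i]) return STATE_NULL;      // null struct entry wins
  if (throw_on_null_key)
    for (int i = begin; i < end; ++i)
      if (!key_valid[i]) return STATE_NULL_KEY;  // throw-worthy
  return STATE_VALID;
}
```

A device-wide max over the resulting state bytes then tells the host whether any row is throw-worthy (max == 2) or whether the whole column passed untouched.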
Summary

Adds MapUtils.mapFromEntries as a native JNI function in spark-rapids-jni to handle Spark's map_from_entries null-struct semantics correctly and efficiently.

Motivation

The current fix in spark-rapids (NVIDIA/spark-rapids#14612) orchestrates ~10 cuDF column operations from Scala to handle the case where an input array contains null struct entries. Moving this logic into JNI reduces the Scala side to a single JNI call + dedupByPolicy.

Spark semantics implemented

Rows whose input array contains a null struct entry are masked to null; a null key inside a valid struct throws "Cannot use null as map key." (when throwOnNullKey=true).

Implementation

Fast path (no null struct entries anywhere): uses segmented_reduce(max) over per-entry key_is_null to check for null keys within the visible slice range; AND with is_valid(input) to guard outer-null rows from triggering a throw.

Slow path (≥1 row has a null struct entry):

1. contains_nulls → per-row has_null_entry (1 kernel)
2. is_null(keys) AND is_valid(structs) → per-entry null_key_in_valid (2 kernels)
3. segmented_reduce(max) on flat bool with list offsets → per-row row_has_null_key (1 kernel)
4. reduce(any, !has_null_entry AND row_has_null_key) → throw check (1 kernel + 1 GPU-CPU sync)
5. bools_to_mask(!has_null_entry) → AND with the input's existing null mask using the offset-aware bitmask_and(masks, begin_bits, mask_size) raw overload (correctly handles sliced columns by consuming input.null_mask() from input.offset()); result_view uses offset=0 with a sliced_offsets child view → purge_nonempty_nulls to satisfy cudf's invariant that null LIST rows have empty offset spans.

New files
- src/main/cpp/src/map_utils.hpp
- src/main/cpp/src/map_utils.cu
- src/main/cpp/src/MapUtilsJni.cpp
- src/main/java/.../jni/MapUtils.java
- src/test/java/.../jni/MapUtilsTest.java

Local Validation

All 13 MapUtilsTest cases pass, including: [null_struct, {null_key, 20}] → null (not throw); the is_valid guard on the reduce scalar; and bitmask_and with a pre-existing null mask.

Follow-up in spark-rapids
Once this PR merges and a new spark-rapids-jni version is released, NVIDIA/spark-rapids#14612 will be updated to replace the Scala slow path with a single call:
Relates to NVIDIA/spark-rapids#14128.
Documentation
Testing
Performance