from_protobuf kernel (part1): scalar field decoding #4435
thirtiseven merged 9 commits into NVIDIA:main
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
When binary_input has null rows, the output struct must mark those rows as null. Combine input null mask with PERMISSIVE-mode row invalidation in the final valid_if predicate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Input null mask is handled by the Scala caller (mergeAndSetValidity), not the C++ decoder. The PERMISSIVE-mode null mask is the only one built at this layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Combine input null mask with PERMISSIVE-mode row invalidation in the C++ decoder so the API is self-contained. Update test expectations to match: null input rows now produce null struct rows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Greptile Summary

This PR replaces the all-null stub from part 0 with a functional two-pass GPU decoder for flat scalar protobuf fields. Pass 1 (`scan_all_fields_kernel`) records the offset and length of each top-level field per row; pass 2 extracts the typed values. The logic is sound, and the prior P0/P1 concerns from the part0 review (dead schema upload, inconsistent memory resources) have been resolved. Two small issues remain.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/optimization suggestions with no blocking correctness issues. Prior P0/P1 concerns from the part0 review are fully resolved. The new scan+extract pipeline is logically consistent: scan/extract pointer arithmetic is correct for sliced list columns, stream ordering of allocations is safe, null propagation to children is correct, and last-one-wins semantics are implemented properly. The three remaining comments are all P2: one dead variable, one unnecessary GPU pass in the no-error PERMISSIVE path, and a question about intended FAIL-FAST semantics for unknown enum values. None block correctness for the tested paths.

Important Files Changed: `protobuf.cu` (dead variable + valid_if concern) and `protobuf_kernels.cu` (FAIL-FAST enum behavior)
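To make the pass-1 scan concrete, here is a rough host-side C++ sketch of the same idea, under the assumption that the kernel follows standard protobuf wire-format rules: walk each message, decode the tag varint, compute the payload span for the wire type, let later occurrences of a field overwrite earlier ones ("last one wins"), and skip unknown fields. All names here (`FieldSpan`, `read_varint`, `scan_message`) are illustrative, not the actual kernel code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Per-field result of the scan: where the (last) occurrence's payload lives.
struct FieldSpan { int64_t offset = -1; int64_t length = 0; };

// Decode a varint starting at pos; advances pos. Returns std::nullopt on
// truncated or overlong input (malformed-row detection).
static std::optional<uint64_t> read_varint(uint8_t const* data, size_t size, size_t& pos) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && pos < size; shift += 7) {
    uint8_t b = data[pos++];
    value |= uint64_t(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return value;
  }
  return std::nullopt;
}

// Scan one message; spans is indexed by field number (small demo schema).
// Returns false if the message is malformed.
static bool scan_message(uint8_t const* data, size_t size, std::vector<FieldSpan>& spans) {
  size_t pos = 0;
  while (pos < size) {
    auto tag = read_varint(data, size, pos);
    if (!tag) return false;
    auto field = uint32_t(*tag >> 3);
    auto wire  = uint32_t(*tag & 7);
    size_t start = pos, len = 0;
    switch (wire) {
      case 0: {  // varint payload
        if (!read_varint(data, size, pos)) return false;
        len = pos - start; break;
      }
      case 1: len = 8; pos += 8; break;  // fixed64
      case 2: {                          // length-delimited (string/bytes)
        auto l = read_varint(data, size, pos);
        if (!l) return false;
        start = pos; len = *l; pos += *l; break;
      }
      case 5: len = 4; pos += 4; break;  // fixed32
      default: return false;             // unsupported wire type
    }
    if (pos > size) return false;        // truncated payload
    if (field < spans.size())            // known field: last one wins
      spans[field] = FieldSpan{int64_t(start), int64_t(len)};
    // unknown fields are skipped silently
  }
  return true;
}
```

In the real design each GPU thread runs this loop over one row's message bytes and writes the spans into per-field device buffers for the extraction pass.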
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[decode_protobuf_to_struct] --> B{num_rows == 0?}
B -- yes --> C[Return empty struct column]
B -- no --> D[Classify fields: scalar / repeated / nested]
D --> E{num_scalar > 0?}
E -- yes --> F[scan_all_fields_kernel\nrecords offset+length per field per row]
F --> G[maybe_check_required_fields\ncheck_required_fields_kernel]
G --> H[Batched extraction\nextract_varint/fixed_batched_kernel\ngroups 0-10 by type]
H --> I[Per-field fallback\nINT32+enum via extract_typed_column]
I --> J[Per-field STRING/BYTES\nextract_and_build_string_or_bytes_column]
J --> K[ENUM_STRING\nextract_varint + build_enum_string_column\nvalidate_enum_values_kernel]
E -- no --> L
K --> L[Assemble top-level children]
L --> M[stream.synchronize, check d_error]
M --> N{PERMISSIVE mode or input nulls?}
N -- yes --> O[valid_if: combine input mask + d_row_force_null]
O --> P[apply_parent_mask_to_row_aligned_column\npropagate_nulls_to_descendants]
N -- no --> Q[No struct mask]
P --> R[make_structs_column]
Q --> R
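The valid_if step in the flowchart combines two signals: the input binary row's validity and the PERMISSIVE-mode force-null flags from decoding errors. A minimal host-side sketch of that predicate (names are illustrative; the real code runs `cudf::detail::valid_if` on device):

```cpp
#include <cstddef>
#include <vector>

// A struct row is valid only when the input binary row was non-null AND no
// decode error forced the row to null (PERMISSIVE mode).
std::vector<bool> combine_row_validity(std::vector<bool> const& input_valid,
                                       std::vector<bool> const& row_force_null) {
  std::vector<bool> out(input_valid.size());
  for (size_t i = 0; i < input_valid.size(); ++i)
    out[i] = input_valid[i] && !row_force_null[i];
  return out;
}
```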
Reviews (4): Last reviewed commit: "style"
d_schema, message_data_size, num_repeated, num_nested are not referenced in the scalar-only decode path. Defer to part3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
auto const* offsets_end = list_view.offsets_end();
// LIST children are not row-aligned with their parent. Expand the list-row null mask across
// every covered child element so direct access to the backing child column also observes nulls.
auto [element_mask, element_null_count] = cudf::detail::valid_if(
I'm not sure how long we still can use the cudf::detail:: APIs. @mythrocks?
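The mask expansion the comment above describes can be sketched on the host as follows: a child element is valid only if its parent list row is valid, with `offsets` holding `num_rows + 1` entries. This is illustrative code, not the device implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a list-row validity mask across every covered child element.
std::vector<bool> expand_list_mask(std::vector<int32_t> const& offsets,
                                   std::vector<bool> const& row_valid) {
  std::vector<bool> element_valid(offsets.back());
  for (size_t row = 0; row + 1 < offsets.size(); ++row)
    for (int32_t e = offsets[row]; e < offsets[row + 1]; ++e)
      element_valid[e] = row_valid[row];
  return element_valid;
}
```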
 * row-level invalidity buffer so the full struct row can be nulled to match Spark CPU semantics for
 * malformed messages.
 */
CUDF_KERNEL void scan_all_fields_kernel(
Side comment: We should add a macro JNI_KERNEL instead.
    int* error_flag,
    bool* row_has_invalid_data)
{
  auto row = static_cast<cudf::size_type>(blockIdx.x * blockDim.x + threadIdx.x);
Again, we are losing the ability to use global_thread_id from cudf::detail::. @mythrocks.
ttnghia left a comment
Assuming that we still can use some cudf::detail:: API until some later time, let's move on with this. We will migrate later if needed.
build
Signed-off-by: MithunR <mithunr@nvidia.com>
from_protobuf kernel (part1): scalar field decoding [2/4]

Part 1 of #4107
Summary
Second PR in the series. Replaces the all-null stub from part0 with actual GPU decoding of flat (non-nested, non-repeated) scalar protobuf fields. After this PR, `decode_protobuf_to_struct` can decode messages with scalar fields end-to-end; repeated and nested fields still produce null columns.

What is included
Scan kernel (`protobuf_kernels.cu`):

- `scan_all_fields_kernel`: one-thread-per-row parser that scans each protobuf message and records `(offset, length)` for all top-level fields. Supports "last one wins" semantics, unknown field skipping, and malformed-row detection.
- `check_required_fields_kernel`, replacing the part0 stub.
- `validate_enum_values_kernel`, with binary-search validation.
- `compute_enum_string_lengths_kernel` / `copy_enum_string_chars_kernel` for enum-as-string decoding.

Decode pipeline (`protobuf.cu`):

- `make_null_column_with_schema`.

Enum builders (`protobuf_builders.cu`):

- `build_enum_string_column`: validates enum values and converts to UTF-8 name strings.

Test coverage (98 tests)
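The binary-search validation mentioned for `validate_enum_values_kernel` amounts to checking each decoded varint against the sorted list of known enum numbers. A host-side sketch of that check, with illustrative names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A decoded enum value is valid only if it appears among the (sorted)
// enum numbers declared in the schema.
bool is_known_enum(std::vector<int32_t> const& sorted_numbers, int32_t value) {
  return std::binary_search(sorted_numbers.begin(), sorted_numbers.end(), value);
}
```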
Follow-up PRs