
from_protobuf kernel (part1): scalar field decoding #4435

Merged
thirtiseven merged 9 commits into NVIDIA:main from thirtiseven:part1-scalar-decode
Apr 25, 2026

from_protobuf kernel (part1): scalar field decoding#4435
thirtiseven merged 9 commits intoNVIDIA:mainfrom
thirtiseven:part1-scalar-decode

Conversation

@thirtiseven (Collaborator) commented Apr 7, 2026

from_protobuf kernel (part1): scalar field decoding [2/4]

Part 1 of #4107

Summary

Second PR in the series. Replaces the all-null stub from part0 with actual GPU decoding of flat (non-nested, non-repeated) scalar protobuf fields. After this PR, decode_protobuf_to_struct can decode messages with scalar fields end-to-end; repeated and nested fields still produce null columns.

What is included

Scan kernel (protobuf_kernels.cu):

  • scan_all_fields_kernel: one-thread-per-row parser that scans each protobuf message and records (offset, length) for all top-level fields. Supports "last one wins" semantics, unknown field skipping, and malformed-row detection.
  • Real check_required_fields_kernel replacing the part0 stub.
  • validate_enum_values_kernel with binary-search validation.
  • compute_enum_string_lengths_kernel / copy_enum_string_chars_kernel for enum-as-string decoding.
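The scan pass follows the standard protobuf wire format. As a rough host-side sketch of what scan_all_fields_kernel does per row (illustrative names, with a std::map standing in for the kernel's flat device arrays; not the actual kernel code):

```cpp
#include <cstdint>
#include <cstddef>
#include <map>

// Decode a base-128 varint starting at pos; advances pos.
// Returns false on truncation or a varint longer than 10 bytes.
static bool decode_varint(const uint8_t* buf, size_t len, size_t& pos, uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= len) return false;  // truncated varint
    uint8_t b = buf[pos++];
    out |= static_cast<uint64_t>(b & 0x7f) << shift;
    if ((b & 0x80) == 0) return true;
  }
  return false;  // more than 10 bytes: malformed
}

struct FieldLoc { size_t offset; size_t length; };

// Scan one message, recording the LAST occurrence of every top-level field
// ("last one wins"). Returns false if the row is malformed (truncated field,
// invalid field number, unsupported wire type).
static bool scan_row(const uint8_t* buf, size_t len, std::map<uint32_t, FieldLoc>& locs) {
  size_t pos = 0;
  while (pos < len) {
    uint64_t tag;
    if (!decode_varint(buf, len, pos, tag)) return false;
    uint32_t field = static_cast<uint32_t>(tag >> 3);
    uint32_t wire  = static_cast<uint32_t>(tag & 7);
    if (field == 0) return false;          // field number 0 is invalid
    size_t start = pos, flen = 0;
    switch (wire) {
      case 0: {                            // varint
        uint64_t v;
        if (!decode_varint(buf, len, pos, v)) return false;
        flen = pos - start;
        break;
      }
      case 1: flen = 8; pos += 8; break;   // fixed64
      case 2: {                            // length-delimited
        uint64_t l;
        if (!decode_varint(buf, len, pos, l)) return false;
        start = pos; flen = static_cast<size_t>(l); pos += flen;
        break;
      }
      case 5: flen = 4; pos += 4; break;   // fixed32
      default: return false;               // groups (3/4) unsupported
    }
    if (pos > len) return false;           // truncated payload
    locs[field] = FieldLoc{start, flen};   // last one wins
  }
  return true;
}
```

Unknown fields fall through the same switch, so they are skipped at the correct byte granularity even when the schema does not mention them.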

Decode pipeline (protobuf.cu):

  • Field classification into scalar / repeated / nested categories.
  • Batched scalar extraction in 12 type groups (varint int32/uint32/int64/uint64/bool/zigzag32/zigzag64, fixed32/fixed64, float32/float64, fallback).
  • Per-field extraction for STRING, BYTES (LIST<UINT8>), and enum-as-string fields.
  • Required field validation (proto2 semantics).
  • Input null mask propagation: null input rows produce null output struct rows (combined with PERMISSIVE-mode row invalidation).
  • Repeated / nested fields fall through to make_null_column_with_schema.
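For reference, the zigzag32/zigzag64 groups undo protobuf's ZigZag mapping, which interleaves negative and positive values so small-magnitude signed ints stay short on the wire. Decoding is a single shift/xor; a standalone sketch following the wire-format spec (not the kernel code):

```cpp
#include <cstdint>

// ZigZag decode: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
int32_t zigzag_decode32(uint32_t n) {
  return static_cast<int32_t>(n >> 1) ^ -static_cast<int32_t>(n & 1);
}

int64_t zigzag_decode64(uint64_t n) {
  return static_cast<int64_t>(n >> 1) ^ -static_cast<int64_t>(n & 1);
}
```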

Enum builders (protobuf_builders.cu):

  • build_enum_string_column: validates enum values and converts to UTF-8 name strings.
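The validation step keeps each field's allowed enum numbers in a sorted table so a decoded varint can be checked with a binary search. A hypothetical host-side equivalent of the per-element check (enum_index is an illustrative name, not the kernel's API):

```cpp
#include <algorithm>
#include <cstdint>

// Returns the index of v in the sorted enum-number table, or -1 if the
// value is unknown (which nulls the row in PERMISSIVE mode).
int enum_index(const int32_t* sorted_vals, int n, int32_t v) {
  const int32_t* it = std::lower_bound(sorted_vals, sorted_vals + n, v);
  return (it != sorted_vals + n && *it == v)
           ? static_cast<int>(it - sorted_vals)
           : -1;
}
```

The returned index can then select the UTF-8 name string for enum-as-string output.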

Test coverage (98 tests)

  • Scalar types: varint, zigzag, fixed32/64, float/double special values, bool, string, bytes
  • Default values: int, float, bool, string, mixed present/absent
  • Required fields: present, missing (permissive + failfast), multiple fields, null input rows
  • Enum: as-int, as-string, unknown values, mixed valid/unknown, row nulling in PERMISSIVE mode
  • Error handling: malformed varint, truncated fields, wrong wire type, field number validation
  • Unknown field skipping: varint, fixed64, length-delimited, fixed32
  • Schema projection, last-one-wins, empty message, zero-row, null propagation

Follow-up PRs

  1. Repeated + nested — count/scan/build for repeated fields and recursive nested message decode
  2. Benchmarks + cleanup

thirtiseven and others added 5 commits April 7, 2026 13:40
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
When binary_input has null rows, the output struct must mark those
rows as null. Combine input null mask with PERMISSIVE-mode row
invalidation in the final valid_if predicate.

Input null mask is handled by the Scala caller (mergeAndSetValidity),
not the C++ decoder. The PERMISSIVE-mode null mask is the only one
built at this layer.

Combine input null mask with PERMISSIVE-mode row invalidation in
the C++ decoder so the API is self-contained. Update test
expectations to match: null input rows now produce null struct rows.

greptile-apps Bot (Contributor) commented Apr 7, 2026

Greptile Summary

This PR replaces the all-null stub from part 0 with a functional two-pass GPU decoder for flat scalar protobuf fields. Pass 1 (scan_all_fields_kernel) records (offset, length) locations for each field per row; Pass 2 dispatches batched or per-field extraction kernels grouped by type. The null-propagation logic, "last-one-wins" semantics, required-field validation, and enum-as-string conversion are all implemented.

The logic is sound and the prior P0/P1 concerns from the part0 review (dead schema upload, inconsistent memory resources) have been resolved. Three small P2 issues remain.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/optimization suggestions with no blocking correctness issues.

Prior P0/P1 concerns from the part0 review are fully resolved. The new scan+extract pipeline is logically consistent — scan/extract pointer arithmetic is correct for sliced list columns, stream ordering of allocations is safe, null propagation to children is correct, and last-one-wins semantics are implemented properly. The three remaining comments are all P2: one dead variable, one unnecessary GPU pass in the no-error PERMISSIVE path, and a question about intended FAIL-FAST semantics for unknown enum values. None block correctness for the tested paths.

Comments were left on protobuf.cu (dead variable + valid_if concern) and protobuf_kernels.cu (FAIL-FAST enum behavior).

Important Files Changed

  • src/main/cpp/src/protobuf/protobuf.cu: New scan+extract decode pipeline for scalar fields; one dead variable (has_enum_fields) and a benign but wasteful unconditional valid_if in PERMISSIVE mode with no errors.
  • src/main/cpp/src/protobuf/protobuf_kernels.cu: New scan_all_fields_kernel, real check_required_fields_kernel, and enum kernels; unknown enum values do not set the error flag, so FAIL-FAST mode does not throw for them.
  • src/main/cpp/src/protobuf/protobuf_builders.cu: New build_enum_string_column builder; enum lookup tables, length-compute, and copy kernels are correct, and memory-resource usage is consistent with the established per-repo convention.
  • src/main/cpp/src/protobuf/protobuf_kernels.cuh: Adds host-wrapper declarations for the four new kernel launchers; no issues found.
  • src/main/cpp/src/protobuf/protobuf_host_helpers.hpp: New get_output_type_id<FieldT> template helper and updated make_empty_struct_column_with_schema to use it; straightforward and correct.
  • src/test/java/com/nvidia/spark/rapids/jni/ProtobufTest.java: 98 new tests covering varint/zigzag/fixed scalar types, default values, required fields, enums, error handling, and null propagation; only PERMISSIVE mode is tested for unknown enums, with no FAIL-FAST coverage for that path.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[decode_protobuf_to_struct] --> B{num_rows == 0?}
    B -- yes --> C[Return empty struct column]
    B -- no --> D[Classify fields: scalar / repeated / nested]
    D --> E{num_scalar > 0?}
    E -- yes --> F[scan_all_fields_kernel\nrecords offset+length per field per row]
    F --> G[maybe_check_required_fields\ncheck_required_fields_kernel]
    G --> H[Batched extraction\nextract_varint/fixed_batched_kernel\ngroups 0-10 by type]
    H --> I[Per-field fallback\nINT32+enum via extract_typed_column]
    I --> J[Per-field STRING/BYTES\nextract_and_build_string_or_bytes_column]
    J --> K[ENUM_STRING\nextract_varint + build_enum_string_column\nvalidate_enum_values_kernel]
    E -- no --> L
    K --> L[Assemble top-level children]
    L --> M[stream.synchronize, check d_error]
    M --> N{PERMISSIVE mode or input nulls?}
    N -- yes --> O[valid_if: combine input mask + d_row_force_null]
    O --> P[apply_parent_mask_to_row_aligned_column\npropagate_nulls_to_descendants]
    N -- no --> Q[No struct mask]
    P --> R[make_structs_column]
    Q --> R
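The valid_if step in the flowchart combines two sources of row invalidity: the input null mask and the PERMISSIVE-mode force-null flags. Conceptually, per row (a host-side sketch with hypothetical names, not the device predicate):

```cpp
#include <vector>
#include <cstddef>

// A struct row is valid only if the input row was non-null AND no
// PERMISSIVE-mode decode error forced it null.
std::vector<bool> combine_row_validity(const std::vector<bool>& input_valid,
                                       const std::vector<bool>& force_null) {
  std::vector<bool> out(input_valid.size());
  for (size_t i = 0; i < input_valid.size(); ++i) {
    out[i] = input_valid[i] && !force_null[i];
  }
  return out;
}
```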

Reviews (4). Last reviewed commit: "style"

Comment thread src/main/cpp/src/protobuf/protobuf.cu Outdated
Comment thread src/main/cpp/src/protobuf/protobuf_builders.cu
thirtiseven and others added 2 commits April 7, 2026 16:25
d_schema, message_data_size, num_repeated, num_nested are not
referenced in the scalar-only decode path. Defer to part3.

@thirtiseven thirtiseven requested a review from ttnghia April 7, 2026 09:25
auto const* offsets_end = list_view.offsets_end();
// LIST children are not row-aligned with their parent. Expand the list-row null mask across
// every covered child element so direct access to the backing child column also observes nulls.
auto [element_mask, element_null_count] = cudf::detail::valid_if(
Collaborator:
I'm not sure how long we still can use the cudf::detail:: APIs. @mythrocks?

* row-level invalidity buffer so the full struct row can be nulled to match Spark CPU semantics for
* malformed messages.
*/
CUDF_KERNEL void scan_all_fields_kernel(
Collaborator:

Side comment: We should add a macro JNI_KERNEL instead.

int* error_flag,
bool* row_has_invalid_data)
{
auto row = static_cast<cudf::size_type>(blockIdx.x * blockDim.x + threadIdx.x);
Collaborator:
Again, we are losing the ability to use global_thread_id from cudf::detail::. @mythrocks.

@thirtiseven thirtiseven requested a review from mythrocks April 22, 2026 07:39
@thirtiseven thirtiseven self-assigned this Apr 22, 2026
@ttnghia (Collaborator) left a comment:
Assuming that we still can use some cudf::detail:: API until some later time, let's move on with this. We will migrate later if needed.

@thirtiseven (Collaborator, Author) commented:
build

@thirtiseven thirtiseven merged commit 279a4b5 into NVIDIA:main Apr 25, 2026
5 checks passed
@thirtiseven thirtiseven deleted the part1-scalar-decode branch April 25, 2026 06:09
mythrocks added a commit to bdice/spark-rapids-jni that referenced this pull request Apr 27, 2026
Signed-off-by: MithunR <mithunr@nvidia.com>
3 participants