from_protobuf kernel (part1): scalar field decoding #4435
thirtiseven merged 9 commits into NVIDIA:main
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
When binary_input has null rows, the output struct must mark those rows as null. Combine input null mask with PERMISSIVE-mode row invalidation in the final valid_if predicate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Input null mask is handled by the Scala caller (mergeAndSetValidity), not the C++ decoder. The PERMISSIVE-mode null mask is the only one built at this layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Combine input null mask with PERMISSIVE-mode row invalidation in the C++ decoder so the API is self-contained. Update test expectations to match: null input rows now produce null struct rows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Greptile Summary

This PR replaces the all-null stub from part 0 with a functional two-pass GPU decoder for flat scalar protobuf fields. Pass 1 (`scan_all_fields_kernel`) records the offset and length of each top-level field per row; pass 2 extracts the typed values. The logic is sound, and the prior P0/P1 concerns from the part0 review (dead schema upload, inconsistent memory resources) have been resolved. Two small issues remain.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/optimization suggestions with no blocking correctness issues. Prior P0/P1 concerns from the part0 review are fully resolved. The new scan+extract pipeline is logically consistent: scan/extract pointer arithmetic is correct for sliced list columns, stream ordering of allocations is safe, null propagation to children is correct, and last-one-wins semantics are implemented properly. The three remaining comments are all P2: one dead variable, one unnecessary GPU pass in the no-error PERMISSIVE path, and a question about intended FAIL-FAST semantics for unknown enum values. None block correctness for the tested paths.

Important Files Changed: `protobuf.cu` (dead variable + valid_if concern) and `protobuf_kernels.cu` (FAIL-FAST enum behavior)
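To make the pass-1 scan concrete, here is a rough host-side C++ sketch of the same idea, under the assumption that the kernel follows standard protobuf wire-format rules: walk each message, decode the tag varint, compute the payload span for the wire type, let later occurrences of a field overwrite earlier ones ("last one wins"), and skip unknown fields. All names here (`FieldSpan`, `read_varint`, `scan_message`) are illustrative, not the actual kernel code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Per-field result of the scan: where the (last) occurrence's payload lives.
struct FieldSpan { int64_t offset = -1; int64_t length = 0; };

// Decode a varint starting at pos; advances pos. Returns std::nullopt on
// truncated or overlong input (malformed-row detection).
static std::optional<uint64_t> read_varint(uint8_t const* data, size_t size, size_t& pos) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && pos < size; shift += 7) {
    uint8_t b = data[pos++];
    value |= uint64_t(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return value;
  }
  return std::nullopt;
}

// Scan one message; spans is indexed by field number (small demo schema).
// Returns false if the message is malformed.
static bool scan_message(uint8_t const* data, size_t size, std::vector<FieldSpan>& spans) {
  size_t pos = 0;
  while (pos < size) {
    auto tag = read_varint(data, size, pos);
    if (!tag) return false;
    auto field = uint32_t(*tag >> 3);
    auto wire  = uint32_t(*tag & 7);
    size_t start = pos, len = 0;
    switch (wire) {
      case 0: {  // varint payload
        if (!read_varint(data, size, pos)) return false;
        len = pos - start; break;
      }
      case 1: len = 8; pos += 8; break;  // fixed64
      case 2: {                          // length-delimited (string/bytes)
        auto l = read_varint(data, size, pos);
        if (!l) return false;
        start = pos; len = *l; pos += *l; break;
      }
      case 5: len = 4; pos += 4; break;  // fixed32
      default: return false;             // unsupported wire type
    }
    if (pos > size) return false;        // truncated payload
    if (field < spans.size())            // known field: last one wins
      spans[field] = FieldSpan{int64_t(start), int64_t(len)};
    // unknown fields are skipped silently
  }
  return true;
}
```

In the real design each GPU thread runs this loop over one row's message bytes and writes the spans into per-field device buffers for the extraction pass.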
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[decode_protobuf_to_struct] --> B{num_rows == 0?}
B -- yes --> C[Return empty struct column]
B -- no --> D[Classify fields: scalar / repeated / nested]
D --> E{num_scalar > 0?}
E -- yes --> F[scan_all_fields_kernel\nrecords offset+length per field per row]
F --> G[maybe_check_required_fields\ncheck_required_fields_kernel]
G --> H[Batched extraction\nextract_varint/fixed_batched_kernel\ngroups 0-10 by type]
H --> I[Per-field fallback\nINT32+enum via extract_typed_column]
I --> J[Per-field STRING/BYTES\nextract_and_build_string_or_bytes_column]
J --> K[ENUM_STRING\nextract_varint + build_enum_string_column\nvalidate_enum_values_kernel]
E -- no --> L
K --> L[Assemble top-level children]
L --> M[stream.synchronize, check d_error]
M --> N{PERMISSIVE mode or input nulls?}
N -- yes --> O[valid_if: combine input mask + d_row_force_null]
O --> P[apply_parent_mask_to_row_aligned_column\npropagate_nulls_to_descendants]
N -- no --> Q[No struct mask]
P --> R[make_structs_column]
Q --> R
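The valid_if step in the flowchart combines two signals: the input binary row's validity and the PERMISSIVE-mode force-null flags from decoding errors. A minimal host-side sketch of that predicate (names are illustrative; the real code runs `cudf::detail::valid_if` on device):

```cpp
#include <cstddef>
#include <vector>

// A struct row is valid only when the input binary row was non-null AND no
// decode error forced the row to null (PERMISSIVE mode).
std::vector<bool> combine_row_validity(std::vector<bool> const& input_valid,
                                       std::vector<bool> const& row_force_null) {
  std::vector<bool> out(input_valid.size());
  for (size_t i = 0; i < input_valid.size(); ++i)
    out[i] = input_valid[i] && !row_force_null[i];
  return out;
}
```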
Reviews (4): Last reviewed commit: "style"
d_schema, message_data_size, num_repeated, num_nested are not referenced in the scalar-only decode path. Defer to part3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
auto const* offsets_end = list_view.offsets_end();
// LIST children are not row-aligned with their parent. Expand the list-row null mask across
// every covered child element so direct access to the backing child column also observes nulls.
auto [element_mask, element_null_count] = cudf::detail::valid_if(
I'm not sure how long we still can use the cudf::detail:: APIs. @mythrocks?
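The mask expansion the comment above describes can be sketched on the host as follows: a child element is valid only if its parent list row is valid, with `offsets` holding `num_rows + 1` entries. This is illustrative code, not the device implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a list-row validity mask across every covered child element.
std::vector<bool> expand_list_mask(std::vector<int32_t> const& offsets,
                                   std::vector<bool> const& row_valid) {
  std::vector<bool> element_valid(offsets.back());
  for (size_t row = 0; row + 1 < offsets.size(); ++row)
    for (int32_t e = offsets[row]; e < offsets[row + 1]; ++e)
      element_valid[e] = row_valid[row];
  return element_valid;
}
```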
 * row-level invalidity buffer so the full struct row can be nulled to match Spark CPU semantics for
 * malformed messages.
 */
CUDF_KERNEL void scan_all_fields_kernel(
Side comment: We should add a macro JNI_KERNEL instead.
    int* error_flag,
    bool* row_has_invalid_data)
{
  auto row = static_cast<cudf::size_type>(blockIdx.x * blockDim.x + threadIdx.x);
Again, we are losing the ability to use global_thread_id from cudf::detail::. @mythrocks.
ttnghia left a comment
Assuming that we still can use some cudf::detail:: API until some later time, let's move on with this. We will migrate later if needed.
build
Signed-off-by: MithunR <mithunr@nvidia.com>
from_protobuf kernel (part1): scalar field decoding [2/4]

Part 1 of #4107
Summary
Second PR in the series. Replaces the all-null stub from part0 with actual GPU decoding of flat (non-nested, non-repeated) scalar protobuf fields. After this PR, `decode_protobuf_to_struct` can decode messages with scalar fields end-to-end; repeated and nested fields still produce null columns.

What is included
Scan kernel (`protobuf_kernels.cu`):

- `scan_all_fields_kernel`: one-thread-per-row parser that scans each protobuf message and records `(offset, length)` for all top-level fields. Supports "last one wins" semantics, unknown field skipping, and malformed-row detection.
- `check_required_fields_kernel`, replacing the part0 stub.
- `validate_enum_values_kernel`, with binary-search validation.
- `compute_enum_string_lengths_kernel` / `copy_enum_string_chars_kernel` for enum-as-string decoding.

Decode pipeline (`protobuf.cu`):

- `make_null_column_with_schema`.

Enum builders (`protobuf_builders.cu`):

- `build_enum_string_column`: validates enum values and converts to UTF-8 name strings.

Test coverage (98 tests)
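The binary-search validation mentioned for `validate_enum_values_kernel` amounts to checking each decoded varint against the sorted list of known enum numbers. A host-side sketch of that check, with illustrative names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A decoded enum value is valid only if it appears among the (sorted)
// enum numbers declared in the schema.
bool is_known_enum(std::vector<int32_t> const& sorted_numbers, int32_t value) {
  return std::binary_search(sorted_numbers.begin(), sorted_numbers.end(), value);
}
```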
Follow-up PRs