Skip to content

Decouple from cudf::detail::make_counting_transform_iterator#4306

Merged
mythrocks merged 8 commits intoNVIDIA:release/26.04from
mythrocks:counting-transform-iterator
Mar 31, 2026
Merged

Decouple from cudf::detail::make_counting_transform_iterator#4306
mythrocks merged 8 commits intoNVIDIA:release/26.04from
mythrocks:counting-transform-iterator

Conversation

@mythrocks
Copy link
Copy Markdown
Collaborator

@mythrocks mythrocks commented Feb 24, 2026

This commit introduces utility iterators to be used in place cudf::detail iterators. This is to further reduce dependencies on cudf::detail APIs that are now deemed private to the CUDF project.

make_counting_transform_iterator

This change introduces a version of make_counting_transform_iterator that is specific to Spark RAPIDS JNI.

The previous version of this function is from cudf::detail, which is now deemed private to cuDF. This commit should allow Spark RAPIDS JNI to be insulated from changes to interfaces in cudf::detail.

Note that this version does not use thrust::transform_iterator. It banks instead on cuda::make_transform_iterator instead.

This change introduces a version of `make_counting_transform_iterator`
that is specific to Spark RAPIDS JNI.

The previous version of this function is from `cudf::detail`, which is
now deemed private to cuDF. This commit should allow Spark RAPIDS JNI
to be insulated from changes to interfaces in `cudf::detail`.

Note that this version does not use `thrust::transform_iterator`.
It banks instead on `cuda::make_transform_iterator` instead.

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks mythrocks self-assigned this Feb 24, 2026
@mythrocks mythrocks changed the title Decouple from cudf::detail::make_counting_transform_iterator [WIP] Decouple from cudf::detail::make_counting_transform_iterator Feb 24, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 24, 2026

Greptile Summary

This PR achieves two goals: it decouples Spark RAPIDS JNI from cudf::detail::make_counting_transform_iterator by introducing a local replacement (spark_rapids_jni::util::make_counting_transform_iterator in utilities/iterator.cuh), and simultaneously extends the Bloom filter implementation to support both Spark's V1 and V2 formats, including a configurable seed for V2.

Key changes:

  • utilities/iterator.cuh — New thin wrapper that uses cuda::counting_iterator + thrust::make_transform_iterator instead of thrust::counting_iterator. Replaces cudf::detail::make_counting_transform_iterator in all call sites across row_conversion.cu, shuffle_split.cu, shuffle_assemble.cu, hyper_log_log_plus_plus*.cu, from_json_to_structs.cu, get_json_object.cu, and the corresponding tests.
  • Bloom filter V2 support — Adds bloom_filter_header_v1 (12 bytes) and bloom_filter_header_v2 (16 bytes) structs with static_assert size checks. V2 uses a 64-bit combined hash starting value (h1 * INT32_MAX) and supports a hash seed, matching BloomFilterImplV2.java. Both versions share a common gpu_bit_to_word_mask helper that preserves the big-endian byte-swizzle for Spark compatibility.
  • Correctness improvementsunpack_bloom_filter now takes cudf::device_span<uint8_t const> (removing a previous const_cast), cuda::atomic_ref::fetch_or replaces raw atomicOr, and rmm::exec_policy_nosync is used where the stream is already tracked.
  • Java/JNIBloomFilter.java gains VERSION_1, VERSION_2, DEFAULT_SEED constants and a new create(version, numHashes, bloomFilterBits, seed) method; the old two-arg create is deprecated but retained for backward compatibility.
  • Note: The PR description mentions make_pair_iterator as a second deliverable, but this function is not present in the diff. Only make_counting_transform_iterator is introduced.

Confidence Score: 5/5

  • Safe to merge; all remaining findings are P2 style/consistency suggestions with no correctness or data-integrity impact.
  • The iterator replacement is mechanically correct and well-tested. The V2 bloom filter implementation matches the Spark reference algorithm faithfully (64-bit hash, seed, same bit-swizzle). Atomics were upgraded to cuda::atomic_ref, const-correctness was improved, and both Java and C++ APIs are backward compatible. Three P2 issues remain: a signed-unsigned comparison without explicit cast in bloom_filter_probe, a JNI upper-bound that is slightly looser than the C++ guard, and a minor static-assert restriction in the new iterator header that excludes uint32_t start values. None of these cause incorrect behavior at any input currently reachable in production.
  • src/main/cpp/src/bloom_filter.cu (signed-unsigned cast on line ~696), src/main/cpp/src/BloomFilterJni.cpp (JNI bound vs C++ bound mismatch), src/main/cpp/src/utilities/iterator.cuh (static_assert excludes uint32_t)

Important Files Changed

Filename Overview
src/main/cpp/src/utilities/iterator.cuh New utility header introducing spark_rapids_jni::util::make_counting_transform_iterator, replacing cudf::detail::make_counting_transform_iterator across the codebase. Uses cuda::counting_iterator + thrust::make_transform_iterator. Static-assert restricts CountingIterType to ≤ 32-bit signed width (≤ int32_t digits), which excludes uint32_t — consistent with the cudf convention it replaces.
src/main/cpp/src/bloom_filter.cu Major refactor adding V2 bloom filter support alongside V1. Introduces version-templated gpu_bloom_filter_put and bloom_probe_functor, 64-bit hash indexing for V2, configurable seeds, cuda::atomic_ref::fetch_or for atomics, and rmm::exec_policy_nosync. Minor inconsistency: bloom_filter.size() (int32_t) compared to size_t without explicit cast in bloom_filter_probe (unlike the analogous checks in bloom_filter_put and bloom_filter_merge).
src/main/cpp/src/bloom_filter.hpp Adds explicit V1/V2 header structs with static_assert size checks, version constants, a bloom_filter_header_size_for_version helper, and updates bloom_filter_create signature to include version and seed.
src/main/cpp/src/BloomFilterJni.cpp JNI bridge updated to pass version and seed to the C++ API. Adds input validation via JNI_ARG_CHECK capping bloomFilterBits at INT32_MAX * 64; however, get_bloom_filter_stride in C++ caps buf_size at INT32_MAX, making the effective maximum bloom_filter_longs ~268 M (not INT32_MAX). Inputs above this limit will throw a CUDF_EXPECTS rather than the JNI argument-check error.
src/main/java/com/nvidia/spark/rapids/jni/BloomFilter.java Public API updated with VERSION_1, VERSION_2, DEFAULT_SEED constants and a new create(version, numHashes, bloomFilterBits, seed) overload. Old create(numHashes, bloomFilterBits) is deprecated but kept for backward compatibility, delegating to V1/seed-0.
src/main/cpp/src/row_conversion.cu Mechanical replacement of cudf::detail::make_counting_transform_iteratorspark_rapids_jni::util::make_counting_transform_iterator, and util:: arithmetic helpers → cudf::util:: (adding explicit namespace). No logic changes.
src/main/cpp/tests/bloom_filter.cu Existing V1 tests renamed with V1 suffix and updated for new API signatures. New V2 tests added covering initialization, build-and-probe, null handling, merge, absent-key probing, and seed variation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["bloom_filter_create(version, num_hashes, longs, seed)"] --> B{version?}
    B -->|V1| C["pack bloom_filter_header_v1\n(12 bytes: version, num_hashes, num_longs)"]
    B -->|V2| D["pack bloom_filter_header_v2\n(16 bytes: version, num_hashes, seed, num_longs)"]
    C --> E["memset bit array to 0"]
    D --> E

    E --> F["bloom_filter_put(filter, input)"]
    F --> G["unpack_bloom_filter()\nreturns header, buffer, bits, seed"]
    G --> H{version?}
    H -->|V1| I["gpu_bloom_filter_put<1>\nh1*1 + idx*h2 mod bits\nseed=0, loop 1..num_hashes"]
    H -->|V2| J["gpu_bloom_filter_put<2>\nh1*INT32_MAX + h2 (64-bit)\nseed from header, loop 0..num_hashes-1"]
    I --> K["cuda::atomic_ref::fetch_or\ngpu_bit_to_word_mask with big-endian swizzle"]
    J --> K

    K --> L["bloom_filter_probe(input, filter)"]
    L --> M{version?}
    M -->|V1| N["bloom_probe_functor<1>\n32-bit hash, early exit on miss"]
    M -->|V2| O["bloom_probe_functor<2>\n64-bit hash, early exit on miss"]
    N --> P["Output: bool column\n(true = possibly in set)"]
    O --> P

    L2["bloom_filter_merge(filters)"] --> G2["unpack first filter\nvalidate all filters match header"]
    G2 --> R["bitwise-OR all bit arrays\n(thrust::transform)"]
    R --> P2["New merged filter scalar"]
Loading

Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/rel..." | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@mythrocks mythrocks changed the title [WIP] Decouple from cudf::detail::make_counting_transform_iterator Decouple from cudf::detail::make_counting_transform_iterator Feb 24, 2026
@mythrocks
Copy link
Copy Markdown
Collaborator Author

Build

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: MithunR <mithunr@nvidia.com>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@mythrocks
Copy link
Copy Markdown
Collaborator Author

Build

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@mythrocks
Copy link
Copy Markdown
Collaborator Author

Build

@sameerz sameerz requested a review from a team February 28, 2026 00:15
Copy link
Copy Markdown
Collaborator

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please hold off a little bit. We need to discuss on mitigating the issue with code duplicates and unavoidable dependency from cudf detail namespace.

@mythrocks
Copy link
Copy Markdown
Collaborator Author

I think make_counting_transform_iterator is alright to have our own version of.

I think the pair-wise iterator should probably remain in CUDF. I'll check whether libcudf is accepting of exposing those utilities.

@nvauto
Copy link
Copy Markdown
Collaborator

nvauto commented Mar 16, 2026

NOTE: release/26.04 has been created from main. Please retarget your PR to release/26.04 if it should be included in the release.

These are a bridge too far.  Pair-wise iterators can be consumed from cudf, eventually.

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks mythrocks changed the base branch from main to release/26.04 March 30, 2026 22:52
@mythrocks
Copy link
Copy Markdown
Collaborator Author

Build

@mythrocks
Copy link
Copy Markdown
Collaborator Author

This PR has been reduced in scope, to remove the pair-wise iterators changes. Those can be consumed from rapidsai/cudf, once they're made available for public consumption.

@mythrocks
Copy link
Copy Markdown
Collaborator Author

@ttnghia: Any objection to my merging this change?

@ttnghia
Copy link
Copy Markdown
Collaborator

ttnghia commented Mar 31, 2026

@ttnghia: Any objection to my merging this change?

Now with AI assisted coding, I'm less likely reject code duplication although that is never a good way to me 😃

@mythrocks mythrocks merged commit 995224e into NVIDIA:release/26.04 Mar 31, 2026
2 checks passed
@mythrocks
Copy link
Copy Markdown
Collaborator Author

Thank you, chaps. This change has been merged.

On to the next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants