Decouple from cudf::detail::make_counting_transform_iterator #4306
mythrocks merged 8 commits into NVIDIA:release/26.04
This change introduces a version of `make_counting_transform_iterator` that is specific to Spark RAPIDS JNI. The previous version of this function comes from `cudf::detail`, which is now deemed private to cuDF. This commit should insulate Spark RAPIDS JNI from changes to interfaces in `cudf::detail`. Note that this version does not use `thrust::transform_iterator`; it relies on `cuda::make_transform_iterator` instead.
Signed-off-by: MithunR <mithunr@nvidia.com>
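For readers unfamiliar with the pattern, here is a minimal host-only sketch of what a counting transform iterator does: dereferencing position `i` yields `f(start + i)` without materializing any intermediate sequence. This is not the code from the PR, which per the description builds on `cuda::make_transform_iterator`. The `sketch` namespace and the `counting_transform_iterator` class are hypothetical names chosen for illustration, and the sketch uses only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not the one used in the PR

// Iterator that lazily yields f(idx) for idx = start, start + 1, ...
template <typename Index, typename Fn>
class counting_transform_iterator {
 public:
  using difference_type   = std::ptrdiff_t;
  using value_type        = std::decay_t<decltype(std::declval<Fn const&>()(std::declval<Index>()))>;
  using reference         = value_type;
  using pointer           = void;
  using iterator_category = std::input_iterator_tag;

  counting_transform_iterator(Index start, Fn f) : idx_{start}, f_{std::move(f)} {}

  // Apply the functor to the current index on every dereference.
  reference operator*() const { return f_(idx_); }
  reference operator[](difference_type n) const { return f_(static_cast<Index>(idx_ + n)); }

  counting_transform_iterator& operator++()
  {
    ++idx_;
    return *this;
  }
  counting_transform_iterator operator++(int)
  {
    auto tmp = *this;
    ++idx_;
    return tmp;
  }
  counting_transform_iterator operator+(difference_type n) const
  {
    return counting_transform_iterator(static_cast<Index>(idx_ + n), f_);
  }

  // Two iterators compare equal when they point at the same index.
  friend bool operator==(counting_transform_iterator const& a, counting_transform_iterator const& b)
  {
    return a.idx_ == b.idx_;
  }
  friend bool operator!=(counting_transform_iterator const& a, counting_transform_iterator const& b)
  {
    return !(a == b);
  }

 private:
  Index idx_;
  Fn f_;
};

// Factory mirroring the name used in the PR title; the signature is an assumption.
template <typename Index, typename Fn>
auto make_counting_transform_iterator(Index start, Fn f)
{
  return counting_transform_iterator<Index, Fn>(start, std::move(f));
}

}  // namespace sketch
```

With a squaring functor, `make_counting_transform_iterator(0, f)` behaves like a read-only view over `f(0), f(1), f(2), ...`, so it can be passed directly to algorithms such as `std::accumulate`.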
Greptile Summary
This PR achieves two goals: it decouples Spark RAPIDS JNI from …
Key changes:
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["bloom_filter_create(version, num_hashes, longs, seed)"] --> B{version?}
B -->|V1| C["pack bloom_filter_header_v1\n(12 bytes: version, num_hashes, num_longs)"]
B -->|V2| D["pack bloom_filter_header_v2\n(16 bytes: version, num_hashes, seed, num_longs)"]
C --> E["memset bit array to 0"]
D --> E
E --> F["bloom_filter_put(filter, input)"]
F --> G["unpack_bloom_filter()\nreturns header, buffer, bits, seed"]
G --> H{version?}
H -->|V1| I["gpu_bloom_filter_put<1>\nh1*1 + idx*h2 mod bits\nseed=0, loop 1..num_hashes"]
H -->|V2| J["gpu_bloom_filter_put<2>\nh1*INT32_MAX + h2 (64-bit)\nseed from header, loop 0..num_hashes-1"]
I --> K["cuda::atomic_ref::fetch_or\ngpu_bit_to_word_mask with big-endian swizzle"]
J --> K
K --> L["bloom_filter_probe(input, filter)"]
L --> M{version?}
M -->|V1| N["bloom_probe_functor<1>\n32-bit hash, early exit on miss"]
M -->|V2| O["bloom_probe_functor<2>\n64-bit hash, early exit on miss"]
N --> P["Output: bool column\n(true = possibly in set)"]
O --> P
L2["bloom_filter_merge(filters)"] --> G2["unpack first filter\nvalidate all filters match header"]
G2 --> R["bitwise-OR all bit arrays\n(thrust::transform)"]
R --> P2["New merged filter scalar"]
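The header sizes and the merge step shown in the flowchart can be sketched on the host as follows. The struct names come from the flowchart, but the field types (32-bit integers), the 64-bit word size of the bit array, the `sketch` namespace, and the helpers `header_size` and `merge_bits` are assumptions made for illustration. The real code runs on the GPU and, per the flowchart, performs the merge with `thrust::transform`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

namespace sketch {  // hypothetical namespace for illustration only

// Assumed layout: every header field is a 32-bit integer.
struct bloom_filter_header_v1 {
  std::int32_t version;
  std::int32_t num_hashes;
  std::int32_t num_longs;
};

struct bloom_filter_header_v2 {
  std::int32_t version;
  std::int32_t num_hashes;
  std::int32_t seed;
  std::int32_t num_longs;
};

// These match the byte counts stated in the flowchart.
static_assert(sizeof(bloom_filter_header_v1) == 12, "V1 header is 12 bytes");
static_assert(sizeof(bloom_filter_header_v2) == 16, "V2 header is 16 bytes");

// Header size in bytes for a given format version (1 or 2).
constexpr std::size_t header_size(int version)
{
  return version == 1 ? sizeof(bloom_filter_header_v1) : sizeof(bloom_filter_header_v2);
}

// Merge two equally sized bit arrays by OR-ing them word by word.
inline std::vector<std::uint64_t> merge_bits(std::vector<std::uint64_t> const& a,
                                             std::vector<std::uint64_t> const& b)
{
  assert(a.size() == b.size());
  std::vector<std::uint64_t> out(a.size());
  std::transform(a.begin(), a.end(), b.begin(), out.begin(), std::bit_or<std::uint64_t>{});
  return out;
}

}  // namespace sketch
```

Because OR is associative and commutative, merging more than two filters can apply `merge_bits` pairwise in any order, provided all filters share the same header parameters, which is the validation step the flowchart describes.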
ttnghia left a comment
Please hold off a little bit. We need to discuss how to mitigate the code duplication and the unavoidable dependency on the cudf detail namespace.
I think the pair-wise iterator should probably remain in CUDF. I'll check whether …
NOTE: release/26.04 has been created from main. Please retarget your PR to release/26.04 if it should be included in the release.
These are a bridge too far. Pair-wise iterators can be consumed from cudf, eventually. Signed-off-by: MithunR <mithunr@nvidia.com>
This PR has been reduced in scope, to remove the …
@ttnghia: Any objection to my merging this change?
Now with AI-assisted coding, I'm less likely to reject code duplication, although that is never a good approach to me 😃
Thank you, chaps. This change has been merged. On to the next.
This commit introduces utility iterators to be used in place of `cudf::detail` iterators. This is to further reduce dependencies on `cudf::detail` APIs that are now deemed private to the CUDF project.
make_counting_transform_iterator
This change introduces a version of `make_counting_transform_iterator` that is specific to Spark RAPIDS JNI. The previous version of this function comes from `cudf::detail`, which is now deemed private to cuDF. This commit should insulate Spark RAPIDS JNI from changes to interfaces in `cudf::detail`. Note that this version does not use `thrust::transform_iterator`; it relies on `cuda::make_transform_iterator` instead.