Merged
Changes from 11 commits
Commits
30 commits
ac2540a
WIP: BloomFilter v2 support.
mythrocks Mar 10, 2026
7af89f7
Code formatting.
mythrocks Mar 11, 2026
bface5f
Removed redundant CUDF_EXPECTS.
mythrocks Mar 12, 2026
aa9b576
Review fixes:
mythrocks Mar 12, 2026
8c2c101
Java tests for V2 format.
mythrocks Mar 12, 2026
2e63ec8
Better overflow checking.
mythrocks Mar 13, 2026
d83b2c9
Formatting.
mythrocks Mar 13, 2026
095dd88
Review: Better int width checks.
mythrocks Mar 13, 2026
be2a120
Better overflow checking. Documented impedance mismatch.
mythrocks Mar 13, 2026
bd90bf4
Corrected truncation. More error checks.
mythrocks Mar 14, 2026
2037b0e
Check that bloom filter bit count is positive.
mythrocks Mar 14, 2026
10ba6f0
Merge remote-tracking branch 'origin/release/26.04' into bloom-filter…
mythrocks Mar 16, 2026
3e74747
Review: Consolidated commonality.
mythrocks Mar 19, 2026
edc63a9
Review: Removed unused parameter name.
mythrocks Mar 19, 2026
6292048
Review: cuda::std::.
mythrocks Mar 19, 2026
a773edd
Review: CUDA_TRY.
mythrocks Mar 19, 2026
110fc0e
Review: exec_policy_nosync.
mythrocks Mar 19, 2026
d48c048
Review: atomic_refs.
mythrocks Mar 19, 2026
5eb90ea
Review: Test for all absent.
mythrocks Mar 19, 2026
dc224c9
Review: Removed tautological check.
mythrocks Mar 19, 2026
5d7d5e1
Review: Fixed range checks for bit counts.
mythrocks Mar 20, 2026
27e3388
Review: Use stream, mr. Fix stale comment.
mythrocks Mar 21, 2026
317e800
Review: Reduce const-casts.
mythrocks Mar 21, 2026
81893fb
Review: Protect against empty bloom filter bits.
mythrocks Mar 21, 2026
4995300
Review: include <cudf/utilities/span.hpp>.
mythrocks Mar 21, 2026
090714c
Review: static_asserts for struct sizes.
mythrocks Mar 21, 2026
92db59f
Review: Added test BuildAndProbeWithNullsV2.
mythrocks Mar 21, 2026
5f8d711
Review: SRJ_FUNC_RANGE.
mythrocks Mar 21, 2026
cd79867
Review: Left todo for pinned memory use.
mythrocks Mar 21, 2026
8480b6f
Review: exec_policy_nosync, test changes.
mythrocks Mar 23, 2026
51 changes: 45 additions & 6 deletions src/main/cpp/benchmarks/bloom_filter.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
* Copyright (c) 2023-2026, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -22,17 +22,51 @@
#include <hash/hash.hpp>
#include <nvbench/nvbench.cuh>

static void bloom_filter_put(nvbench::state& state)
static void bloom_filter_put_v1(nvbench::state& state)
{
constexpr int num_rows = 150'000'000;
constexpr int num_hashes = 3;

cudf::size_type const bloom_filter_bytes = state.get_int64("bloom_filter_bytes");
cudf::size_type const bloom_filter_longs = bloom_filter_bytes / sizeof(int64_t);
auto bloom_filter = spark_rapids_jni::bloom_filter_create(
spark_rapids_jni::bloom_filter_version_1, num_hashes, bloom_filter_longs);

data_profile_builder builder;
builder.no_validity();
auto const src = create_random_table({{cudf::type_id::INT64}}, row_count{num_rows}, builder);
auto const input = spark_rapids_jni::xxhash64(*src);

auto const stream = cudf::get_default_stream();
state.set_cuda_stream(nvbench::make_cuda_stream_view(stream.value()));
state.exec(nvbench::exec_tag::timer | nvbench::exec_tag::sync,
[&](nvbench::launch& launch, auto& timer) {
timer.start();
spark_rapids_jni::bloom_filter_put(*bloom_filter, *input);
stream.synchronize();
timer.stop();
});

size_t const bytes_read = num_rows * sizeof(int64_t);
size_t const bytes_written = num_rows * sizeof(cudf::bitmask_type) * num_hashes;
auto const time = state.get_summary("nv/cold/time/gpu/mean").get_float64("value");
state.add_element_count(std::size_t{num_rows}, "Rows Inserted");
state.add_global_memory_reads(bytes_read, "Bytes read");
state.add_global_memory_writes(bytes_written, "Bytes written");
state.add_element_count(static_cast<double>(bytes_written) / time, "Write bytes/sec");
}

static void bloom_filter_put_v2(nvbench::state& state)
{
constexpr int num_rows = 150'000'000;
constexpr int num_hashes = 3;

// create the bloom filter
cudf::size_type const bloom_filter_bytes = state.get_int64("bloom_filter_bytes");
cudf::size_type const bloom_filter_longs = bloom_filter_bytes / sizeof(int64_t);
auto bloom_filter = spark_rapids_jni::bloom_filter_create(num_hashes, bloom_filter_longs);
auto bloom_filter = spark_rapids_jni::bloom_filter_create(
spark_rapids_jni::bloom_filter_version_2, num_hashes, bloom_filter_longs);

// create a column of hashed values
data_profile_builder builder;
builder.no_validity();
auto const src = create_random_table({{cudf::type_id::INT64}}, row_count{num_rows}, builder);
@@ -57,7 +91,12 @@ static void bloom_filter_put(nvbench::state& state)
state.add_element_count(static_cast<double>(bytes_written) / time, "Write bytes/sec");
}

NVBENCH_BENCH(bloom_filter_put)
.set_name("Bloom Filter Put")
NVBENCH_BENCH(bloom_filter_put_v1)
.set_name("Bloom Filter Put V1")
.add_int64_axis("bloom_filter_bytes",
{512 * 1024, 1024 * 1024, 2 * 1024 * 1024, 4 * 1024 * 1024, 8 * 1024 * 1024});

NVBENCH_BENCH(bloom_filter_put_v2)
.set_name("Bloom Filter Put V2")
.add_int64_axis("bloom_filter_bytes",
{512 * 1024, 1024 * 1024, 2 * 1024 * 1024, 4 * 1024 * 1024, 8 * 1024 * 1024});
23 changes: 19 additions & 4 deletions src/main/cpp/src/BloomFilterJni.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023-2025, NVIDIA CORPORATION.
* Copyright (c) 2023-2026, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -20,17 +20,32 @@
#include "jni_utils.hpp"
#include "utilities.hpp"

#include <limits>

extern "C" {

JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_BloomFilter_creategpu(
JNIEnv* env, jclass, jint numHashes, jlong bloomFilterBits)
JNIEnv* env, jclass, jint version, jint numHashes, jlong bloomFilterBits, jint seed)
{
JNI_TRY
{
cudf::jni::auto_set_device(env);

int bloom_filter_longs = static_cast<int>((bloomFilterBits + 63) / 64);
auto bloom_filter = spark_rapids_jni::bloom_filter_create(numHashes, bloom_filter_longs);
// TODO (future): There is an impedance mismatch between the C++ and Java APIs.
// This seems to have been introduced in https://github.com/NVIDIA/spark-rapids-jni/pull/1303.
// The Java API accepts a long for the bloom filter bit count, but the C++ API accepts an int.
// This means that the Java API can represent a bloom filter bit count that is too large to
// be represented as an int in the C++ API.
// We should fix this by changing the C++ API to accept a long for the bloom filter bit count.
// We will address this in a future PR. For now, we add error checking to avoid overflow.
Collaborator

Can we file a GitHub issue for this and link it here in the comment?

Collaborator

@mythrocks is there an issue number to resolve this thread?

Collaborator Author

No, not yet. I am analyzing this again to confirm that there is indeed a problem that needs follow-up. Once I have convinced myself (again) that the impedance mismatch is a concern, I will update this comment with an issue number.

Collaborator Author

mythrocks Mar 20, 2026

At the moment, I think I have this analyzed incorrectly. I will confirm shortly.
Edit: Yep, I have this wrong. I'll fix the code and update the comment.

Collaborator Author

This is a little confusing; it will need documenting in the code.
Here is where I was wrong:

The Java API (exposed to Spark) talks in terms of the number of bits in the bloom filter. This is a 64-bit value.
The C++ API underneath deals in terms of the number of int64_t values required to represent the bloom filter bits. This is a 32-bit value.

Per the code in Apache Spark's BitArray class:

    long numWords = (long) Math.ceil(numBits / 64.0);
    if (numWords > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("Can't allocate enough space for " + numBits + " bits");
    }
    return (int) numWords;

The number of longs can be at most Integer.MAX_VALUE, i.e. the 32-bit maximum.
The maximum number of bits is therefore Integer.MAX_VALUE * 64, which must be stored as a 64-bit value.

The APIs are not incongruent, but I'd better add more stringent checks on the boundary values.

I'm testing now.
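
For illustration only, here is a minimal sketch of the boundary check described above. It bounds the word count (not the bit count) by the 32-bit maximum, mirroring the quoted Spark logic. The helper name `bits_to_longs` is hypothetical and is not part of this PR, and rejecting a bit count of zero is an assumption based on the "bit count is positive" commit message rather than on the code shown here.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (not from the PR): converts a bloom filter bit count
// (64-bit, as in the Java API) into the number of 64-bit words needed to
// hold it (32-bit, as in the C++ API). Returns std::nullopt when the bit
// count is not positive or when the word count would not fit in an int.
std::optional<int> bits_to_longs(std::int64_t num_bits)
{
  // Largest word count representable as an int, widened to 64 bits.
  constexpr std::int64_t max_longs = std::numeric_limits<int>::max();
  // Largest bit count whose word count still fits: Integer.MAX_VALUE * 64.
  constexpr std::int64_t max_bits = max_longs * 64;

  // Assumption: an empty (zero-bit) filter is treated as invalid.
  if (num_bits <= 0 || num_bits > max_bits) { return std::nullopt; }

  // Round up to whole 64-bit words. The addition cannot overflow because
  // max_bits + 63 is far below the int64_t maximum.
  return static_cast<int>((num_bits + 63) / 64);
}
```

A caller in the JNI layer could reject the request when the result is empty and otherwise pass the contained value on as the long count.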

Collaborator Author

mythrocks Mar 20, 2026

@amahussein, @jihoonson: I think I've fixed my misconception here. Please let me know if the comment isn't satisfactory.

JNI_ARG_CHECK(env,
bloomFilterBits >= 0 && bloomFilterBits <= std::numeric_limits<int>::max() - 63,
"bloom filter bit count overflows int when converted to longs",
0);
auto const bloom_filter_longs_long = (bloomFilterBits + 63) / 64;
auto const bloom_filter_longs = static_cast<int>(bloom_filter_longs_long);
auto bloom_filter =
spark_rapids_jni::bloom_filter_create(version, numHashes, bloom_filter_longs, seed);
return reinterpret_cast<jlong>(bloom_filter.release());
}
JNI_CATCH(env, 0);