Add Gaussian Mixture Models (GMM)#2248
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds a complete Gaussian Mixture Model (GMM) implementation to cuvs, introducing GPU-accelerated EM training and inference across the C++ layer (CUDA kernels, EM driver, public API), C bindings (DLPack-based API), Python/Cython bindings, and Fern documentation pages for C, C++, and Python APIs. ChangesGMM Clustering Feature
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@c/tests/cluster/gmm_c.cu`:
- Around line 45-64: The cuvsResources_t resource created by
cuvsResourcesCreate(&res) uses a different CUDA stream by default than the
stream used for the asynchronous raft::copy operations, creating a race
condition where cuvsGMMFit executes before the data copy completes. After
calling cuvsResourcesCreate(&res), immediately call cuvsStreamSet(res, stream)
to synchronize the resource's stream with the stream variable that is used by
raft::copy and the subsequent GPU operations. Apply this fix immediately
following the cuvsResourcesCreate call, and also apply the same fix pattern to
the other test sections mentioned in the comment (lines 98-111, 167-187,
216-229).
In `@cpp/src/cluster/gmm_impl.cuh`:
- Around line 1115-1157: The FULL-covariance second pass has inconsistent
responsibilities across its computational steps. The issue is that after
m_finalize mutates weights and means at line 1141, the subsequent estep_tile
call at line 1145 recomputes responsibilities with these updated parameters, but
then m_cov_full_pass uses these new responsibilities which differ from those
used in the first pass and in m_accumulate, breaking the mathematical
consistency of the EM algorithm. Fix this by saving the responsibilities from
the first pass before m_finalize is called, and then reusing those saved
responsibilities in the m_cov_full_pass loop instead of recomputing them with
estep_tile.
In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx`:
- Around line 90-94: The `__cinit__` method in the GMM class does not check the
return value from `cuvsGMMParamsCreate`, which can leave `self.params`
uninitialized if the call fails, leading to undefined behavior. Add error
checking after the `cuvsGMMParamsCreate` call to verify it succeeded and raise
an exception if it fails. Additionally, the `__dealloc__` method currently calls
`check_cuvs()` on `cuvsGMMParamsDestroy`, which can raise exceptions in a
destructor (problematic behavior). Instead, check the return value from
`cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or
handle the error silently to avoid suppressed exceptions or ResourceWarnings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ca5bb572-978f-46af-87e8-00b27bd7a8a1
📒 Files selected for processing (32)
c/CMakeLists.txtc/include/cuvs/cluster/gmm.hc/include/cuvs/core/all.hc/src/cluster/gmm.cppc/tests/CMakeLists.txtc/tests/cluster/gmm_c.cucpp/CMakeLists.txtcpp/include/cuvs/cluster/gmm.hppcpp/src/cluster/gmm.cuhcpp/src/cluster/gmm_double.cucpp/src/cluster/gmm_float.cucpp/src/cluster/gmm_impl.cuhcpp/src/cluster/gmm_kernels.cuhcpp/tests/CMakeLists.txtcpp/tests/cluster/gmm.cufern/docs.ymlfern/pages/c_api/c-api-cluster-gmm.mdfern/pages/c_api/index.mdfern/pages/cluster/gmm.mdfern/pages/cluster/index.mdfern/pages/cpp_api/cpp-api-cluster-gmm.mdfern/pages/cpp_api/index.mdfern/pages/python_api/index.mdfern/pages/python_api/python-api-cluster-gmm.mdpython/cuvs/cuvs/cluster/CMakeLists.txtpython/cuvs/cuvs/cluster/__init__.pypython/cuvs/cuvs/cluster/gmm/CMakeLists.txtpython/cuvs/cuvs/cluster/gmm/__init__.pxdpython/cuvs/cuvs/cluster/gmm/__init__.pypython/cuvs/cuvs/cluster/gmm/gmm.pxdpython/cuvs/cuvs/cluster/gmm/gmm.pyxpython/cuvs/cuvs/tests/test_gmm.py
| auto stream = raft::resource::get_cuda_stream(handle); | ||
|
|
||
| int64_t cn = (int64_t)kNComponents * kNFeatures * kNFeatures; // FULL | ||
|
|
||
| rmm::device_uvector<float> dataset_d(kNSamples * kNFeatures, stream); | ||
| rmm::device_uvector<float> weights_d(kNComponents, stream); | ||
| rmm::device_uvector<float> means_d(kNComponents * kNFeatures, stream); | ||
| rmm::device_uvector<float> covs_d(cn, stream); | ||
| rmm::device_uvector<float> pchol_d(cn, stream); | ||
| rmm::device_uvector<float> precs_d(cn, stream); | ||
| rmm::device_uvector<int32_t> labels_d(kNSamples, stream); | ||
| rmm::device_uvector<int32_t> labels2_d(kNSamples, stream); | ||
| rmm::device_uvector<float> resp_d(kNSamples * kNComponents, stream); | ||
| rmm::device_uvector<float> logp_d(kNSamples, stream); | ||
|
|
||
| raft::copy( | ||
| dataset_d.data(), reinterpret_cast<float const*>(kDataset), kNSamples * kNFeatures, stream); | ||
|
|
||
| cuvsResources_t res; | ||
| ASSERT_EQ(cuvsResourcesCreate(&res), CUVS_SUCCESS); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify how C resources choose/own CUDA streams.
rg -n -C2 "cuvsResourcesCreate|cuvsResourcesDestroy|set.*stream|get.*stream" c/src c/include cpp/src cpp/include
# Confirm async staging points in this test.
rg -n -C2 "raft::copy\\(|cuvsGMMFit\\(" c/tests/cluster/gmm_c.cuRepository: rapidsai/cuvs
Length of output: 50370
🏁 Script executed:
# Check the exact test file context around the flagged lines
# Lines 45-64
sed -n '40,70p' c/tests/cluster/gmm_c.cu
# Check lines 98-111
sed -n '95,115p' c/tests/cluster/gmm_c.cu
# Check for any stream synchronization or stream setting calls in the test
rg -n "cuvsResourcesSetCudaStream|cudaStreamSynchronize|stream.*Synchronize" c/tests/cluster/gmm_c.cu
# Check if the default stream or a specific stream is used after cuvsResourcesCreate
rg -n -A5 "cuvsResourcesCreate" c/tests/cluster/gmm_c.cuRepository: rapidsai/cuvs
Length of output: 2511
🏁 Script executed:
# Check if cuvsResourcesSetCudaStream is ever called in gmm_c.cu
rg -n "cuvsResourcesSetCudaStream" c/tests/cluster/gmm_c.cu
# Get the full file to see all usages of res after creation
wc -l c/tests/cluster/gmm_c.cu
# Look at more context around cuvsResourcesCreate and what happens after
rg -n -B2 -A10 "cuvsResourcesCreate" c/tests/cluster/gmm_c.cu | head -100Repository: rapidsai/cuvs
Length of output: 1165
🏁 Script executed:
# Get the implementation of cuvsResourcesCreate to see what stream it uses
sed -n '30,50p' c/src/core/c_api.cpp
# Also check for cuvsGMMFit to see which stream it uses
rg -n -A5 "cuvsGMMFit.*{" c/src/clustering/gmm.cpp | head -50Repository: rapidsai/cuvs
Length of output: 826
🏁 Script executed:
# Find the cuvsGMMFit function definition
rg -n "cuvsGMMFit" c/src/clustering/gmm.cpp | head -20
# Check the actual GMM fit implementation
fd "gmm.cpp" c/src/clustering/ -x cat {} | head -100Repository: rapidsai/cuvs
Length of output: 227
🏁 Script executed:
# Find gmm-related source files
find c/src -name "*gmm*" -type f
# Find the gmm.c or gmm implementation in c/src
fd "gmm" c/srcRepository: rapidsai/cuvs
Length of output: 101
🏁 Script executed:
# Find cuvsGMMFit in the gmm.cpp file
rg -n "cuvsGMMFit" c/src/cluster/gmm.cpp
# Get the implementation of cuvsGMMFit
rg -n -A30 "extern \"C\" cuvsError_t cuvsGMMFit" c/src/cluster/gmm.cppRepository: rapidsai/cuvs
Length of output: 1575
🏁 Script executed:
# Find the _fit function implementation
rg -n "void _fit" c/src/cluster/gmm.cpp
# Get the _fit implementation with context
rg -n -A50 "template.*_fit\(" c/src/cluster/gmm.cpp | head -80Repository: rapidsai/cuvs
Length of output: 91
🏁 Script executed:
# Get the _fit implementation starting from line 78
sed -n '78,130p' c/src/cluster/gmm.cppRepository: rapidsai/cuvs
Length of output: 2353
🏁 Script executed:
# Check what stream raft::resources uses by default
rg -n "raft::resources" c/include -A5 | head -30
# Also check if there's any synchronization guarantee in resource creation
rg -n "get_cuda_stream" c/include/cuvs/core/c_api.h -B3 -A3Repository: rapidsai/cuvs
Length of output: 1179
🏁 Script executed:
# Find the stream management functions for resources
rg -n "cuvsResourcesSetCudaStream|cuvsResourcesGetCudaStream" c/include/cuvs/core/c_api.h -B2 -A5
# Check the raft::resources default behavior
rg -n "class resources" cpp/include -l 2>/dev/null | head -5 | xargs -I {} head -100 {}
# Look at the actual default stream that raft::resources uses
rg -n "class resources" --type cpp -A30 | grep -A30 "resources()" | head -50Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Get the actual content of c/include/cuvs/core/c_api.h for stream functions
sed -n '1,150p' c/include/cuvs/core/c_api.h | tail -60
# Check if cuvsResourcesSetCudaStream exists
rg -n "SetCudaStream\|GetCudaStream" c/include/cuvs/core/c_api.hRepository: rapidsai/cuvs
Length of output: 2030
🏁 Script executed:
# Check if cuvsStreamSet is called in the test file
rg -n "cuvsStreamSet" c/tests/cluster/gmm_c.cu
# Check the full test function to see the complete flow
sed -n '42,120p' c/tests/cluster/gmm_c.cuRepository: rapidsai/cuvs
Length of output: 3400
🏁 Script executed:
# Check if other tests use cuvsStreamSet
rg -n "cuvsStreamSet" c/tests/ | head -20
# Check if there's a pattern in other C API tests
fd "gmm_c.cu|*_c.cu" c/tests/ --type f | head -5 | xargs -I {} grep -l "cuvsStreamSet" {}Repository: rapidsai/cuvs
Length of output: 528
🏁 Script executed:
# Compare the two files - check ann_ivf_sq_c.cu to see the pattern
sed -n '105,120p' c/tests/neighbors/ann_ivf_sq_c.cu
# And check the core c_api test
sed -n '15,35p' c/tests/core/c_api.c
# Verify: does gmm_c.cu have any synchronization or stream setting anywhere?
cat c/tests/cluster/gmm_c.cu | grep -n "cuvsStream\|cudaStream"Repository: rapidsai/cuvs
Length of output: 1125
Set CUDA stream on resources after creation to match async copy operations
raft::copy(..., stream) is asynchronous, but the cuvsResources_t res created with cuvsResourcesCreate(&res) uses a different stream by default. This creates a race where cuvsGMMFit may execute before the data copy completes. Call cuvsStreamSet(res, stream) immediately after resource creation, as shown in other C API tests (e.g., ann_ivf_sq_c.cu).
Suggested fix
raft::copy(
dataset_d.data(), reinterpret_cast<float const*>(kDataset), kNSamples * kNFeatures, stream);
cuvsResources_t res;
ASSERT_EQ(cuvsResourcesCreate(&res), CUVS_SUCCESS);
+ ASSERT_EQ(cuvsStreamSet(res, stream), CUVS_SUCCESS);Also applies to: 98-111, 167-187, 216-229
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@c/tests/cluster/gmm_c.cu` around lines 45 - 64, The cuvsResources_t resource
created by cuvsResourcesCreate(&res) uses a different CUDA stream by default
than the stream used for the asynchronous raft::copy operations, creating a race
condition where cuvsGMMFit executes before the data copy completes. After
calling cuvsResourcesCreate(&res), immediately call cuvsStreamSet(res, stream)
to synchronize the resource's stream with the stream variable that is used by
raft::copy and the subsequent GPU operations. Apply this fix immediately
following the cuvsResourcesCreate call, and also apply the same fix pattern to
the other test sections mentioned in the comment (lines 98-111, 167-187,
216-229).
Source: Coding guidelines
| auto estep_tile = [&](int t0, int nt) { | ||
| e_step<T>(handle, | ||
| params, | ||
| X + (size_t)t0 * d, | ||
| nt, | ||
| d, | ||
| K, | ||
| weights, | ||
| means, | ||
| precisions_chol, | ||
| log_det.data(), | ||
| resp.data(), | ||
| resp.data(), | ||
| lpn.data()); | ||
| }; | ||
| auto em_step = [&](bool do_mstep) -> double { | ||
| double lb_sum = 0.0; | ||
| for (int t0 = 0; t0 < n; t0 += tile) { | ||
| int nt = std::min(tile, n - t0); | ||
| estep_tile(t0, nt); | ||
| if (do_mstep) | ||
| m_accumulate<T>( | ||
| handle, params, X + (size_t)t0 * d, nt, d, K, resp.data(), ws, (t0 == 0) ? T(0) : T(1)); | ||
| lb_sum += (double)mean_device<T>(handle, lpn.data(), nt) * nt; | ||
| } | ||
| if (do_mstep) { | ||
| m_finalize<T>(handle, params, n, d, K, ws, weights, means, covariances); | ||
| if (ct == covariance_type::FULL) { | ||
| for (int t0 = 0; t0 < n; t0 += tile) { | ||
| int nt = std::min(tile, n - t0); | ||
| estep_tile(t0, nt); | ||
| m_cov_full_pass<T>(handle, | ||
| X + (size_t)t0 * d, | ||
| nt, | ||
| d, | ||
| K, | ||
| resp.data(), | ||
| means, | ||
| covariances, | ||
| ws, | ||
| (t0 == 0) ? T(0) : T(1)); | ||
| } | ||
| m_finalize_cov_full<T>(handle, params, d, K, ws, covariances); |
There was a problem hiding this comment.
CRITICAL: FULL-covariance second pass uses inconsistent responsibilities across substeps
At Line 1141 m_finalize mutates weights/means before Line 1145 reruns estep_tile, so m_cov_full_pass is fed different responsibilities than pass-1 N_k/num. Also, for init_method::Random, Line 1188 regenerates random responsibilities before Line 1211, so FULL init pass-2 no longer matches pass-1. This can corrupt EM updates and break lower-bound monotonicity.
Suggested fix direction
- auto estep_tile = [&](int t0, int nt) {
+ auto estep_tile = [&](int t0, int nt, const T* estep_weights, const T* estep_means) {
e_step<T>(handle,
params,
X + (size_t)t0 * d,
nt,
d,
K,
- weights,
- means,
+ estep_weights,
+ estep_means,
precisions_chol,
log_det.data(),
resp.data(),
resp.data(),
lpn.data());
};
auto em_step = [&](bool do_mstep) -> double {
+ rmm::device_uvector<T> estep_w_cache(0, stream), estep_m_cache(0, stream);
...
- estep_tile(t0, nt);
+ estep_tile(t0, nt, weights, means);
...
if (do_mstep) {
+ if (ct == covariance_type::FULL) {
+ estep_w_cache.resize(K, stream);
+ estep_m_cache.resize((size_t)K * d, stream);
+ raft::copy(estep_w_cache.data(), weights, K, stream);
+ raft::copy(estep_m_cache.data(), means, (size_t)K * d, stream);
+ }
m_finalize<T>(...);
if (ct == covariance_type::FULL) {
for (...) {
- estep_tile(t0, nt);
+ estep_tile(t0, nt, estep_w_cache.data(), estep_m_cache.data());
m_cov_full_pass<T>(...);
}
}
}
};- if (im == init_method::Random) {
- raft::random::uniform(handle, rng, resp.data(), (size_t)nt * K, T(0), T(1));
+ if (im == init_method::Random) {
+ // deterministic per-(init,tile) so FULL pass-1/pass-2 use identical responsibilities
+ raft::random::RngState tile_rng(
+ init_seed ^ (static_cast<uint64_t>(t0) * 0x9E3779B97F4A7C15ULL),
+ raft::random::GeneratorType::GenPhilox);
+ raft::random::uniform(handle, tile_rng, resp.data(), (size_t)nt * K, T(0), T(1));
detail::normalize_rows_kernel<T><<<...>>>(resp.data(), nt, K);
}Also applies to: 1186-1212
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/cluster/gmm_impl.cuh` around lines 1115 - 1157, The FULL-covariance
second pass has inconsistent responsibilities across its computational steps.
The issue is that after m_finalize mutates weights and means at line 1141, the
subsequent estep_tile call at line 1145 recomputes responsibilities with these
updated parameters, but then m_cov_full_pass uses these new responsibilities
which differ from those used in the first pass and in m_accumulate, breaking the
mathematical consistency of the EM algorithm. Fix this by saving the
responsibilities from the first pass before m_finalize is called, and then
reusing those saved responsibilities in the m_cov_full_pass loop instead of
recomputing them with estep_tile.
| def __cinit__(self): | ||
| cuvsGMMParamsCreate(&self.params) | ||
|
|
||
| def __dealloc__(self): | ||
| check_cuvs(cuvsGMMParamsDestroy(self.params)) |
There was a problem hiding this comment.
CRITICAL: Unchecked error in __cinit__ and unsafe exception in __dealloc__
Two related issues in the lifecycle methods:
-
Line 91:
cuvsGMMParamsCreatereturn value is not checked. If allocation fails,self.paramsremains uninitialized (undefined behavior), and subsequent operations will silently corrupt memory or crash. -
Line 94:
check_cuvs()can raise exceptions in__dealloc__, which is problematic:- If
__cinit__failed silently,self.paramsmay be invalid/garbage - Raising exceptions in destructors can cause suppressed errors or ResourceWarnings
- If
Proposed fix
def __cinit__(self):
- cuvsGMMParamsCreate(&self.params)
+ check_cuvs(cuvsGMMParamsCreate(&self.params))
def __dealloc__(self):
- check_cuvs(cuvsGMMParamsDestroy(self.params))
+ if self.params is not NULL:
+ cuvsGMMParamsDestroy(self.params)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx` around lines 90 - 94, The `__cinit__`
method in the GMM class does not check the return value from
`cuvsGMMParamsCreate`, which can leave `self.params` uninitialized if the call
fails, leading to undefined behavior. Add error checking after the
`cuvsGMMParamsCreate` call to verify it succeeded and raise an exception if it
fails. Additionally, the `__dealloc__` method currently calls `check_cuvs()` on
`cuvsGMMParamsDestroy`, which can raise exceptions in a destructor (problematic
behavior). Instead, check the return value from
`cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or
handle the error silently to avoid suppressed exceptions or ResourceWarnings.
This adds GMM to cuvs. This a pure fp32/fp64 cuda-core version. It already implements some of the memory advance from Flash-GMM.