
Track tensor and index lifetimes in the Rust DLPack bindings #2245

Open
yan-zaretskiy wants to merge 2 commits into rapidsai:main from yan-zaretskiy:rust-dlpack

Conversation

@yan-zaretskiy

Contributor

The previous ManagedTensor type was a non-owning wrapper over the C FFI DLManagedTensor type with no lifetime attached. Hence there was no mechanism to tie the tensor data and the shape/stride metadata it referenced to their owners. Indexes that keep a non-owning view of their dataset (CAGRA, brute-force) could outlive that data. Here we replace it with the lifetime-parameterized DLTensorView and DLTensorViewMut views, which are produced by the public IntoDlTensor and IntoDlTensorMut traits. Users are now expected to implement these traits on their tensor types, so that our API can accept them as input/output arguments.

@copy-pr-bot

copy-pr-bot Bot commented Jun 15, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@yan-zaretskiy yan-zaretskiy self-assigned this Jun 15, 2026
@yan-zaretskiy yan-zaretskiy added breaking Introduces a breaking change Rust improvement Improves an existing functionality labels Jun 15, 2026
@yan-zaretskiy

Contributor Author

Since this is a fairly large change, I would like to explain my thought process to justify the proposed design. First of all, let's identify what we're trying to fix.

The original cuvs type for tensor data is simply pub struct ManagedTensor(ffi::DLManagedTensor). It is a thin wrapper around the C DLManagedTensor, the struct that holds the raw data, shape, and strides pointers. When it's built from a borrowed tensor (e.g. via From<&ndarray>) it holds those pointers with no lifetime attached, so nothing ties the ManagedTensor to whatever owns the data, and that data can be freed out from under it. Therefore, using this type is effectively unsafe, though it is not marked as such.
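
To make the failure mode concrete, here is a minimal sketch. RawView and View are hypothetical stand-ins, not the cuvs types: the first mirrors a wrapper with no lifetime, the second adds a PhantomData borrow so the compiler can see who owns the data.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a wrapper with no lifetime: just raw parts.
pub struct RawView {
    data: *const f32,
    len: usize,
}

impl RawView {
    pub fn new(src: &[f32]) -> Self {
        RawView { data: src.as_ptr(), len: src.len() }
    }
    pub fn as_ptr(&self) -> *const f32 { self.data }
    pub fn len(&self) -> usize { self.len }
}

/// Same raw parts, plus a marker that borrows the owner for 'a.
pub struct View<'a> {
    data: *const f32,
    len: usize,
    _marker: PhantomData<&'a [f32]>,
}

impl<'a> View<'a> {
    pub fn new(src: &'a [f32]) -> Self {
        View { data: src.as_ptr(), len: src.len(), _marker: PhantomData }
    }
    pub fn sum(&self) -> f32 {
        // SAFETY: 'a keeps the source slice alive and immutably borrowed.
        unsafe { std::slice::from_raw_parts(self.data, self.len) }.iter().sum()
    }
}

/// Compiles without complaint, yet the returned view dangles:
/// nothing links it to `owner`, which is freed when the function returns.
pub fn dangling() -> RawView {
    let owner = vec![1.0_f32, 2.0, 3.0];
    RawView::new(&owner)
}

// The same function written with View<'a> is rejected by the borrow checker,
// because it would return a value referencing the local `owner`:
//
// pub fn rejected<'a>() -> View<'a> {
//     let owner = vec![1.0_f32, 2.0, 3.0];
//     View::new(&owner)
// }
```

Dereferencing the pointer held by the value returned from dangling() would be undefined behavior, even though no unsafe appears at the call site that created it.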

The second problem is in the index types. CAGRA and brute-force don't copy the dataset at build time, because the underlying C++ implementation keeps a non-owning view and reads the original vectors during search, so the dataset has to outlive the index. The old API doesn't model that.

The purpose of this PR is therefore to enable tracking two things: the lifetime of the tensor data, and the lifetime of the indexes that borrow it.

To fix the first issue, we add lifetimes to the tensor view types. Moreover, we make a distinction between mutable and immutable views: DLTensorView<'a> for read-only inputs like datasets and queries, and DLTensorViewMut<'a> for outputs the C API writes, like neighbors and distances.

Users convert their own tensor types into these views by implementing a pair of new public traits, IntoDlTensor<'a> and IntoDlTensorMut<'a>. Every entry point (build, search, etc.) takes impl IntoDlTensor<'a> or impl IntoDlTensorMut<'a> rather than a concrete type, so cuvs ships no tensor type of its own and stays agnostic about where your data lives.
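
As a rough illustration of this pattern, the sketch below uses hypothetical traits IntoView and IntoViewMut with made-up method names. The real signatures of IntoDlTensor and IntoDlTensorMut are not shown in this thread, so treat every name here as an assumption; only the shape of the idea (implement on a reference to your own type, accept impl Trait at the entry point) is taken from the description above.

```rust
use std::marker::PhantomData;

/// Hypothetical read-only and writable 2-D views (not the cuvs types).
pub struct View<'a> { ptr: *const f32, shape: [i64; 2], _m: PhantomData<&'a [f32]> }
pub struct ViewMut<'a> { ptr: *mut f32, shape: [i64; 2], _m: PhantomData<&'a mut [f32]> }

/// Hypothetical conversion traits playing the role of the PR's trait pair.
pub trait IntoView<'a> { fn into_view(self) -> View<'a>; }
pub trait IntoViewMut<'a> { fn into_view_mut(self) -> ViewMut<'a>; }

/// A user-defined row-major matrix.
pub struct Matrix { data: Vec<f32>, rows: usize, cols: usize }

impl Matrix {
    pub fn new(data: Vec<f32>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { data, rows, cols }
    }
    pub fn data(&self) -> &[f32] { &self.data }
}

impl<'a> IntoView<'a> for &'a Matrix {
    fn into_view(self) -> View<'a> {
        View { ptr: self.data.as_ptr(), shape: [self.rows as i64, self.cols as i64], _m: PhantomData }
    }
}

impl<'a> IntoViewMut<'a> for &'a mut Matrix {
    fn into_view_mut(self) -> ViewMut<'a> {
        let shape = [self.rows as i64, self.cols as i64];
        ViewMut { ptr: self.data.as_mut_ptr(), shape, _m: PhantomData }
    }
}

/// An entry point in the same style: read-only input, writable output.
pub fn row_sums<'i, 'o>(input: impl IntoView<'i>, output: impl IntoViewMut<'o>) -> Result<(), String> {
    let (inp, out) = (input.into_view(), output.into_view_mut());
    if out.shape != [inp.shape[0], 1] {
        return Err(format!("output shape {:?} does not match {} rows", out.shape, inp.shape[0]));
    }
    let (rows, cols) = (inp.shape[0] as usize, inp.shape[1] as usize);
    // SAFETY: both views borrow live buffers whose lengths match their shapes.
    let src = unsafe { std::slice::from_raw_parts(inp.ptr, rows * cols) };
    let dst = unsafe { std::slice::from_raw_parts_mut(out.ptr, rows) };
    for (r, d) in dst.iter_mut().enumerate() {
        *d = src[r * cols..(r + 1) * cols].iter().sum();
    }
    Ok(())
}
```

Because the traits are implemented on &Matrix and &mut Matrix, a call such as row_sums(&m, &mut out) borrows both arguments only for the duration of the call.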

The path from DLTensorView<'a> into C requires some clarification. Here's how we define this type:

pub struct DLTensorView<'a> {
    data: *mut std::ffi::c_void,
    device: ffi::DLDevice,
    dtype: ffi::DLDataType,
    shape: TensorDims,
    strides: Option<TensorDims>,
    _marker: PhantomData<&'a ()>,
}

where TensorDims is a small vector with small-buffer optimization (SBO) that, in practice, is always stack-allocated. This avoids heap allocations when making FFI calls from Rust. DLTensorView<'a> owns shape and strides because the C type eventually reads them as *mut i64, so something must own the data that this *mut i64 points to. That something cannot be the original tensor type, because there's no guarantee it uses i64 for shape/stride values, so we store them inside DLTensorView<'a>. The second design decision is how we hand a DLTensorView<'a> to the C function, which ultimately wants a *mut DLManagedTensor whose shape/strides are *mut i64 pointing at those arrays.

The obvious choice is to store a DLManagedTensor right inside the view, with its shape/strides pointing at our own shape/strides fields, and hand out a pointer to it. But that makes DLTensorView a self-referential struct, since the embedded DLManagedTensor would point back into the same value. And because TensorDims keeps its data inline, moving the view moves that inline array, while the DLManagedTensor's *mut i64 keeps pointing at the old location. Hence this choice is unsound.

There is a known way to make the embedding sound: store shape/strides as Box<[i64]> instead. Then the arrays live on the heap, and moving the view only moves the Box handle. But this forces a heap allocation per view.
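
The difference between the two storage choices can be observed directly. In this sketch, InlineView and BoxedView are hypothetical types, and a fixed-size array stands in for the inline storage of TensorDims. Each function records the address of the shape data, moves the view into a Box, and records the address again.

```rust
/// Hypothetical view whose shape lives inline, inside the struct itself.
pub struct InlineView {
    pub shape: [i64; 2],
}

/// Hypothetical view whose shape lives in a separate heap allocation.
pub struct BoxedView {
    pub shape: Box<[i64]>,
}

/// Returns the address of the inline shape before and after moving the view.
pub fn inline_shape_addrs() -> (usize, usize) {
    let view = InlineView { shape: [4, 8] };
    let before = view.shape.as_ptr() as usize; // points into `view` on the stack
    let moved = Box::new(view); // the move relocates the inline array too
    let after = moved.shape.as_ptr() as usize; // points into the Box allocation
    (before, after)
}

/// Returns the address of the boxed shape before and after moving the view.
pub fn boxed_shape_addrs() -> (usize, usize) {
    let view = BoxedView { shape: vec![4, 8].into_boxed_slice() };
    let before = view.shape.as_ptr() as usize; // points at the heap array
    let moved = Box::new(view); // only the Box handle is relocated
    let after = moved.shape.as_ptr() as usize; // same heap array as before
    (before, after)
}
```

A pointer captured before the move is stale for InlineView and still valid for BoxedView, which is why the embedded design would need the heap allocation to be sound.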

So rather than embed the C struct, we build it only at the point where we're about to call into C:

pub(crate) struct ManagedTensorRef<'a> {
    inner: ffi::DLManagedTensor,
    _borrow: PhantomData<&'a ()>,
}
impl<'a> DLTensorView<'a> {
    // Fills `inner` with `shape: self.shape.as_ptr() as *mut _`, etc.
    pub(crate) fn to_c(&self) -> ManagedTensorRef<'_> { /* ... */ }
}

Here to_c takes &self, and the returned ManagedTensorRef<'_> borrows it, so its lifetime is tied to the view. The compiler now guarantees that the view, along with the inline shape/strides that the C struct's pointers point into, cannot move or be dropped while the ManagedTensorRef is alive. Finally, we take the raw *mut DLManagedTensor out of the ManagedTensorRef for the FFI call.
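
A self-contained sketch of the same pattern follows. CTensor is a simplified, hypothetical stand-in for the C struct (it is not the real DLManagedTensor layout), c_num_elements plays the role of a C function written in Rust for the sake of the example, and as_mut_ptr is an assumed helper name.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for the C struct (hypothetical layout).
#[repr(C)]
pub struct CTensor {
    pub data: *mut f32,
    pub ndim: i32,
    pub shape: *mut i64,
}

/// View that owns its shape inline and borrows the data for 'a.
pub struct View<'a> {
    data: *mut f32,
    shape: [i64; 2],
    _marker: PhantomData<&'a ()>,
}

/// C struct built on demand; borrows the view it points into.
pub struct CTensorRef<'v> {
    inner: CTensor,
    _borrow: PhantomData<&'v ()>,
}

impl<'a> View<'a> {
    pub fn new(buf: &'a [f32], rows: usize, cols: usize) -> Self {
        assert_eq!(buf.len(), rows * cols);
        View { data: buf.as_ptr() as *mut f32, shape: [rows as i64, cols as i64], _marker: PhantomData }
    }

    /// The returned value borrows `self`, so `self` cannot move or drop
    /// while the raw `shape` pointer inside it is in use.
    pub fn to_c(&self) -> CTensorRef<'_> {
        CTensorRef {
            inner: CTensor { data: self.data, ndim: 2, shape: self.shape.as_ptr() as *mut i64 },
            _borrow: PhantomData,
        }
    }
}

impl CTensorRef<'_> {
    pub fn as_mut_ptr(&mut self) -> *mut CTensor {
        &mut self.inner
    }
}

/// Pretend C function: reads the shape through the raw pointers.
unsafe fn c_num_elements(t: *mut CTensor) -> i64 {
    unsafe {
        let t = &*t;
        (0..t.ndim as usize).map(|i| *t.shape.add(i)).product()
    }
}

pub fn num_elements(view: &View<'_>) -> i64 {
    let mut c = view.to_c(); // built right before the call
    // SAFETY: `c` borrows `view`, so the shape pointer is valid here.
    unsafe { c_num_elements(c.as_mut_ptr()) }
}
```

Trying to move or drop the view between to_c() and the call is a compile error, which is the guarantee the paragraph above describes.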

As for the second fix, the index types that borrow their dataset now also get a lifetime. CAGRA and brute-force now return Index<'d>, where 'd is the dataset's lifetime, so the compiler forbids the index from outliving the data it reads. IVF-Flat, IVF-PQ and Vamana copy the dataset at build time, so they carry no lifetime.
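
The contrast between borrowing and copying indexes can be sketched with two toy types. BorrowingIndex and CopyingIndex are hypothetical, and a one-dimensional nearest-value lookup stands in for a real search.

```rust
use std::marker::PhantomData;

/// Index of the first value closest to `query`, or None for empty data.
fn nearest_in(data: &[f32], query: f32) -> Option<usize> {
    data.iter()
        .enumerate()
        .min_by(|a, b| (a.1 - query).abs().total_cmp(&(b.1 - query).abs()))
        .map(|(i, _)| i)
}

/// Keeps a non-owning pointer to the dataset, so it carries the lifetime 'd.
pub struct BorrowingIndex<'d> {
    data: *const f32,
    len: usize,
    _dataset: PhantomData<&'d [f32]>,
}

impl<'d> BorrowingIndex<'d> {
    pub fn build(dataset: &'d [f32]) -> Self {
        BorrowingIndex { data: dataset.as_ptr(), len: dataset.len(), _dataset: PhantomData }
    }
    pub fn nearest(&self, query: f32) -> Option<usize> {
        // SAFETY: 'd guarantees the dataset outlives this index.
        let data = unsafe { std::slice::from_raw_parts(self.data, self.len) };
        nearest_in(data, query)
    }
}

/// Copies the dataset at build time, so it needs no lifetime.
pub struct CopyingIndex {
    data: Vec<f32>,
}

impl CopyingIndex {
    pub fn build(dataset: &[f32]) -> Self {
        CopyingIndex { data: dataset.to_vec() }
    }
    pub fn nearest(&self, query: f32) -> Option<usize> {
        nearest_in(&self.data, query)
    }
}
```

A CopyingIndex built from a temporary can be returned from the scope that created the temporary; doing the same with BorrowingIndex fails to compile.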

One special case here is indexes loaded from disk. An index produced by deserialize doesn't borrow any caller-supplied dataset, hence there is nothing for 'd to tie to. We model that by returning Index<'static>. The 'static says the index borrows nothing, so it isn't constrained by any dataset's lifetime and can live as long as you like.
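
One way to picture the 'static case is with std::borrow::Cow. This is an analogy under the assumption of a toy byte format, not how the cuvs index is implemented: build borrows the caller's data, while deserialize produces owned data and can therefore return Index<'static>.

```rust
use std::borrow::Cow;

/// Toy index: either borrows the caller's dataset or owns its own copy.
pub struct Index<'d> {
    data: Cow<'d, [f32]>,
}

impl<'d> Index<'d> {
    /// Borrows the dataset, so the result is tied to 'd.
    pub fn build(dataset: &'d [f32]) -> Self {
        Index { data: Cow::Borrowed(dataset) }
    }

    /// Writes the values out as little-endian bytes (toy format).
    pub fn serialize(&self) -> Vec<u8> {
        self.data.iter().flat_map(|v| v.to_le_bytes()).collect()
    }

    pub fn data(&self) -> &[f32] {
        &self.data
    }
}

impl Index<'static> {
    /// Borrows nothing from the caller: the decoded values are owned.
    pub fn deserialize(bytes: &[u8]) -> Self {
        let data: Vec<f32> = bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Index { data: Cow::Owned(data) }
    }
}

/// The dataset is dropped at the end of this function, yet the
/// deserialized index may be returned because it is Index<'static>.
pub fn round_trip() -> Index<'static> {
    let dataset = vec![1.0_f32, 2.0, 3.0];
    let bytes = Index::build(&dataset).serialize();
    Index::deserialize(&bytes)
}
```

Returning Index::build(&dataset) from round_trip instead would not compile, since that index is only valid for the lifetime of the local dataset.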

@yan-zaretskiy yan-zaretskiy marked this pull request as ready for review June 15, 2026 21:47
@yan-zaretskiy yan-zaretskiy requested a review from a team as a code owner June 15, 2026 21:47
@coderabbitai

coderabbitai Bot commented Jun 15, 2026


📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added a public, trait-based DLPack tensor conversion API (IntoDlTensor / IntoDlTensorMut) for more flexible input/output handling.
  • Refactor

    • Updated k-NN and ANN index APIs (CAGRA, Brute Force, IVF-Flat, IVF-PQ, Vamana) to accept the new DLPack conversion traits for building and searching.
    • Updated the DLPack module exports accordingly.
  • Bug Fixes

    • Improved error reporting for tensor conversion failures.
  • Documentation

    • Refreshed module docs and rewrote the CAGRA example to match the new tensor workflow.
  • Chores

    • Updated dependency versions.

Walkthrough

Replaces the owning ManagedTensor abstraction across the entire Rust cuVS crate with a new non-owning, lifetime-parameterized DLPack view API. New public traits IntoDlTensor/IntoDlTensorMut and view types DLTensorView/DLTensorViewMut replace ManagedTensor in all index build/search and algorithm (k-means, distance) signatures. A test-only DeviceTensor adapter and ndarray implementations are added, all index and algorithm APIs are updated uniformly, and the CAGRA example is rewritten to demonstrate user-defined GPU tensor integration.

Changes

DLPack View API Refactor and Consumer Migration

| Layer | File(s) | Summary |
|---|---|---|
| DLPack core: traits, views, error, crate exports, deps | rust/cuvs/src/dlpack.rs, rust/cuvs/src/error.rs, rust/cuvs/src/lib.rs, rust/cuvs/Cargo.toml | Rewrites dlpack.rs introducing IntoDlTensor<'a>, IntoDlTensorMut<'a>, DType, DLTensorView<'a>, DLTensorViewMut<'a>, DLPackError, and internal ManagedTensorRef; removes ManagedTensor entirely. Adds DLPack variant to Error with From impl. Promotes dlpack to pub mod, swaps ManagedTensor re-export for the new view/trait exports, adds test_utils under #[cfg(test)], and adds thiserror/tinyvec dependencies. |
| Test-only DeviceTensor adapter and ndarray IntoDlTensor impls | rust/cuvs/src/test_utils.rs | Adds DeviceTensor<T> backed by RMM device memory with zeros, from_host, copy_to_host, and Drop. Implements IntoDlTensor/IntoDlTensorMut for device references and for &ndarray::ArrayRef / &mut ndarray::ArrayRef (host CPU views), including stride elision for standard contiguous layouts. |
| CAGRA Index lifetime parameterization and API migration | rust/cuvs/src/cagra/index.rs, rust/cuvs/src/cagra/mod.rs | Refactors Index to Index<'d> with handle and PhantomData; updates build, search, search_with_filter, serialize, deserialize (returns Index<'static>), and Drop. Replaces all test tensor construction with DeviceTensor. Updates module docs. |
| Brute-force Index lifetime parameterization and API migration | rust/cuvs/src/brute_force.rs | Converts Index to Index<'d> with PhantomData; updates build to impl IntoDlTensor<'d> and search to generic IntoDlTensor/IntoDlTensorMut; updates test to use DeviceTensor. |
| IVF-Flat, IVF-PQ, and Vamana Index API migration | rust/cuvs/src/ivf_flat/, rust/cuvs/src/ivf_pq/, rust/cuvs/src/vamana/ | Migrates build and search signatures from ManagedTensor/Into<ManagedTensor> to impl IntoDlTensor/impl IntoDlTensorMut across all three index types; updates all tests to use DeviceTensor; condenses module-level docs. |
| K-means and pairwise distance API migration | rust/cuvs/src/cluster/kmeans/mod.rs, rust/cuvs/src/distance/mod.rs | Migrates fit, predict, cluster_cost, and pairwise_distance to generic IntoDlTensor/IntoDlTensorMut parameters; updates optional sample_weight to Option<impl IntoDlTensor>; updates tests to use DeviceTensor. |
| CAGRA example rewrite with user-defined CudaTensor | rust/cuvs/examples/cagra.rs, rust/cuvs/src/resources.rs | Rewrites the CAGRA example to define CudaTensor<T> using raw CUDA FFI, implement IntoDlTensor/IntoDlTensorMut on it, and demonstrate a full build/search workflow using only user-owned tensors. Adds # Safety docs to Resources::set_cuda_stream. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

non-breaking, Build

Suggested reviewers

  • AyodeAwe
  • robertmaynard
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Track tensor and index lifetimes in the Rust DLPack bindings' accurately and specifically describes the main change: introducing lifetime parameters to tensor views and indexes to prevent data use-after-free. |
| Description check | ✅ Passed | The description accurately explains the core problem (ManagedTensor lacking lifetime constraints) and the solution (lifetime-parameterized DLTensorView/DLTensorViewMut with IntoDlTensor traits). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
rust/cuvs/src/cagra/index.rs (1)

22-23: ⚡ Quick win

Fix unresolved rustdoc link to Index::merge.

Line 22 links to [Index::merge], but this type currently does not expose a merge method in this module. That can produce a broken intra-doc link warning in docs builds.

Suggested doc tweak
-/// [`Index::merge`], the data is self-contained and the lifetime is
+/// a merged index, the data is self-contained and the lifetime is
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/cuvs/src/cagra/index.rs` around lines 22 - 23, The rustdoc link in the
documentation comment references `Index::merge`, but the Index type does not
expose this method, creating a broken intra-doc link warning. Remove the
intra-doc link syntax around `Index::merge` in the doc comment (change
`[`Index::merge`]` to just `Index::merge` or remove the reference entirely if
it's not needed for clarity), or replace it with a reference to an actual method
that exists on the Index type.
rust/cuvs/examples/cagra.rs (1)

117-137: Add size validation guard to to_host before copying device data to host array.

The to_host method copies self.bytes to the destination host array without verifying that the caller-provided buffer has matching capacity. This safe wrapper could silently perform an out-of-bounds write if the destination array has a different element count than the source.

Add a check to ensure the destination buffer byte size matches:

Suggested guard
 fn to_host<D>(&self, res: &Resources, host: &mut ndarray::ArrayRef<T, D>) -> ExampleResult<()>
 where
     D: ndarray::Dimension,
 {
     if !host.is_standard_layout() {
         return Err("host array must be contiguous (row-major)".into());
     }
+    let host_bytes = host
+        .len()
+        .checked_mul(std::mem::size_of::<T>())
+        .ok_or("host array size overflow")?;
+    if host_bytes != self.bytes {
+        return Err(format!(
+            "host buffer size mismatch: expected {} bytes, got {} bytes",
+            self.bytes, host_bytes
+        )
+        .into());
+    }

     let stream = res.get_cuda_stream()?;
     check_cuda(unsafe {
         cudaMemcpyAsync(
             host.as_mut_ptr() as *mut c_void,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/cuvs/examples/cagra.rs` around lines 117 - 137, The to_host method
performs an unsafe memory copy via cudaMemcpyAsync without validating that the
destination host array has sufficient capacity to receive self.bytes. Add a size
validation check after verifying the host array is contiguous but before calling
cudaMemcpyAsync to ensure the host array's byte capacity matches self.bytes,
returning an appropriate error if the sizes do not match. This prevents
potential out-of-bounds writes to the destination buffer.
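
The size check suggested above can also be factored into a small standalone helper, which makes it testable without a GPU. This is a sketch: check_copy_size is a hypothetical name, and it takes plain element and byte counts rather than the example's tensor types.

```rust
/// Returns the host buffer size in bytes if it matches `device_bytes`.
/// `host_len` is the number of elements of type T in the host buffer.
pub fn check_copy_size<T>(host_len: usize, device_bytes: usize) -> Result<usize, String> {
    // checked_mul guards against overflow when computing the byte size.
    let host_bytes = host_len
        .checked_mul(std::mem::size_of::<T>())
        .ok_or_else(|| "host array size overflow".to_string())?;
    if host_bytes != device_bytes {
        return Err(format!(
            "host buffer size mismatch: expected {} bytes, got {} bytes",
            device_bytes, host_bytes
        ));
    }
    Ok(host_bytes)
}
```

With f32 elements, a 4-element host buffer matches a 16-byte device buffer and is rejected against a 12-byte one.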
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust/cuvs/Cargo.toml`:
- Line 18: The tinyvec dependency in rust/cuvs/Cargo.toml includes the
latest_stable_rust feature without an explicit rust-version declaration in the
workspace package metadata, which can cause silent breakage when new Rust stable
versions are released. Fix this by either adding a rust-version field to the
[workspace.package] section to define an explicit minimum supported Rust version
that aligns with your project requirements, or remove the latest_stable_rust
feature from the tinyvec dependency configuration and replace it with a stable
feature set that is compatible with your MSRV.

In `@rust/cuvs/src/test_utils.rs`:
- Around line 21-27: The DeviceTensor struct stores a raw ffi::cuvsResources_t
handle without tying its lifetime to the underlying Resources object, allowing
DeviceTensor to outlive Resources and cause a use-after-free when Drop calls
cuvsRMMFree with an invalid handle. Add a lifetime parameter to the DeviceTensor
struct definition (around line 21) and bind the resources field to that lifetime
by storing a reference instead of a raw pointer. Update the Drop implementation
(around lines 30-43) to work with the borrowed reference instead of the raw
handle. Ensure all construction sites of DeviceTensor (around lines 116-120)
properly pass a reference with the correct lifetime to guarantee the resource
stays valid for the entire lifetime of the DeviceTensor instance.
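
A minimal sketch of the suggested shape, using mock types: this Resources only counts live allocations and this DeviceTensor allocates nothing on a device. The names mirror the comment above, but the bodies are assumptions made for illustration.

```rust
use std::cell::Cell;

/// Mock resources handle that tracks how many tensors are alive.
pub struct Resources {
    live_allocs: Cell<usize>,
}

impl Resources {
    pub fn new() -> Self {
        Resources { live_allocs: Cell::new(0) }
    }
    pub fn live(&self) -> usize {
        self.live_allocs.get()
    }
}

/// Holds a reference instead of a raw handle, so the borrow checker
/// refuses to let a tensor outlive the Resources it was created from.
pub struct DeviceTensor<'r> {
    res: &'r Resources,
    len: usize,
}

impl<'r> DeviceTensor<'r> {
    pub fn zeros(res: &'r Resources, len: usize) -> Self {
        res.live_allocs.set(res.live_allocs.get() + 1); // stands in for an allocation
        DeviceTensor { res, len }
    }
    pub fn len(&self) -> usize {
        self.len
    }
}

impl Drop for DeviceTensor<'_> {
    fn drop(&mut self) {
        // `self.res` is guaranteed to be alive here; stands in for the free call.
        self.res.live_allocs.set(self.res.live_allocs.get() - 1);
    }
}
```

With this layout, dropping the Resources while a DeviceTensor still exists is a compile-time error rather than a use-after-free in Drop.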


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e292a2b-8b54-4ea0-84e8-da899f2498fb

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and 1bc3436.

📒 Files selected for processing (18)
  • rust/cuvs/Cargo.toml
  • rust/cuvs/examples/cagra.rs
  • rust/cuvs/src/brute_force.rs
  • rust/cuvs/src/cagra/index.rs
  • rust/cuvs/src/cagra/mod.rs
  • rust/cuvs/src/cluster/kmeans/mod.rs
  • rust/cuvs/src/distance/mod.rs
  • rust/cuvs/src/dlpack.rs
  • rust/cuvs/src/error.rs
  • rust/cuvs/src/ivf_flat/index.rs
  • rust/cuvs/src/ivf_flat/mod.rs
  • rust/cuvs/src/ivf_pq/index.rs
  • rust/cuvs/src/ivf_pq/mod.rs
  • rust/cuvs/src/lib.rs
  • rust/cuvs/src/resources.rs
  • rust/cuvs/src/test_utils.rs
  • rust/cuvs/src/vamana/index.rs
  • rust/cuvs/src/vamana/mod.rs

Comment thread rust/cuvs/Cargo.toml Outdated
Comment thread rust/cuvs/src/test_utils.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rust/cuvs/src/brute_force.rs (1)

147-148: ⚠️ Potential issue | 🟠 Major

Change neighbors type from i64 to u32 to match FFI contract.

The FFI documentation for cuvsBruteForceSearch explicitly requires neighbors to have type kDLUInt with 32 bits (i.e., u32). Using i64 violates this contract and risks silent data corruption if the C implementation writes 32-bit values to a 64-bit buffer. CAGRA tests correctly use u32.

Update both allocations:

Suggested changes
-        let mut neighbors_host = ndarray::Array::<i64, _>::zeros((n_queries, k));
-        let mut neighbors = DeviceTensor::<i64>::zeros(&res, &[n_queries, k]).unwrap();
+        let mut neighbors_host = ndarray::Array::<u32, _>::zeros((n_queries, k));
+        let mut neighbors = DeviceTensor::<u32>::zeros(&res, &[n_queries, k]).unwrap();

Also update any assertions that compare or cast neighbors values accordingly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/cuvs/src/brute_force.rs` around lines 147 - 148, The `neighbors` and
`neighbors_host` tensor declarations are using `i64` type, but the FFI contract
for `cuvsBruteForceSearch` requires `u32`. Update the type parameter from `i64`
to `u32` in both the `neighbors_host` ndarray allocation and the `neighbors`
DeviceTensor allocation. Additionally, search the file for any assertions,
comparisons, or casts involving `neighbors` values and update them to work
correctly with the new `u32` type instead of `i64`.
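
To illustrate the kind of corruption this comment warns about, the sketch below packs 32-bit ids into a byte buffer and then reads that buffer back as 64-bit integers. It assumes a little-endian byte layout and does not touch the cuvs API; whether the C implementation behaves this way for a mistyped buffer is not confirmed in this thread.

```rust
/// Serializes u32 ids as little-endian bytes, then reinterprets every
/// 8 bytes as one i64, the way a mistyped i64 buffer would be read.
pub fn misread_as_i64(ids: &[u32]) -> Vec<i64> {
    let bytes: Vec<u8> = ids.iter().flat_map(|id| id.to_le_bytes()).collect();
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            i64::from_le_bytes(raw)
        })
        .collect()
}
```

Four ids come back as two values, each fusing a pair of neighbors: for [1, 2, 3, 4] the first value is 1 + 2 * 2^32, not 1.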

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9417b5c-73db-45d8-99a4-ac58de5132e3

📥 Commits

Reviewing files that changed from the base of the PR and between 1bc3436 and 07d3e5c.

📒 Files selected for processing (6)
  • rust/cuvs/Cargo.toml
  • rust/cuvs/src/brute_force.rs
  • rust/cuvs/src/cagra/index.rs
  • rust/cuvs/src/cluster/kmeans/mod.rs
  • rust/cuvs/src/distance/mod.rs
  • rust/cuvs/src/test_utils.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • rust/cuvs/src/distance/mod.rs
  • rust/cuvs/src/test_utils.rs
  • rust/cuvs/src/cluster/kmeans/mod.rs

Labels

breaking Introduces a breaking change improvement Improves an existing functionality Rust
