[VectorId] Remove Id conversion bounds and traits#1145

Open
arkrishn94 wants to merge 7 commits into main from u/adkrishnan/vector-id

Conversation

@arkrishn94

@arkrishn94 arkrishn94 commented Jun 10, 2026

Contributor

This PR removes the conversion traits and bounds that tie VectorId to scalar-like types. It is part of an effort to loosen the constraints we impose on Ids.

For reviewers: review the changes in the order they are listed below.

VectorId bounds

Removed all the VectorIdTryFrom, Into* and FromPrimitive trait bounds on VectorId. Internal ids no longer need to be convertible to and from usize.

Along with that, this removes the following items entirely from diskann/src/utils/utils.rs:

  • traits VectorIdTryFrom<T>, TryIntoVectorId<T>
  • methods vector_id_try_from, try_into_vector_id
  • helpers vecid_from_u32, vecid_from_usize
  • error type IdConversionError<const, F, T> + alias ErrorToVectorId<F, T>
  • their tests

diskann-label-filter needs an IntoUsize bound on the ids, since they are used as keys in the roaring treemap. This bound on the Id is added to:

  • RoaringAttributeStore<IT> (roaring treemap keys must be u64)
  • InlineBetaStrategy (propagates the above through DocumentProvider)
  • QueryBitmapEvaluator / BitmapFilter (bitmap membership is keyed by usize)
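
The idea above, keeping the base id trait free of conversion bounds and asking for IntoUsize only where ids become bitmap keys, can be sketched roughly as follows. The trait definitions are simplified stand-ins rather than the real diskann traits, `AttributeStore` is a hypothetical name, and a `BTreeSet<u64>` stands in for the roaring treemap so the sketch needs no external crate.

```rust
use std::collections::BTreeSet;
use std::fmt::Debug;
use std::hash::Hash;
use std::marker::PhantomData;

/// Simplified stand-in for the loosened id trait: no scalar conversion bounds.
pub trait VectorId: Copy + Eq + Hash + Debug + Send + Sync + 'static {}

/// Simplified stand-in for the conversion trait that label filtering still needs.
pub trait IntoUsize {
    fn into_usize(self) -> usize;
}

impl VectorId for u32 {}

impl IntoUsize for u32 {
    fn into_usize(self) -> usize {
        // Lossless on targets where `usize` is at least 32 bits wide.
        self as usize
    }
}

/// Hypothetical attribute store; `BTreeSet<u64>` stands in for a roaring treemap.
pub struct AttributeStore<I> {
    members: BTreeSet<u64>,
    _id: PhantomData<I>,
}

// The extra `IntoUsize` bound appears only here, where ids become u64 keys.
impl<I: VectorId + IntoUsize> AttributeStore<I> {
    pub fn new() -> Self {
        AttributeStore {
            members: BTreeSet::new(),
            _id: PhantomData,
        }
    }

    /// Returns `true` if the id was not already present.
    pub fn insert(&mut self, id: I) -> bool {
        self.members.insert(id.into_usize() as u64)
    }

    pub fn contains(&self, id: I) -> bool {
        self.members.contains(&(id.into_usize() as u64))
    }
}
```

Code that never touches the store can stay generic over `I: VectorId` alone, which is the loosening this PR is after.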

Algorithm changes

  • DiskANNIndex::prune_range: now takes impl IntoIterator<Item = DP::InternalId> + Send instead of Range<DP::InternalId>. Caller explicitly constructs the iterator for its specific provider's Id type.
  • InmemIndexBuilder::final_prune for disk-index: signature unchanged externally (Range<u32>); now constructs the u32 range with as u32 at the trait-object boundary.
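
To illustrate why an iterator argument is more permissive than a Range, here is a minimal sketch. `prune_ids` is a hypothetical free function, not DiskANNIndex::prune_range itself, and truncating each adjacency list stands in for the real pruning logic.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a pruning entry point that accepts any `Send`
/// iterator of ids. Returns how many adjacency lists were shortened.
pub fn prune_ids<I>(
    graph: &mut HashMap<I, Vec<I>>,
    ids: impl IntoIterator<Item = I> + Send,
    max_degree: usize,
) -> usize
where
    I: Eq + Hash,
{
    let mut pruned = 0;
    for id in ids {
        if let Some(list) = graph.get_mut(&id) {
            if list.len() > max_degree {
                // Truncation is only a placeholder for real pruning.
                list.truncate(max_degree);
                pruned += 1;
            }
        }
    }
    pruned
}
```

A caller whose ids are u32 can pass `0u32..n`, while a caller with non-contiguous or non-integer ids can pass a `Vec` or any other iterator. No id-to-usize conversion is needed inside the function.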

Provider specializations

  • SimpleNeighborProviderAsync<I> was written as generic over an I: VectorId. The generic is now removed since it is only ever instantiated with I = u32. This change is propagated to all in-mem providers.
  • A similar change is made to bftree::VectorProvider<T, I: VectorId = u32>. starting_points() becomes infallible (Vec<u32> instead of Result<Vec<I>, ErrorToVectorId<…>>).
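
A rough sketch of why fixing the id type to u32 lets starting_points() drop its Result. The struct below is a hypothetical simplification of a vector provider, and the layout (start points stored in the slots right after the regular points) is an assumption of this sketch, not something the PR states.

```rust
/// Hypothetical, simplified provider specialised to `u32` ids.
pub struct VectorProvider {
    max_points: u32,
    num_start_points: u32,
}

impl VectorProvider {
    /// Validates once at construction so later id arithmetic cannot overflow.
    pub fn new(max_points: u32, num_start_points: u32) -> Option<Self> {
        max_points.checked_add(num_start_points)?;
        Some(VectorProvider {
            max_points,
            num_start_points,
        })
    }

    /// Infallible: every value is already a `u32`, so no conversion can fail.
    /// Assumes start points occupy the slots after the regular points.
    pub fn starting_points(&self) -> Vec<u32> {
        (self.max_points..self.max_points + self.num_start_points).collect()
    }
}
```

With a generic I, each id would have to be built from an integer at this point, which is where the fallible conversion and its error type came from.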

bftree::NeighborProvider

Removed the AsKey trait since it was a wrapper used to convert from usize and u32 to raw bytes.
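
For a concrete u32 id, the conversion to raw bytes that AsKey wrapped is available directly from the standard library. A minimal sketch, where `id_to_key` and `key_to_id` are hypothetical helper names and little-endian byte order is an assumption (the PR does not say which order the bf-tree keys use):

```rust
use std::convert::TryFrom;

/// Encodes a `u32` id as a fixed-width key.
/// Little-endian order is an assumption of this sketch.
pub fn id_to_key(id: u32) -> [u8; 4] {
    id.to_le_bytes()
}

/// Decodes a key back into an id; returns `None` if the key is not 4 bytes long.
pub fn key_to_id(key: &[u8]) -> Option<u32> {
    let bytes = <[u8; 4]>::try_from(key).ok()?;
    Some(u32::from_le_bytes(bytes))
}
```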

Copilot AI left a comment

Contributor

Pull request overview

This PR loosens VectorId constraints by removing scalar conversion bounds/traits and updating affected crates (core index, providers, bf-tree provider, label-filter, and benchmark) to work without implicit usize/primitive conversions.

Changes:

  • Remove VectorId conversion-related trait bounds and delete the associated conversion helpers/tests from diskann/src/utils/utils.rs.
  • Update graph pruning and multiple providers to avoid TryIntoVectorId/VectorIdTryFrom and operate on explicit iterators / concrete u32 IDs where applicable.
  • Add/propagate IntoUsize where IDs must be usable as bitmap / RoaringTreemap keys; refactor bf-tree neighbor list trailing-length encoding into dedicated helpers.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| diskann/src/utils/vector_id.rs | Removes scalar conversion bounds from the VectorId trait. |
| diskann/src/utils/utils.rs | Deletes vector-id conversion helpers/traits and their tests; keeps IntoUsize/TypeStr. |
| diskann/src/graph/index.rs | Changes DiskANNIndex::prune_range to accept an iterator of IDs rather than an ID range requiring conversions. |
| diskann-providers/src/storage/index_storage.rs | Updates tests to use non-generic SimpleNeighborProviderAsync. |
| diskann-providers/src/model/graph/provider/async_/simple_neighbor_provider.rs | Drops generic ID parameter and stores neighbor list length as trailing u32. |
| diskann-providers/src/model/graph/provider/async_/memory_vector_provider.rs | Removes use of deleted vecid_from_usize helper in tests. |
| diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/provider.rs | Propagates non-generic SimpleNeighborProviderAsync through provider/accessor types. |
| diskann-providers/src/model/graph/provider/async_/inmem/product.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_vector_provider.rs | Removes use of deleted vecid_from_usize helper in tests. |
| diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs | Adds IntoUsize bound where internal IDs are used for bitmap/roaring membership checks. |
| diskann-label-filter/src/encoded_attribute_provider/roaring_attribute_store.rs | Adds IntoUsize bound and switches ID→u64 mapping to go through usize. |
| diskann-disk/src/search/provider/disk_vertex_provider_factory.rs | Removes unnecessary TryIntoVectorId usage when IDs are already u32. |
| diskann-disk/src/build/builder/build.rs | Replaces removed conversion helpers with direct u32 casts at trait-object boundaries and during pruning. |
| diskann-bftree/src/vectors.rs | Removes generic ID parameter; makes starting_points() return Vec<u32>. |
| diskann-bftree/src/provider.rs | Adjusts for infallible starting_points() from VectorProvider. |
| diskann-bftree/src/neighbors.rs | Adds AsKey bound and centralizes neighbor-list trailing length encoding/decoding. |
| diskann-bftree/src/lib.rs | Makes AsKey public and adds an impl for u32. |
| diskann-benchmark/src/utils/filters.rs | Adds IntoUsize bound to query label providers that index into bitsets. |

Comment thread diskann-disk/src/build/builder/build.rs Outdated
Comment thread diskann-disk/src/build/builder/build.rs Outdated
Comment thread diskann-disk/src/build/builder/build.rs Outdated
Comment on lines +81 to +83
// The assertion above guarantees `neighbors.len() < self.graph.dim()`, which
// means it fits in a `u32` (graph dim is sized in `u32` anyway).
list[self.graph.dim() - 1] = neighbors.len() as u32;

Contributor Author

Skipping the check since neighbors.len() won't exceed u32::MAX in practice.
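
For comparison, a checked variant of the trailing-length write might look like the sketch below. It is not the code in simple_neighbor_provider.rs: `encode_neighbors` and `decode_neighbors` are hypothetical names, and the layout (a fixed-width slot of u32 values whose last element holds the list length) is inferred from the three lines quoted above.

```rust
use std::convert::TryFrom;

/// Writes `neighbors` into a fixed-width slot whose last element stores the
/// list length. Fails instead of truncating if the list or its length does not fit.
pub fn encode_neighbors(slot: &mut [u32], neighbors: &[u32]) -> Result<(), String> {
    let capacity = slot
        .len()
        .checked_sub(1)
        .ok_or("slot has no room for the trailing length")?;
    if neighbors.len() > capacity {
        return Err(format!(
            "{} neighbors exceed slot capacity {}",
            neighbors.len(),
            capacity
        ));
    }
    // Checked conversion in place of `neighbors.len() as u32`.
    let len = u32::try_from(neighbors.len()).map_err(|e| e.to_string())?;
    slot[..neighbors.len()].copy_from_slice(neighbors);
    slot[capacity] = len;
    Ok(())
}

/// Reads the list back; returns `None` if the stored length is out of range.
pub fn decode_neighbors(slot: &[u32]) -> Option<&[u32]> {
    let (&len, body) = slot.split_last()?;
    body.get(..len as usize)
}
```

Once the capacity check has passed, the length conversion fails only if the slot itself holds more than u32::MAX entries, which is the "in practice" argument made above. The checked form makes that assumption explicit rather than silent.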

Comment thread diskann-bftree/src/vectors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
@codecov-commenter

codecov-commenter commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.40%. Comparing base (a5c745b) to head (5dc9613).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-bftree/src/vectors.rs | 85.71% | 2 Missing ⚠️ |
| diskann-bftree/src/quant.rs | 66.66% | 1 Missing ⚠️ |

@@            Coverage Diff             @@
##             main    #1145      +/-   ##
==========================================
- Coverage   89.43%   89.40%   -0.04%     
==========================================
  Files         484      484              
  Lines       91495    91229     -266     
==========================================
- Hits        81829    81563     -266     
  Misses       9666     9666              
| Flag | Coverage Δ |
| --- | --- |
| miri | 89.40% <95.23%> (-0.04%) ⬇️ |
| unittests | 89.05% <95.23%> (-0.04%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-benchmark/src/utils/filters.rs | 86.25% <ø> (ø) |
| diskann-bftree/src/lib.rs | 47.36% <ø> (-3.86%) ⬇️ |
| diskann-bftree/src/neighbors.rs | 93.53% <100.00%> (-0.09%) ⬇️ |
| diskann-bftree/src/provider.rs | 91.32% <100.00%> (ø) |
| diskann-disk/src/build/builder/build.rs | 94.17% <100.00%> (+0.01%) ⬆️ |
| ...rc/search/provider/disk_vertex_provider_factory.rs | 95.70% <100.00%> (ø) |
| ...oded_attribute_provider/roaring_attribute_store.rs | 75.60% <100.00%> (ø) |
| ...ilter/src/inline_beta_search/inline_beta_filter.rs | 0.00% <ø> (ø) |
| ...aph/provider/async_/fast_memory_vector_provider.rs | 94.69% <100.00%> (-0.07%) ⬇️ |
| ...odel/graph/provider/async_/inmem/full_precision.rs | 98.53% <ø> (ø) |

... and 11 more
@hildebrandmw hildebrandmw left a comment

Contributor

I support this change! I have two general categories of feedback. First, can we just store the u32 length directly in diskann-bftree and remove the conversion to I entirely? We're already going to bytes.

Second, this change adds unchecked as u32 conversions where they were previously checked. I'd prefer to keep these as checked conversions to be defensive.

+ TryIntoInteger<u32>
+ Into<u64>
+ IntoUsize
+ FromPrimitive

Contributor

Incredible!

Comment thread diskann-bftree/src/lib.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-disk/src/build/builder/build.rs Outdated
  match vector_data {
  Some((i, (vector, _))) => {
- let id = vecid_from_usize(i)?;
+ let id = i as u32;

Contributor

This does loosen the safety here. Before, the conversion was checked; now it's not. Is that a concern?

Contributor Author

Made the conversion fallible, thanks for flagging.
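
One way such a fallible conversion could look, as a minimal sketch rather than the code that landed in build.rs. `assign_ids` is a hypothetical helper that numbers items by position and propagates a failed usize-to-u32 conversion instead of truncating with `as u32`.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Pairs each item with a `u32` id derived from its position.
/// Returns an error if any position does not fit in a `u32`.
pub fn assign_ids<T>(items: Vec<T>) -> Result<Vec<(u32, T)>, TryFromIntError> {
    items
        .into_iter()
        .enumerate()
        // `u32::try_from` is the checked counterpart of `i as u32`.
        .map(|(i, item)| u32::try_from(i).map(|id| (id, item)))
        .collect()
}
```

Collecting into a Result stops at the first index that overflows, so the caller sees an error rather than wrapped ids.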

Comment thread diskann/src/graph/index.rs