[VectorId] Remove Id conversion bounds and traits#1145
Conversation
# Conflicts: # diskann-bftree/src/neighbors.rs
There was a problem hiding this comment.
Pull request overview
This PR loosens VectorId constraints by removing scalar conversion bounds/traits and updating affected crates (core index, providers, bf-tree provider, label-filter, and benchmark) to work without implicit usize/primitive conversions.
Changes:
- Remove
VectorIdconversion-related trait bounds and delete the associated conversion helpers/tests fromdiskann/src/utils/utils.rs. - Update graph pruning and multiple providers to avoid
TryIntoVectorId/VectorIdTryFromand operate on explicit iterators / concreteu32IDs where applicable. - Add/propagate
IntoUsizewhere IDs must be usable as bitmap / RoaringTreemap keys; refactor bf-tree neighbor list trailing-length encoding into dedicated helpers.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/utils/vector_id.rs | Removes scalar conversion bounds from the VectorId trait. |
| diskann/src/utils/utils.rs | Deletes vector-id conversion helpers/traits and their tests; keeps IntoUsize/TypeStr. |
| diskann/src/graph/index.rs | Changes DiskANNIndex::prune_range to accept an iterator of IDs rather than an ID range requiring conversions. |
| diskann-providers/src/storage/index_storage.rs | Updates tests to use non-generic SimpleNeighborProviderAsync. |
| diskann-providers/src/model/graph/provider/async_/simple_neighbor_provider.rs | Drops generic ID parameter and stores neighbor list length as trailing u32. |
| diskann-providers/src/model/graph/provider/async_/memory_vector_provider.rs | Removes use of deleted vecid_from_usize helper in tests. |
| diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/provider.rs | Propagates non-generic SimpleNeighborProviderAsync through provider/accessor types. |
| diskann-providers/src/model/graph/provider/async_/inmem/product.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Updates neighbor provider type usage after SimpleNeighborProviderAsync de-genericization. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_vector_provider.rs | Removes use of deleted vecid_from_usize helper in tests. |
| diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs | Adds IntoUsize bound where internal IDs are used for bitmap/roaring membership checks. |
| diskann-label-filter/src/encoded_attribute_provider/roaring_attribute_store.rs | Adds IntoUsize bound and switches ID→u64 mapping to go through usize. |
| diskann-disk/src/search/provider/disk_vertex_provider_factory.rs | Removes unnecessary TryIntoVectorId usage when IDs are already u32. |
| diskann-disk/src/build/builder/build.rs | Replaces removed conversion helpers with direct u32 casts at trait-object boundaries and during pruning. |
| diskann-bftree/src/vectors.rs | Removes generic ID parameter; makes starting_points() return Vec<u32>. |
| diskann-bftree/src/provider.rs | Adjusts for infallible starting_points() from VectorProvider. |
| diskann-bftree/src/neighbors.rs | Adds AsKey bound and centralizes neighbor-list trailing length encoding/decoding. |
| diskann-bftree/src/lib.rs | Makes AsKey public and adds an impl for u32. |
| diskann-benchmark/src/utils/filters.rs | Adds IntoUsize bound to query label providers that index into bitsets. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // The assertion above guarantees `neighbors.len() < self.graph.dim()`, which | ||
| // means it fits in a `u32` (graph dim is sized in `u32` anyway). | ||
| list[self.graph.dim() - 1] = neighbors.len() as u32; |
There was a problem hiding this comment.
Skipping the check since neighbors.len() won't exceed u32::MAX in practice.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1145 +/- ##
==========================================
- Coverage 89.43% 89.40% -0.04%
==========================================
Files 484 484
Lines 91495 91229 -266
==========================================
- Hits 81829 81563 -266
Misses 9666 9666
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
hildebrandmw
left a comment
There was a problem hiding this comment.
I support this change! Two general categories of feedback. First, can we just store the u32 length directly in diskann-bftree and remove the conversion to I entirely? We're already going to bytes.
Second: this change adds unchecked as u32 conversions where-as they were checked before. I'd prefer to keep these as checked conversions to be defensive.
| + TryIntoInteger<u32> | ||
| + Into<u64> | ||
| + IntoUsize | ||
| + FromPrimitive |
| match vector_data { | ||
| Some((i, (vector, _))) => { | ||
| let id = vecid_from_usize(i)?; | ||
| let id = i as u32; |
There was a problem hiding this comment.
This does loosen the safety here. Before the conversion was checked -now it's not. Is that a concern?
There was a problem hiding this comment.
Made the conversion fallible, thanks for flagging.
This PR removes the conversion traits/bounds of
VectorIdto and from scalar like types. This is part of an effort to loosen the constraints we impose on Ids.For reviewers: Review in the following order of changes mentioned.
VectorIdboundsRemoved all the
VectorIdTryFrom,Into*andFromPrimitivetrait bounds onVectorId. This removes the constraint that internal ids need to be able to be converted to and fromusize.Along with that, this removes the following traits entirely from
diskann/src/utils/utils.rs:VectorIdTryFrom<T>,TryIntoVectorId<T>vector_id_try_from,try_into_vector_idvecid_from_u32,vecid_from_usizeIdConversionError<const, F, T>+ aliasErrorToVectorId<F, T>diskann-label-filterneedsIntoUsizebound on the ids since they are used as keys in the roaring treemap. This bound on the Id is added to:RoaringAttributeStore<IT>(roaring treemap keys must beu64)InlineBetaStrategy(propagates the above throughDocumentProvider)QueryBitmapEvaluator/BitmapFilter(bitmap membership is keyed byusize)Algorithm changes
DiskANNIndex::prune_range: now takesimpl IntoIterator<Item = DP::InternalId> + Sendinstead ofRange<DP::InternalId>. Caller explicitly constructs the iterator for its specific provider's Id type.InmemIndexBuilder::final_prunefordisk-index: signature unchanged externally (Range<u32>); now constructs theu32range withas u32at the trait-object boundary.Provider specializations
SimpleNeighborProviderAsync<I>was written as generic over anI : VectorId. The generic is now removed since it is only ever instantiated withI = u32. Propagated this change to all in-mem providers.bftree::VectorProvider<T, I: VectorId = u32>.starting_points()becomes infallible (Vec<u32>instead ofResult<Vec<I>, ErrorToVectorId<…>>).bftree::NeighborProviderRemoved the
AsKeytrait since it was a wrapper used to convert fromusizeandu32to raw bytes.