Fix handling of deletes and ID minting during concurrent insert/delete#1146
Open
metajack wants to merge 1 commit into
Open
Fix handling of deletes and ID minting during concurrent insert/delete#1146metajack wants to merge 1 commit into
metajack wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses concurrency races in the Garnet provider’s ID lifecycle by tightening synchronization around ID minting and by splitting “soft delete” (remove mappings/attributes) from “final release” (delete vector payload + mark ID free), then wiring inplace_delete() to call release().
Changes:
- Call
data_provider.release()at the end ofIndex::inplace_delete()to complete deletions and safely recycle IDs. - Refactor Garnet provider delete/release responsibilities (mappings removed in
delete(), payload +fsm.mark_free()inrelease()), and adjust search post-processing to skip candidates whose external-id mapping is missing. - Rework
FreeSpaceMapID minting state (next_id+max_block) under a singleRwLockand serialize refills with a dedicatedMutex; bumpdiskann-garnetversion to 2.0.3.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/graph/index.rs | Ensures inplace_delete() calls release() after dropping the adjacency list. |
| diskann-garnet/src/provider.rs | Splits delete vs release behavior; updates release to perform final cleanup; search post-process skips missing mappings. |
| diskann-garnet/src/fsm.rs | Consolidates ID minting/block expansion state under one lock and adds a refill mutex to avoid concurrent refills. |
| diskann-garnet/diskann-garnet.nuspec | Version bump to 2.0.3. |
| diskann-garnet/Cargo.toml | Version bump to 2.0.3. |
| Cargo.lock | Locks updated for diskann-garnet 2.0.3. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+828
to
835
| // Mark the ID free in the FSM. | ||
| if let Err(e) = self.fsm.mark_free(context, id) { | ||
| return future::ready(Err(e.into())); | ||
| }; | ||
|
|
||
| if !ok { | ||
| return future::ready(Err(GarnetError::Delete.into())); | ||
| } |
Comment on lines
+279
to
283
| let id_minter = self.id_minter.read().unwrap(); | ||
| if id >= id_minter.next_id { | ||
| return Err(FsmError::IdOutOfRange(id)); | ||
| } | ||
|
|
| } | ||
|
|
||
| /// Mark an ID according to value (true = used, false = free). | ||
| /// Mark an ID according to value (true = used, false = free), but don't check that its in range. |
Comment on lines
1109
to
1112
| let id = match accessor.provider.to_external_id(accessor.context, n.id) { | ||
| Ok(id) => id, | ||
| Err(e) => return future::ready(Err(e)), | ||
| Err(_) => continue, // vector got deleted; skip | ||
| }; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Garnet cluster testing discovered a race condition with ID minting. The issue was:
Because there is a time period between minting the ID and marking it used, it is possible to give the ID out twice.
One possible fix is to reorder the operations so that we mark IDs used before bumping the next_id atomic, however, this requires a manual CAS loop. Instead, I combined max_block and next_id into the same RwLock so that these operations can be more simple and explicitly controlled.
In addition, there was another race during delete where an ID is marked free during delete, but that ID may be recycled before the function exits. To address this, we remove only the mapping and attributes in delete(), which prevents the vector from being returned from search, and then finish the deletion in the release() call, and finish by marking the node free after the writes complete. However, current diskann does not actually call release(), so a call was added to the end of inplace_delete().