Standardize names and use kebab case in diskann-tools#1217
Open
magdalendobson wants to merge 26 commits into
Open
Standardize names and use kebab case in diskann-tools#1217magdalendobson wants to merge 26 commits into
magdalendobson wants to merge 26 commits into
Conversation
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR standardizes diskann-tools CLI flag naming (moving to kebab-case) and updates internal naming around query-time filtering (“vector filters” → “filter bitmaps”). It also adds a new range-groundtruth computation command and supporting utilities.
Changes:
- Renamed filter-related parameters to
filter_bitmap_fileand improved mismatch handling by returning tool errors instead of panicking. - Standardized many
claplong option names to kebab-case across multiple binaries. - Added range-search ground truth computation plumbing (
compute_range_ground_truth_from_datafiles) and a newcompute_range_groundtruthbinary.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/search_disk_index.rs | Renames filter input param and replaces an assert_eq! with a proper CLI error when filter/query counts mismatch. |
| diskann-tools/src/utils/ground_truth.rs | Adds reusable query-bitmap builder, renames filter input param, and introduces range-groundtruth computation + writer usage. |
| diskann-tools/src/bin/subsample_bin.rs | Converts args to explicit kebab-case long flags (notably changes previously positional args into required options). |
| diskann-tools/src/bin/relative_contrast.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/range_search_disk_index.rs.disabled | Renames CLI flags to kebab-case in the disabled range-search binary. |
| diskann-tools/src/bin/random_data_generator.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_synthetic_labels.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_pq.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_minmax.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/gen_associated_data_from_range.rs | Renames CLI flag to kebab-case. |
| diskann-tools/src/bin/compute_specificities.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/compute_range_groundtruth.rs | Adds a new CLI entrypoint for generating range-search ground truth (kebab-case flags). |
| diskann-tools/src/bin/compute_multivec_groundtruth.rs | Renames CLI flags to kebab-case and updates filter/label flag naming. |
| diskann-tools/src/bin/compute_groundtruth.rs | Renames filter arg to filter-bitmap-file and standardizes multiple CLI flags to kebab-case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
107
to
109
| /// Optional labels file for base vectors | ||
| #[arg(long = "base_file_labels", default_value = None)] | ||
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, |
Comment on lines
+122
to
130
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, | ||
|
|
||
| /// File containing the query vectors in binary format | ||
| #[arg(long = "query_file", short, required = true)] | ||
| #[arg(long = "query-file", short, required = true)] | ||
| pub query_file: String, | ||
|
|
||
| #[arg(long = "query_file_labels", default_value = None)] | ||
| #[arg(long = "query-file-labels", default_value = None)] | ||
| pub query_file_labels: Option<String>, |
Comment on lines
136
to
146
| /// Filter bitmap file in the range ground truth format | ||
| #[arg(long = "filter_bitmap_file", short, default_value = None)] | ||
| #[arg(long = "filter-bitmap-file", short, default_value = None)] | ||
| pub filter_bitmap_file: Option<String>, | ||
|
|
||
| /// Number of ground truth nearest neigbhors to compute | ||
| #[arg(long = "recall_at", short = 'K', default_value = "10")] | ||
| #[arg(long = "recall-at", short = 'K', default_value = "10")] | ||
| pub recall_at: u32, | ||
|
|
||
| /// File containing the associated data in binary format | ||
| #[arg(long = "associated_data_file", required = false, default_value = None)] | ||
| #[arg(long = "associated-data-file", required = false, default_value = None)] | ||
| pub associated_data_file: Option<String>, |
Comment on lines
+112
to
120
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, | ||
|
|
||
| /// File containing the query vectors in binary format | ||
| #[arg(long = "query_file", short, required = true)] | ||
| #[arg(long = "query-file", short, required = true)] | ||
| pub query_file: String, | ||
|
|
||
| #[arg(long = "query_file_labels", default_value = None)] | ||
| #[arg(long = "query-file-labels", default_value = None)] | ||
| pub query_file_labels: Option<String>, |
Comment on lines
+27
to
+31
| #[arg(long = "data-type", value_enum)] | ||
| data_type: DataType, | ||
|
|
||
| /// Input base binary file | ||
| #[arg(long = "base-bin-file")] |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1217 +/- ##
==========================================
+ Coverage 89.64% 90.65% +1.01%
==========================================
Files 502 502
Lines 95500 95500
==========================================
+ Hits 85607 86576 +969
+ Misses 9893 8924 -969
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR standardizes all variable names in diskann-tools. It also enforces that every commandline parameter uses kebab case.