Inline filtered search with adaptive L #1131
…en/two-queue-adaptive-l
hildebrandmw
left a comment
Thanks Magdalen! In addition to my inline comments - can you also add some integration tests exercising the functionality here? These go a surprisingly long way towards protecting the algorithm.
Also, can we bikeshed InlineSearch a little? Maybe FilteredSearch? Or InlineFilteredSearch? Not a huge deal, but InlineSearch seems a little opaque to me.
Codecov Report ❌

Coverage diff (main vs. #1131):

| | main | #1131 | +/- |
|---|---|---|---|
| Coverage | 89.45% | 89.50% | +0.04% |
| Files | 484 | 487 | +3 |
| Lines | 91407 | 92398 | +991 |
| Hits | 81765 | 82697 | +932 |
| Misses | 9642 | 9701 | +59 |
I've added some integration tests. In addition to throwing some of the existing cases in multihop at it, I also designed a test that should return different results depending on whether or not the adaptive L feature is enabled. Regarding the naming, I named it …
/// specificity = 0.1% (1/1000) → 8× L
/// and so on up to a pre-set maximum multiplier
#[derive(Debug)]
pub struct InlineSearch<'q, InternalId> {
There was a problem hiding this comment.
not a blocker for this PR, but how does this compose with range search and diverse search functionalities?
There was a problem hiding this comment.
From an algorithmic perspective, inline filtering (without any adaptation) will compose seamlessly with any other type of search, because it's just adding the extra step of checking match and keeping track of matched elements. However, from the perspective of our codebase as written right now, it would require new function signatures that accept a LabelProvider.
Thinking about how adaptive L in particular would compose with range search, we could use it within the initial graph search before deciding whether to move on to the unbounded part of range search. Similarly we could compose it with diverse search to increase L_search when few matching candidates are found.
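The "extra step" described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: `Candidate`, `visit_neighbors`, and the closure-based filter are hypothetical stand-ins for the real search scratch and the LabelProvider.

```rust
/// Hypothetical candidate: an id and its distance to the query.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub id: u32,
    pub distance: f32,
}

/// One expansion step of an inline-filtered search (illustrative only).
///
/// Every neighbor goes into `frontier`, so greedy navigation can still pass
/// through non-matching nodes. Only neighbors accepted by `is_match` are
/// also recorded in `matched`.
pub fn visit_neighbors<F>(
    neighbors: &[Candidate],
    is_match: F,
    frontier: &mut Vec<Candidate>,
    matched: &mut Vec<Candidate>,
) where
    F: Fn(u32) -> bool,
{
    for &candidate in neighbors {
        frontier.push(candidate);
        if is_match(candidate.id) {
            matched.push(candidate);
        }
    }
}
```

Because the filter check sits inside the expansion step and does not change how the frontier is ordered, the same step could in principle be reused by other search flavors.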
There was a problem hiding this comment.
ok, could you please document these to-dos as issues
I: VectorId,
A: ExpandBeam<T, Id = I> + SearchExt,
SR: SearchRecord<I> + ?Sized,
{
There was a problem hiding this comment.
Would we want to allow a paginated API for this function?
There was a problem hiding this comment.
Hmm, that's a good question. Combining pagination with adaptive L seems like a strange thing to do: adapting L_search is morally pretty similar to pagination, since it extends the search when few matched results are found. On the other hand, a user who wants pure inline filtering and wants to control the length of the search via pagination only seems like a reasonable case.
There was a problem hiding this comment.
The user might be in a position where the first search did not yield enough results that pass filters? In such a case, is our advice to retry with a larger L?
There was a problem hiding this comment.
This would be the advice if the user is sure that the number of results they are requesting actually exist.
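A caller-side version of the "retry with a larger L" advice might look like the loop below. This is a sketch under stated assumptions: `search_with_l` is a hypothetical closure standing in for the real filtered search call, and both the doubling schedule and the `max_l` cap are choices made for illustration.

```rust
/// Re-run a filtered search with a growing search-list size until `k`
/// results are found or `max_l` is reached (illustrative only).
///
/// `search_with_l` is a hypothetical stand-in for the real search call: it
/// takes a search-list size and returns the ids that passed the filter.
pub fn search_with_retry<F>(
    k: usize,
    initial_l: usize,
    max_l: usize,
    mut search_with_l: F,
) -> Vec<u32>
where
    F: FnMut(usize) -> Vec<u32>,
{
    let max_l = max_l.max(1);
    let mut l = initial_l.clamp(1, max_l);
    loop {
        let mut results = search_with_l(l);
        // Stop once enough results passed the filter, or once L hit the cap.
        if results.len() >= k || l >= max_l {
            results.truncate(k);
            return results;
        }
        // Doubling is an assumption; any growth schedule would work here.
        l = l.saturating_mul(2).min(max_l);
    }
}
```

The cap matters because, as noted above, retrying only helps if the requested number of matching results actually exists.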
There was a problem hiding this comment.
Pull request overview
This PR adds a new “inline filtered search” path (with optional adaptive L scaling) to the diskann graph search API and wires it through the benchmark harness and test suite.
Changes:
- Introduces InlineSearch + AdaptiveL in diskann/src/graph/search/inline_filter_search.rs and re-exports them from diskann::graph::search.
- Adds end-to-end and algorithm-focused tests for inline filtered traversal behavior.
- Extends diskann-benchmark / diskann-benchmark-core to support a new topk-inline-filter search phase, including an example JSON and integration test.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| diskann/src/graph/test/cases/multihop.rs | Exposes test helpers/filters (pub(super)) for reuse by the new inline test suite. |
| diskann/src/graph/test/cases/mod.rs | Registers the new inline test module. |
| diskann/src/graph/test/cases/inline.rs | Adds end-to-end tests for index.search(InlineSearch { .. }), including adaptive-L behavior. |
| diskann/src/graph/search/multihop_search.rs | Alters scratch initialization (currently introduces a safety/correctness regression if scratch is reused). |
| diskann/src/graph/search/mod.rs | Adds the inline search module and publicly re-exports InlineSearch/AdaptiveL. |
| diskann/src/graph/search/inline_filter_search.rs | Implements inline filtered search and adaptive-L sizing; includes unit tests. |
| diskann-benchmark/src/main.rs | Adds an integration test that runs the new inline-filter example config. |
| diskann-benchmark/src/inputs/graph_index.rs | Adds TopkInlineFilter search phase + adaptive-L config parsing/validation. |
| diskann-benchmark/src/backend/index/spherical.rs | Registers and implements the inline-filter search plugin for spherical backend. |
| diskann-benchmark/src/backend/index/search/plugins.rs | Adds the TopkInlineFilter plugin type/kind mapping. |
| diskann-benchmark/src/backend/index/search/knn.rs | Adds Knn trait implementation for benchmark_core::search::graph::InlineSearch. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Registers and implements the inline-filter search plugin for the main backend. |
| diskann-benchmark/example/graph-index-inline-filter.json | Adds a runnable benchmark example for topk-inline-filter with adaptive_l. |
| diskann-benchmark-core/src/search/graph/mod.rs | Exposes the new benchmark-core inline graph search helper module/type. |
| diskann-benchmark-core/src/search/graph/inline.rs | Adds benchmark-core InlineSearch wrapper + tests for inline filtering behavior. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
hildebrandmw
left a comment
Thanks Magdalen, my main concern here is getting in decent test coverage using the baseline testing infrastructure. It's more robust than ad-hoc metrics since it can store and validate a lot more information.
Also to bike shed, I do think InlineSearch is a bit vague. Maybe InlineFilteredSearch? Or FilteredKnn?
use super::multihop::{BlockAndAdjust, EvenFilter, build_1d_provider, setup_grid_index};
// Topology (3 levels below the start):
There was a problem hiding this comment.
These tests still use the older style "looks kind of okay" approach to tests rather than using the more rigorous baseline tests that our other algorithms have moved to. This makes it significantly harder to refactor with confidence and also kind of forces us into a regime where the search sizes are pretty small.
For example, none of these integration tests trigger the low-match regime of the adaptive L algorithm, so we aren't really protecting the behavior there.
Adding baselines should be relatively straightforward and greatly improves the quality of algorithm tests.
There was a problem hiding this comment.
I added tests with baselines, including a test with specificity as low as 0.1%. Do they look good to you now?
There was a problem hiding this comment.
If you comment out the unit tests for adaptive L, you will find that the low-specificity tests that were added still never trigger the code in compute_adaptive_l outside of the initial preamble (e.g., if matched == 0 || visited == 0). From what I can tell, even though a filter with a low selection percentage is used, the filter is selecting for low IDs. Since the query is at [10, 10, 10] and the grid is constructed so lower coordinates have lower IDs, the selection criterion is crossed before any matches are actually found, so it always uses the max multiplier.
You can test this yourself by running
cargo llvm-cov nextest --html --package diskann --cargo-profile ci
and opening ./target/llvm-cov/html/index.html and navigating to inline_filter_search.rs.
To actually test the algorithm, the initial exploration needs to see enough matched nodes before the decision point that the piecewise heuristic is actually triggered.
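To make the coverage gap concrete, here is a minimal sketch of a piecewise multiplier with the same kind of preamble. It is not the PR's implementation: the only data points taken from this thread are "1/1000 maps to 8× L", the pre-set maximum, and the `matched == 0 || visited == 0` guard falling back to the max multiplier. The doubling-per-10× rule and the name `adaptive_multiplier` are assumptions.

```rust
/// Illustrative adaptive-L multiplier (not the PR's implementation).
///
/// Assumed rule: the multiplier doubles for every 10x drop in specificity
/// (`matched / visited`), so 1/1000 maps to 8x. All other breakpoints are
/// assumptions made for this sketch.
pub fn adaptive_multiplier(matched: u64, visited: u64, max_multiplier: u64) -> u64 {
    let max_multiplier = max_multiplier.max(1);
    // Preamble: with no matches (or no visits) there is nothing to estimate
    // from, so fall back to the maximum multiplier.
    if matched == 0 || visited == 0 {
        return max_multiplier;
    }
    let mut multiplier = 1u64;
    let mut ratio = 10u64;
    // `visited >= matched * ratio` is equivalent to `specificity <= 1 / ratio`,
    // written with integers to avoid floating-point edge cases.
    while multiplier < max_multiplier && visited >= matched.saturating_mul(ratio) {
        multiplier = multiplier.saturating_mul(2);
        ratio = ratio.saturating_mul(10);
    }
    multiplier.min(max_multiplier)
}
```

A test whose filter yields zero matches at the decision point only ever exercises the first branch. Inputs such as `adaptive_multiplier(100, 1000, 16)` or `adaptive_multiplier(1, 1000, 16)` are needed to reach the loop.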
// Matched results tracked separately: scratch.best contains all nodes
// for greedy navigation, matched_results contains only filter-matching nodes.
let mut matched_results = Vec::new();
There was a problem hiding this comment.
Should this be a NeighborPriorityQueue, or at least some other data structure that puts an upper-bound on its size?
There was a problem hiding this comment.
I got a 7-10% gain in QPS from pushing results to a vector and sorting once at the end instead of using a NeighborPriorityQueue, which is why I chose not to use it. If you have alternative ideas I am happy to experiment with them!
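One possible middle ground, sketched here purely as an idea to experiment with, is to keep the cheap `Vec` push but prune whenever the buffer reaches twice the target size. `Matched`, `top_k_sort_once`, and `BoundedMatches` are hypothetical names, and nothing below has been benchmarked against the numbers quoted above.

```rust
use std::cmp::Ordering;

/// Hypothetical matched result: an id and its distance to the query.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matched {
    pub id: u32,
    pub distance: f32,
}

/// Order by distance, breaking ties by id so results are deterministic.
fn by_distance(a: &Matched, b: &Matched) -> Ordering {
    a.distance.total_cmp(&b.distance).then(a.id.cmp(&b.id))
}

/// Unbounded approach: push during the search, sort once at the end.
pub fn top_k_sort_once(mut matched: Vec<Matched>, k: usize) -> Vec<Matched> {
    matched.sort_unstable_by(by_distance);
    matched.truncate(k);
    matched
}

/// Bounded approach: same cheap push, but memory never exceeds roughly
/// `2 * capacity` entries because the buffer is pruned when it fills up.
pub struct BoundedMatches {
    capacity: usize,
    items: Vec<Matched>,
}

impl BoundedMatches {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, items: Vec::new() }
    }

    pub fn push(&mut self, m: Matched) {
        self.items.push(m);
        if self.items.len() >= self.capacity.max(1).saturating_mul(2) {
            self.prune();
        }
    }

    /// Keep only the `capacity` closest entries.
    fn prune(&mut self) {
        self.items.sort_unstable_by(by_distance);
        self.items.truncate(self.capacity);
    }

    pub fn into_sorted(mut self) -> Vec<Matched> {
        self.prune();
        self.items
    }
}
```

Pruning costs an extra sort every `capacity` pushes, so whether this keeps the QPS gain would need to be measured.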
adapt_cmps: usize,
adapt_hops: usize,
adapt_ids: Vec<u32>,
}
There was a problem hiding this comment.
In addition to my other comment about not triggering the piecewise functionality of adaptive-L, I think the tests here could use one more cleanup pass. Here is my suggestion:
- If we want to leave ourselves open to expanding in the future, using one baseline per flavor (one for fixed-L, one for adaptive-L) instead of merging into a single struct makes this much easier. Additionally, run_inline_on_grid could then return an InlineBaseline directly instead of a mega-tuple.

  It's okay to go nuts with generating multiple baselines per test.
- Always include IDs and distances in baselines. This is very cheap extra protection. Basically, the more you throw in there, the better.
- Probably worth also using a baseline for the three-level tests as that's cheap and captures strictly more information.
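As a rough picture of the first two suggestions, a per-run baseline could carry IDs and distances alongside the counters. The field layout and the `from_results` helper below are assumptions for illustration. The actual baseline infrastructure in the repository may look different.

```rust
/// Hypothetical per-run baseline for an inline-filtered search test.
/// One value is produced per flavor (fixed-L, adaptive-L), not one
/// merged struct covering both.
#[derive(Debug, Clone, PartialEq)]
pub struct InlineBaseline {
    pub ids: Vec<u32>,
    pub distances: Vec<f32>,
    pub cmps: usize,
    pub hops: usize,
}

impl InlineBaseline {
    /// Build a baseline from `(id, distance)` results plus search counters.
    pub fn from_results(results: &[(u32, f32)], cmps: usize, hops: usize) -> Self {
        Self {
            ids: results.iter().map(|&(id, _)| id).collect(),
            distances: results.iter().map(|&(_, distance)| distance).collect(),
            cmps,
            hops,
        }
    }
}
```

With this shape, a helper like run_inline_on_grid could return one `InlineBaseline` per call, and a test can generate as many baselines as it has configurations.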
This PR implements the recommendation in the filtered search RFC to implement inline filtering with the adaptive L method as an optional addition.