Parent
- Parent issue: Milestone 1 - Shardlake Post-Prototype Roadmap #5
Goal
Introduce real approximate nearest-neighbor methods beyond brute-force shard search.
Detailed tasks
- 3.1 Implement IVF index
- Train a coarse quantizer.
- Assign vectors to clusters.
- Store cluster posting lists.
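A minimal sketch of task 3.1, assuming squared Euclidean distance and plain Lloyd's k-means as the coarse quantizer. The function names (`train_coarse_quantizer`, `build_posting_lists`) are hypothetical and not taken from the Shardlake codebase; a real implementation would likely use a vectorized or native k-means.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: squared_l2(vec, centroids[i]))

def train_coarse_quantizer(vectors, n_clusters, n_iters=10, seed=0):
    """Train centroids with Lloyd's k-means (assumed quantizer, not specified in the issue)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_posting_lists(vectors, centroids):
    """Map each cluster id to the list of vector ids assigned to it."""
    posting_lists = {i: [] for i in range(len(centroids))}
    for vec_id, v in enumerate(vectors):
        posting_lists[nearest_centroid(v, centroids)].append(vec_id)
    return posting_lists

# Toy data: two well-separated groups of 2-d vectors.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids = train_coarse_quantizer(vectors, n_clusters=2)
posting_lists = build_posting_lists(vectors, centroids)
```

Each vector id lands in exactly one posting list, so a query only needs to scan the lists of the clusters it is routed to.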
- 3.2 Implement PQ compression
- Add a product quantization encoder.
- Store PQ codes instead of raw vectors.
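Task 3.2 could be sketched roughly as follows. This assumes the vector dimension is divisible by the number of sub-spaces `m`, at most 256 codewords per sub-space (so each code fits in one byte), and a small k-means for codebook training. All names here (`train_pq`, `pq_encode`, `pq_decode`) are illustrative, not existing Shardlake APIs.

```python
import random

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split vec into m contiguous sub-vectors of equal length."""
    sub_dim = len(vec) // m
    return [vec[i * sub_dim:(i + 1) * sub_dim] for i in range(m)]

def kmeans(points, k, n_iters=10, seed=0):
    """Small Lloyd's k-means used to train one codebook."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_l2(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty group keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def train_pq(vectors, m, k):
    """Train one codebook of k codewords for each of the m sub-spaces."""
    return [kmeans([split(v, m)[j] for v in vectors], k) for j in range(m)]

def pq_encode(vec, codebooks):
    """Encode vec as m bytes, one nearest-codeword index per sub-space."""
    subs = split(vec, len(codebooks))
    return bytes(
        min(range(len(cb)), key=lambda c: squared_l2(sub, cb[c]))
        for sub, cb in zip(subs, codebooks)
    )

def pq_decode(code, codebooks):
    """Reconstruct an approximate vector by concatenating the chosen codewords."""
    out = []
    for idx, cb in zip(code, codebooks):
        out.extend(cb[idx])
    return out

# Toy data: 4-d vectors, 2 sub-spaces, 2 codewords each.
vectors = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 1.0, 0.9],
           [4.0, 4.0, 9.0, 9.0], [4.1, 4.0, 9.0, 9.1]]
codebooks = train_pq(vectors, m=2, k=2)
codes = [pq_encode(v, codebooks) for v in vectors]  # 2 bytes per vector
```

Storing `codes` instead of `vectors` trades reconstruction error for memory: here each vector shrinks from four floats to two bytes.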
- 3.3 Candidate retrieval pipeline
- Implement the pipeline:
- query vector
- centroid routing
- IVF candidate selection
- PQ approximate distance
- top-K selection
- optional exact rerank
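The pipeline steps above can be sketched end to end on a hand-built toy index. This is only an illustration under several assumptions: squared Euclidean distance, PQ codes computed on raw vectors rather than residuals, and a hypothetical dictionary layout for the index (`centroids`, `posting_lists`, `codebooks`, `codes`). The `nprobe` parameter name is borrowed from common ANN usage, not from the issue.

```python
import heapq

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    sub_dim = len(vec) // m
    return [vec[i * sub_dim:(i + 1) * sub_dim] for i in range(m)]

def search(query, index, k, nprobe=1, raw_vectors=None):
    # 1. Centroid routing: pick the nprobe closest coarse centroids.
    centroids = index["centroids"]
    probed = heapq.nsmallest(
        nprobe, range(len(centroids)),
        key=lambda c: squared_l2(query, centroids[c]))
    # 2. IVF candidate selection: union of the probed posting lists.
    candidates = [vid for c in probed for vid in index["posting_lists"][c]]
    # 3. PQ approximate distance: one lookup table per sub-space holding the
    #    distance from the query sub-vector to every codeword.
    codebooks = index["codebooks"]
    tables = [[squared_l2(sub, word) for word in cb]
              for sub, cb in zip(split(query, len(codebooks)), codebooks)]

    def approx_distance(vid):
        return sum(tables[j][code] for j, code in enumerate(index["codes"][vid]))

    # 4. Top-K selection on the approximate distances.
    top = heapq.nsmallest(k, candidates, key=approx_distance)
    # 5. Optional exact rerank when raw vectors are available.
    if raw_vectors is not None:
        top.sort(key=lambda vid: squared_l2(query, raw_vectors[vid]))
    return top

# Hand-built toy index: 2-d vectors, m=2 one-dimensional sub-spaces.
index = {
    "centroids": [[0.0, 0.0], [10.0, 10.0]],
    "posting_lists": {0: [0, 1], 1: [2, 3]},
    "codebooks": [[[0.0], [1.0], [10.0]], [[0.0], [1.0], [10.0]]],
    "codes": {0: (0, 0), 1: (1, 1), 2: (2, 2), 3: (2, 1)},
}
raw_vectors = {0: [0.1, 0.0], 1: [0.9, 1.1], 2: [10.0, 9.8], 3: [9.9, 1.2]}

print(search([0.8, 0.9], index, k=2))  # prints [1, 0]
print(search([0.8, 0.9], index, k=3, nprobe=2, raw_vectors=raw_vectors))  # prints [1, 0, 3]
```

Raising `nprobe` scans more posting lists, which usually improves recall at the cost of latency.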
- 3.4 Exact reranking
- Fetch raw vectors for top candidates.
- Compute exact distance.
- Return the final ranking.
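One possible shape for the reranking step in 3.4, assuming Euclidean distance and a hypothetical `fetch_raw` callback that loads a raw vector by id (in practice this would read from whatever storage holds the uncompressed vectors):

```python
import math

def exact_rerank(query, candidate_ids, fetch_raw, k):
    """Re-score candidates with exact distances and return the best k.

    fetch_raw is a hypothetical callable: vector id -> raw vector.
    Returns a list of (vector_id, exact_distance), closest first.
    """
    scored = []
    for vid in candidate_ids:
        raw = fetch_raw(vid)                           # fetch the raw vector
        scored.append((math.dist(query, raw), vid))    # exact Euclidean distance
    scored.sort()                                      # closest first, ties by id
    return [(vid, dist) for dist, vid in scored[:k]]

# Toy store standing in for raw vector storage.
store = {7: [1.0, 0.0], 3: [0.0, 0.0], 9: [3.0, 4.0]}
ranked = exact_rerank([0.0, 0.0], [7, 9, 3], store.__getitem__, k=2)
print(ranked)  # prints [(3, 0.0), (7, 1.0)]
```

Only the candidates that survive the approximate stage are fetched, so the cost of exact scoring scales with the candidate count rather than the full dataset.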
- 3.5 Recall evaluation
- Add the CLI command `shardlake eval-ann`.
- Report:
- recall@k
- precision@k
- latency
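The metrics could be computed along these lines. The issue does not define them, so this sketch assumes the exact top-k result is the ground truth: recall@k is the fraction of the exact top-k that the ANN path returns, and precision@k is the fraction of returned ids that belong to the exact top-k. `ann_search` and `exact_search` are hypothetical callables taking `(query, k)`; nothing here reflects the actual flags or output of `shardlake eval-ann`.

```python
import time

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the exact top-k ids that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(truth.intersection(retrieved[:k])) / len(truth)

def precision_at_k(retrieved, ground_truth, k):
    """Fraction of the retrieved top-k ids that are in the exact top-k."""
    top = retrieved[:k]
    if not top:
        return 0.0
    truth = set(ground_truth[:k])
    return sum(1 for vid in top if vid in truth) / len(top)

def evaluate(queries, ann_search, exact_search, k):
    """Average recall@k, precision@k, and ANN latency over a query set."""
    recalls, precisions, latencies = [], [], []
    for q in queries:
        start = time.perf_counter()
        retrieved = ann_search(q, k)            # only the ANN path is timed
        latencies.append(time.perf_counter() - start)
        truth = exact_search(q, k)
        recalls.append(recall_at_k(retrieved, truth, k))
        precisions.append(precision_at_k(retrieved, truth, k))
    n = len(queries)
    return {
        "recall@k": sum(recalls) / n,
        "precision@k": sum(precisions) / n,
        "mean_latency_s": sum(latencies) / n,
    }

# Stub search functions standing in for the real ANN and exact paths.
report = evaluate(
    queries=["q1"],
    ann_search=lambda q, k: [1, 2, 3][:k],
    exact_search=lambda q, k: [1, 3, 5][:k],
    k=3,
)
```

When both paths return exactly k ids, recall@k and precision@k coincide; they differ only if the ANN path returns fewer than k results.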
Definition of done
- IVF and PQ-based search paths exist alongside exact search.
- Approximate candidate generation can optionally rerank against exact vectors.
- ANN quality is measurable through a dedicated evaluation command.