
Epic 3 - ANN Index Algorithms #8

@rmax

Description

Goal

Introduce true approximate nearest-neighbor (ANN) index methods that go beyond brute-force shard search.

Detailed tasks

  • 3.1 Implement IVF index
    • Train a coarse quantizer.
    • Assign vectors to clusters.
    • Store cluster posting lists.
  • 3.2 Implement PQ compression
    • Add a product quantization encoder.
    • Store PQ codes instead of raw vectors.
  • 3.3 Candidate retrieval pipeline
    • Implement the pipeline:
      • query vector
      • centroid routing
      • IVF candidate selection
      • PQ approximate distance
      • top-K selection
      • optional exact rerank
  • 3.4 Exact reranking
    • Fetch raw vectors for top candidates.
    • Compute exact distance.
    • Return the final ranking.
  • 3.5 Recall evaluation
    • Add the CLI command shardlake eval-ann.
    • Report:
      • recall@k
      • precision@k
      • latency
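As a rough illustration of how tasks 3.1 to 3.4 fit together, here is a minimal pure-Python sketch of the pipeline. Everything in it is an assumption made for illustration, not shardlake's actual design: the names `IvfPqIndex`, `kmeans`, `n_lists`, `nprobe`, and `rerank`, the use of squared Euclidean distance, and the simplification of PQ-encoding raw vectors rather than residuals relative to the coarse centroid.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sqdist(v, centroids[i]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; serves as coarse quantizer and PQ codebook trainer."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


class IvfPqIndex:
    """Hypothetical IVF + PQ index over a list of equal-length float vectors."""

    def __init__(self, vectors, n_lists=4, m=2, n_codes=8, seed=0):
        dim = len(vectors[0])
        assert dim % m == 0, "dimension must split evenly into m subvectors"
        self.raw = vectors  # kept only for the optional exact rerank (3.4)
        self.m, self.sub = m, dim // m
        # 3.1: train the coarse quantizer and build cluster posting lists.
        self.coarse = kmeans(vectors, n_lists, seed=seed)
        self.lists = [[] for _ in range(n_lists)]
        # 3.2: train one codebook per subspace and store PQ codes.
        self.codebooks = [
            kmeans([self._chunk(v, j) for v in vectors], n_codes, seed=seed)
            for j in range(m)
        ]
        self.codes = []
        for vid, v in enumerate(vectors):
            self.lists[nearest(v, self.coarse)].append(vid)
            self.codes.append(
                [nearest(self._chunk(v, j), self.codebooks[j]) for j in range(m)]
            )

    def _chunk(self, v, j):
        """The j-th subvector of v."""
        return v[j * self.sub:(j + 1) * self.sub]

    def search(self, query, k, nprobe=2, rerank=0):
        """3.3: route, select candidates, score with PQ, take top-K, maybe rerank."""
        # Centroid routing: pick the nprobe closest coarse clusters.
        order = sorted(range(len(self.coarse)),
                       key=lambda i: sqdist(query, self.coarse[i]))
        # IVF candidate selection: union of the probed posting lists.
        candidates = [vid for i in order[:nprobe] for vid in self.lists[i]]
        # PQ approximate distance: per-subspace lookup tables, summed per code.
        tables = [[sqdist(self._chunk(query, j), c) for c in self.codebooks[j]]
                  for j in range(self.m)]
        scored = sorted(
            candidates,
            key=lambda vid: sum(tables[j][self.codes[vid][j]]
                                for j in range(self.m)),
        )
        if rerank:
            # 3.4: exact distances on the best max(k, rerank) candidates.
            top = scored[:max(k, rerank)]
            top.sort(key=lambda vid: sqdist(query, self.raw[vid]))
            return top[:k]
        return scored[:k]
```

In this sketch, `index.search(q, k=10, nprobe=2, rerank=50)` would score the probed lists with PQ and then rerank the best 50 candidates exactly. Probing every list and reranking every candidate degenerates to exact search, which may be a convenient sanity check when comparing the ANN path against the existing brute-force path.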

Definition of done

  • IVF and PQ-based search paths exist alongside exact search.
  • Approximate candidate generation can optionally rerank against exact vectors.
  • ANN quality is measurable through a dedicated evaluation command.
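The issue does not pin down how recall@k, precision@k, and latency are computed, so the sketch below shows one possible set of definitions that an evaluation command could report. The function names, the `ann_search` and `exact_search` callables, and the choice of mean wall-clock latency are all assumptions for illustration; the actual `shardlake eval-ann` command may define these differently.

```python
import time


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors present in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def precision_at_k(approx_ids, exact_ids, k):
    """Fraction of the returned approximate ids that are true top-k neighbors."""
    returned = approx_ids[:k]
    if not returned:
        return 0.0
    truth = set(exact_ids[:k])
    return sum(1 for i in returned if i in truth) / len(returned)


def evaluate(queries, ann_search, exact_search, k):
    """Average recall@k, precision@k, and ANN latency over a non-empty query set.

    ann_search and exact_search are hypothetical callables taking
    (query, k) and returning a ranked list of vector ids.
    """
    recalls, precisions, latencies = [], [], []
    for query in queries:
        start = time.perf_counter()
        approx = ann_search(query, k)
        latencies.append(time.perf_counter() - start)  # time only the ANN path
        exact = exact_search(query, k)
        recalls.append(recall_at_k(approx, exact, k))
        precisions.append(precision_at_k(approx, exact, k))
    n = len(queries)
    return {
        "recall@k": sum(recalls) / n,
        "precision@k": sum(precisions) / n,
        "mean_latency_s": sum(latencies) / n,
    }
```

Under these definitions, recall@k and precision@k coincide whenever the ANN path returns exactly k ids; they diverge only when it returns fewer, for example when the probed posting lists hold fewer than k vectors.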

Child issue breakdown

Dependency summary

Dependency graph

#43   #44
  \   /
   #45
  /   \
#46   #47

Metadata

Labels

epic (Top-level epic issue)
