Epic 3 - ANN Index Algorithms #8

@rmax

Goal

Introduce real approximate nearest-neighbor methods beyond brute-force shard search.

Detailed tasks

  • 3.1 Implement IVF index
    • Train a coarse quantizer.
    • Assign vectors to clusters.
    • Store cluster posting lists.
  • 3.2 Implement PQ compression
    • Add a product quantization encoder.
    • Store PQ codes instead of raw vectors.
  • 3.3 Candidate retrieval pipeline
    • Implement the pipeline:
      • query vector
      • centroid routing
      • IVF candidate selection
      • PQ approximate distance
      • top-K selection
      • optional exact rerank
  • 3.4 Exact reranking
    • Fetch raw vectors for top candidates.
    • Compute exact distance.
    • Return the final ranking.
  • 3.5 Recall evaluation
    • Add the CLI command shardlake eval-ann.
    • Report:
      • recall@k
      • precision@k
      • latency
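The steps in 3.1 through 3.4 could fit together roughly as below. This is a minimal, dependency-free sketch and not shardlake's actual design: `IvfPqIndex`, `kmeans`, `n_lists`, `m`, `n_codes`, `nprobe`, and `rerank` are all hypothetical names chosen for illustration. It assumes squared Euclidean distance, a vector dimension divisible by `m`, and at least as many training vectors as `n_lists` and `n_codes`. For brevity it PQ-encodes the raw vectors rather than residuals relative to the coarse centroid, which a production index might prefer.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


class IvfPqIndex:
    """Hypothetical IVF + PQ index (illustrative only)."""

    def __init__(self, vectors, n_lists=4, m=2, n_codes=8):
        self.raw = vectors                      # kept for optional exact rerank
        self.m = m
        self.sub = len(vectors[0]) // m         # assumes dim % m == 0
        # 3.1: train the coarse quantizer.
        self.coarse = kmeans(vectors, n_lists)
        # 3.2: train one PQ codebook per sub-vector slice.
        self.codebooks = [
            kmeans([self._split(v)[j] for v in vectors], n_codes, seed=j + 1)
            for j in range(m)
        ]
        # 3.1/3.2: posting lists hold (vector id, PQ code), not raw vectors.
        self.lists = [[] for _ in range(n_lists)]
        for vid, v in enumerate(vectors):
            self.lists[nearest(self.coarse, v)].append((vid, self.encode(v)))

    def _split(self, v):
        return [v[j * self.sub:(j + 1) * self.sub] for j in range(self.m)]

    def encode(self, v):
        """PQ code: the nearest codeword index in each subspace."""
        return tuple(nearest(cb, s) for cb, s in zip(self.codebooks, self._split(v)))

    def search(self, query, k, nprobe=2, rerank=0):
        # Centroid routing: pick the nprobe closest coarse clusters.
        order = sorted(range(len(self.coarse)),
                       key=lambda i: sq_dist(self.coarse[i], query))
        # Per-subspace lookup tables: distance from query slice to each codeword.
        tables = [[sq_dist(s, c) for c in cb]
                  for cb, s in zip(self.codebooks, self._split(query))]
        # IVF candidate selection + PQ approximate distance.
        cands = sorted(
            (sum(t[ci] for t, ci in zip(tables, code)), vid)
            for c in order[:nprobe]
            for vid, code in self.lists[c]
        )
        # Optional exact rerank of the best approximate candidates.
        if rerank:
            top = cands[:max(rerank, k)]
            cands = sorted((sq_dist(self.raw[vid], query), vid) for _, vid in top)
        # Top-K selection.
        return [vid for _, vid in cands[:k]]
```

With this shape, `index.search(q, k=10, nprobe=2)` returns ids ranked by PQ distance only, while passing `rerank=50` rescoring the 50 best approximate candidates against the raw vectors before the final cut. Probing every list and reranking every candidate degenerates to exact search, which gives a convenient correctness check.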

Definition of done

  • IVF and PQ-based search paths exist alongside exact search.
  • Approximate candidates can optionally be reranked against the exact (raw) vectors.
  • ANN quality is measurable through a dedicated evaluation command.
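The metrics listed under 3.5 could be computed along these lines. This is a sketch under assumptions, not the behavior of `shardlake eval-ann`: the function names and the report keys are hypothetical, ground truth is assumed to be the exact top-k ids per query (for example from the brute-force path), and latency is measured as wall-clock time per query.

```python
import time


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the first k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the first k results that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / k


def evaluate(search_fn, queries, ground_truth, k):
    """Run search_fn(query, k) for each query and average the metrics.

    search_fn is any callable returning a ranked list of ids;
    ground_truth[i] holds the relevant ids for queries[i].
    """
    recalls, precisions, latencies = [], [], []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        result = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(result, truth, k))
        precisions.append(precision_at_k(result, truth, k))
    n = len(latencies)
    return {
        "recall@k": sum(recalls) / n if n else 0.0,
        "precision@k": sum(precisions) / n if n else 0.0,
        "mean_latency_s": sum(latencies) / n if n else 0.0,
    }
```

When the ground truth contains exactly k ids and the search returns k results, recall@k and precision@k coincide; they diverge only if the relevant set has a different size or fewer than k results come back.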
