Preserve lazy shard loading by decoupling centroid routing from shard body loads #28

@rmax

Summary

IndexSearcher::search() currently gathers centroids by calling load_shard() for every shard. Because load_shard() deserializes the full .sidx payload, the first query effectively loads the entire index into RAM before nprobe selection.

Why this is a problem

  • it breaks the documented lazy shard loading behavior
  • cold-start query latency includes full-index deserialization work
  • peak memory scales with total index size rather than the probed working set
  • nprobe still limits scoring work, but not the up-front shard loading cost

Desired outcome

Persist centroid routing metadata separately from full shard bodies so that shard selection can happen without deserializing every shard. Full shard payloads should still be loaded lazily, only for the selected probe set, and then cached as they are today.
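To make the intended split concrete, here is a minimal sketch of routing that consults only a centroid table. `CentroidEntry` and `select_probe_shards` are hypothetical names, not existing APIs in this codebase, and squared Euclidean distance is an assumption; the real searcher may use a different metric.

```rust
/// Hypothetical routing entry: one centroid per shard, stored apart from the shard body.
#[derive(Debug, Clone)]
pub struct CentroidEntry {
    pub shard_id: u32,
    pub centroid: Vec<f32>,
}

/// Squared Euclidean distance (assumed metric); assumes equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns the ids of the `nprobe` shards whose centroids are closest to `query`.
/// Only the routing table is read; no shard body is touched here.
pub fn select_probe_shards(table: &[CentroidEntry], query: &[f32], nprobe: usize) -> Vec<u32> {
    let mut scored: Vec<(f32, u32)> = table
        .iter()
        .map(|e| (squared_l2(&e.centroid, query), e.shard_id))
        .collect();
    // total_cmp is a total order on f32, so a NaN distance cannot panic the sort.
    // Ties are broken by shard id to keep the result deterministic.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(nprobe).map(|(_, id)| id).collect()
}
```

With this shape, `IndexSearcher::search()` would call `load_shard()` only for the ids this function returns.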

Acceptable implementation directions

  1. store centroids in the manifest
  2. write a dedicated centroid sidecar artifact
  3. extend the shard format and reader so headers or centroid sections can be read without loading records
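As an illustration of direction 2, the following is a rough sketch of a versioned centroid sidecar. The magic bytes, field order, and the names `encode_sidecar` / `decode_sidecar` are all assumptions made for this example; nothing here reflects the existing .sidx layout.

```rust
// Hypothetical layout (little-endian):
//   magic "CSC0" | version u32 | dim u32 | count u32 | count * (shard_id u32, dim * f32)
const MAGIC: &[u8; 4] = b"CSC0";
const FORMAT_VERSION: u32 = 1;

pub fn encode_sidecar(dim: u32, entries: &[(u32, Vec<f32>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&dim.to_le_bytes());
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (shard_id, centroid) in entries {
        assert_eq!(centroid.len(), dim as usize, "centroid dimension mismatch");
        out.extend_from_slice(&shard_id.to_le_bytes());
        for v in centroid {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Reads one little-endian u32 and advances `pos`; returns None if the buffer is too short.
fn read_u32(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = buf.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn decode_sidecar(buf: &[u8]) -> Result<Vec<(u32, Vec<f32>)>, String> {
    if buf.get(..4) != Some(&MAGIC[..]) {
        return Err("bad magic".to_string());
    }
    let mut pos = 4;
    let version = read_u32(buf, &mut pos).ok_or("truncated header")?;
    // Reject unknown versions explicitly instead of guessing at the layout.
    if version != FORMAT_VERSION {
        return Err(format!("unsupported sidecar version {}", version));
    }
    let dim = read_u32(buf, &mut pos).ok_or("truncated header")? as usize;
    let count = read_u32(buf, &mut pos).ok_or("truncated header")? as usize;
    let mut entries = Vec::new();
    for _ in 0..count {
        let shard_id = read_u32(buf, &mut pos).ok_or("truncated entry")?;
        let mut centroid = Vec::new();
        for _ in 0..dim {
            let bits = read_u32(buf, &mut pos).ok_or("truncated centroid")?;
            centroid.push(f32::from_bits(bits));
        }
        entries.push((shard_id, centroid));
    }
    Ok(entries)
}
```

Directions 1 and 3 could carry the same fields; the explicit version check is the part that matters for any of them.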

Acceptance criteria

  • IndexSearcher can choose probe shards without calling load_shard() on every shard
  • the first query only loads shard bodies for the selected probe set
  • tests verify that non-probed shards are not deserialized during routing
  • docs are updated if manifest or artifact formats change
  • any schema or binary format change is versioned explicitly
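One possible shape for the "non-probed shards are not deserialized" test is a lazy cache whose loader is injected, so the test can observe exactly which shards were loaded. `ShardBody` and `LazyShards` are hypothetical stand-ins, not the project's real types.

```rust
use std::collections::HashMap;

/// Stand-in for a fully deserialized shard body (hypothetical).
pub struct ShardBody {
    pub shard_id: u32,
}

/// Lazy shard cache that records every real load so a test can assert on it.
pub struct LazyShards<F: FnMut(u32) -> ShardBody> {
    loader: F,
    cache: HashMap<u32, ShardBody>,
    /// Shard ids in the order they were actually loaded (cache misses only).
    pub loaded: Vec<u32>,
}

impl<F: FnMut(u32) -> ShardBody> LazyShards<F> {
    pub fn new(loader: F) -> Self {
        Self { loader, cache: HashMap::new(), loaded: Vec::new() }
    }

    /// Returns the cached body, calling the loader only on a miss.
    pub fn get(&mut self, shard_id: u32) -> &ShardBody {
        if !self.cache.contains_key(&shard_id) {
            let body = (self.loader)(shard_id);
            self.loaded.push(shard_id);
            self.cache.insert(shard_id, body);
        }
        &self.cache[&shard_id]
    }
}
```

A test would route a query, call `get` for the selected probe set only, and then assert that `loaded` contains exactly those ids.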

Nice-to-have

  • add instrumentation for shard loads and cache hits
  • compare cold-start latency before and after the change
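For the instrumentation items, a small sketch using only the standard library might look like the following. `ShardStats` and `timed` are hypothetical helpers; a real implementation might prefer an existing metrics or tracing facility if the project already has one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Counters for shard loads and cache hits (hypothetical helper).
#[derive(Default)]
pub struct ShardStats {
    loads: AtomicU64,
    cache_hits: AtomicU64,
}

impl ShardStats {
    pub fn record_load(&self) {
        self.loads.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_hit(&self) {
        self.cache_hits.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns (loads, cache_hits) as observed at the time of the call.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.loads.load(Ordering::Relaxed),
            self.cache_hits.load(Ordering::Relaxed),
        )
    }
}

/// Runs `f` and returns its result together with the elapsed wall-clock time.
pub fn timed<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}
```

Wrapping the first `search()` call on a freshly opened index in `timed`, before and after the change, would give a rough cold-start comparison; the `loads` counter should drop from the total shard count to roughly `nprobe`.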
