Preserve lazy shard loading by decoupling centroid routing from shard body loads #27

@rmax

Description

Index routing currently calls load_shard() for every shard just to read centroids, so the first query deserializes the full index into RAM before nprobe selection. This defeats the documented lazy-loading design, inflates cold-start latency, and makes memory usage scale with full index size instead of the probed working set.

Desired direction:

  • separate centroid routing metadata from full shard bodies
  • choose probe shards without loading every shard payload
  • keep full shard bodies lazily loaded only for the selected probe set
  • version any manifest or binary format changes explicitly
  • update docs when formats change
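
One possible shape for this, as a rough sketch rather than a proposal for the actual API: keep a small, versioned routing manifest that holds only per-shard centroids, rank shards against it, and call the shard loader only for the selected probe set. The names `RoutingManifest`, `LazySearcher`, `MANIFEST_VERSION`, and the one-centroid-per-shard layout are hypothetical, and squared L2 distance is an assumption about the metric; only `load_shard` and `nprobe` come from the description above.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical version number: bumped whenever the manifest layout changes.
MANIFEST_VERSION = 2


@dataclass
class RoutingManifest:
    """Hypothetical routing metadata: centroids only, no shard payloads."""
    version: int
    centroids: Dict[str, List[float]]  # shard id -> centroid vector


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric; the real index may use a different one.
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class LazySearcher:
    manifest: RoutingManifest
    load_shard: Callable[[str], object]  # expensive: deserializes one shard body
    _cache: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Refuse manifests written in a layout this code does not understand.
        if self.manifest.version != MANIFEST_VERSION:
            raise ValueError(
                f"unsupported manifest version {self.manifest.version}, "
                f"expected {MANIFEST_VERSION}"
            )

    def route(self, query: Sequence[float], nprobe: int) -> List[str]:
        """Pick the nprobe closest shards using centroids only."""
        centroids = self.manifest.centroids
        return heapq.nsmallest(
            nprobe, centroids, key=lambda sid: squared_l2(query, centroids[sid])
        )

    def probed_shards(self, query: Sequence[float], nprobe: int) -> List[object]:
        """Load (and cache) shard bodies for the selected probe set only."""
        shards = []
        for sid in self.route(query, nprobe):
            if sid not in self._cache:
                self._cache[sid] = self.load_shard(sid)
            shards.append(self._cache[sid])
        return shards
```

With this split, `route()` never touches a shard body, so resident memory after the first query is bounded by the manifest plus the probed shards.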

Acceptance criteria:

  • IndexSearcher can route without load_shard() on every shard
  • first query only deserializes probed shard bodies
  • tests verify non-probed shards are not loaded during routing
  • docs updated for any schema/artifact changes
  • benchmarking or instrumentation can confirm cold-start improvement
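
The test and instrumentation criteria could be covered by wrapping the shard loader in a recorder, roughly as sketched below. `CountingLoader` and `nearest_shards` are hypothetical stand-ins, not existing project code; a real test would pass the wrapper into IndexSearcher in whatever way its constructor allows, and squared L2 distance is again an assumed metric.

```python
import time
from typing import Callable, Dict, List, Sequence


class CountingLoader:
    """Hypothetical wrapper: records which shards were loaded and time spent."""

    def __init__(self, inner: Callable[[str], object]) -> None:
        self.inner = inner
        self.loaded: List[str] = []
        self.seconds = 0.0

    def __call__(self, shard_id: str) -> object:
        start = time.perf_counter()
        try:
            return self.inner(shard_id)
        finally:
            # Accumulate load time so cold-start cost can be compared across runs.
            self.seconds += time.perf_counter() - start
            self.loaded.append(shard_id)


def nearest_shards(
    query: Sequence[float], centroids: Dict[str, List[float]], nprobe: int
) -> List[str]:
    """Stand-in router: ranks shards by squared L2 distance to their centroid."""
    def dist(sid: str) -> float:
        return sum((q - c) ** 2 for q, c in zip(query, centroids[sid]))
    return sorted(centroids, key=dist)[:nprobe]


def test_routing_does_not_load_unprobed_shards() -> None:
    centroids = {"s0": [0.0, 0.0], "s1": [5.0, 5.0], "s2": [9.0, 9.0]}
    loader = CountingLoader(lambda sid: {"id": sid})

    probed = nearest_shards([4.0, 4.0], centroids, nprobe=1)
    # Routing alone must not have touched any shard body.
    assert loader.loaded == []

    for sid in probed:
        loader(sid)
    # Only the probed shard was deserialized; the rest stayed on disk.
    assert loader.loaded == ["s1"]
    assert set(centroids) - set(loader.loaded) == {"s0", "s2"}
```

The same wrapper doubles as lightweight instrumentation: comparing `loader.seconds` and `len(loader.loaded)` before and after the change gives a first-order view of the cold-start improvement.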
