Preserve lazy shard loading by decoupling centroid routing from shard body loads #27

Index routing currently calls load_shard() for every shard just to read centroids, so the first query deserializes the full index into RAM before nprobe selection. This defeats the documented lazy-loading design, inflates cold-start latency, and makes memory usage scale with full index size instead of the probed working set.

Desired direction:
- separate centroid routing metadata from full shard bodies
- choose probe shards without loading every shard payload
- keep full shard bodies lazily loaded only for the selected probe set
- version any manifest or binary format changes explicitly
- update docs when formats change
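
As a rough illustration of the direction above, here is a minimal sketch, not the project's actual code. It assumes centroids can be read from a small routing manifest without touching shard payloads, and that the existing loader can be passed in as a callable taking a shard id. `LazyShardRouter`, `MANIFEST_VERSION`, and the centroid dict layout are hypothetical names introduced only for this example. It ranks shards by plain Euclidean distance, which may not match the metric the index actually uses.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical version tag for a routing manifest that stores centroids
# separately from shard bodies (the real format is not defined in this issue).
MANIFEST_VERSION = 2


class LazyShardRouter:
    """Routes on centroid metadata only; shard bodies load on first use."""

    def __init__(
        self,
        centroids: Dict[int, Sequence[float]],
        load_shard: Callable[[int], object],
    ) -> None:
        # Assumption: one centroid per shard, already read from the manifest.
        self._centroids = centroids
        self._load_shard = load_shard
        self._loaded: Dict[int, object] = {}

    def route(self, query: Sequence[float], nprobe: int) -> List[int]:
        """Return the ids of the nprobe shards nearest the query.

        Never calls load_shard(); only centroid metadata is consulted.
        """
        return heapq.nsmallest(
            nprobe,
            self._centroids,
            key=lambda sid: math.dist(query, self._centroids[sid]),
        )

    def shard(self, shard_id: int) -> object:
        """Load a shard body on first access and cache it."""
        if shard_id not in self._loaded:
            self._loaded[shard_id] = self._load_shard(shard_id)
        return self._loaded[shard_id]

    def probe(self, query: Sequence[float], nprobe: int) -> List[object]:
        """Route first, then load only the selected shard bodies."""
        return [self.shard(sid) for sid in self.route(query, nprobe)]
```

With this split, memory after the first query would scale with the centroid table plus the probed shards, rather than with the full index.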

Acceptance criteria:
- IndexSearcher can route without load_shard() on every shard
- first query only deserializes probed shard bodies
- tests verify non-probed shards are not loaded during routing
- docs updated for any schema/artifact changes
- benchmarking or instrumentation can confirm cold-start improvement
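
One possible way to check the loading and instrumentation criteria is sketched below, under stated assumptions. It assumes the shard loader can be wrapped or injected as a callable; how `IndexSearcher` actually obtains its loader is not specified in this issue. `CountingLoader` and `assert_only_probed_loaded` are hypothetical helpers, not existing project APIs.

```python
import time
from typing import Callable, Iterable, List


class CountingLoader:
    """Wraps a shard loader, recording which ids were loaded and time spent."""

    def __init__(self, load_shard: Callable[[int], object]) -> None:
        self._load_shard = load_shard
        self.loaded_ids: List[int] = []
        self.seconds = 0.0  # cumulative wall-clock time spent inside the loader

    def __call__(self, shard_id: int) -> object:
        start = time.perf_counter()
        try:
            return self._load_shard(shard_id)
        finally:
            # Record the call even if the underlying loader raises.
            self.seconds += time.perf_counter() - start
            self.loaded_ids.append(shard_id)


def assert_only_probed_loaded(
    loader: CountingLoader, probed_ids: Iterable[int]
) -> None:
    """Fail if any shard outside the probe set was deserialized."""
    unexpected = set(loader.loaded_ids) - set(probed_ids)
    assert not unexpected, f"non-probed shards were loaded: {sorted(unexpected)}"
```

A test could hand a `CountingLoader` to the searcher, run a single cold query, and call `assert_only_probed_loaded` with the routed shard ids. Comparing `loader.seconds` and `len(loader.loaded_ids)` before and after the change would give a coarse cold-start signal.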
