Preserve lazy shard loading by decoupling centroid routing from shard body loads

## Summary

`IndexSearcher::search()` currently gathers centroids by calling `load_shard()` for every shard. Because `load_shard()` deserializes the full `.sidx` payload, the first query effectively loads the entire index into RAM before `nprobe` selection.

## Why this is a problem

- it breaks the documented lazy shard loading behavior
- cold-start query latency includes full-index deserialization work
- peak memory scales with total index size rather than the probed working set
- `nprobe` still limits scoring work, but not the up-front shard loading cost

## Desired outcome

Persist centroid routing metadata separately from full shard bodies so that shard selection can happen without deserializing every shard. Full shard payloads should be loaded lazily, only for the selected probe set, and then cached as usual.
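
To make the intended routing step concrete, here is a minimal sketch of probe selection that touches only a centroid table. `CentroidEntry`, `select_probe_shards`, and the squared-L2 metric are assumptions for illustration; the real searcher may use different types and a different distance.

```rust
/// Hypothetical routing entry: one centroid per shard, stored apart from the shard body.
#[derive(Debug, Clone)]
pub struct CentroidEntry {
    pub shard_id: u32,
    pub centroid: Vec<f32>,
}

/// Squared Euclidean distance (assumed metric); both slices are assumed to have equal length.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns the ids of the `nprobe` shards whose centroids are nearest to `query`.
/// Only the centroid table is read; no shard body is needed at this stage.
pub fn select_probe_shards(query: &[f32], table: &[CentroidEntry], nprobe: usize) -> Vec<u32> {
    let mut scored: Vec<(f32, u32)> = table
        .iter()
        .map(|e| (squared_l2(query, &e.centroid), e.shard_id))
        .collect();
    // Nearest centroid first; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(nprobe).map(|(_, id)| id).collect()
}
```

With a table like this in memory, `load_shard()` would only need to run for the ids returned by the selection step.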

## Acceptable implementation directions

1. store centroids in the manifest
2. write a dedicated centroid sidecar artifact
3. extend the shard format and reader so headers or centroid sections can be read without loading records
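
As one possible shape for direction 2, the sketch below encodes and decodes a small versioned centroid sidecar. The layout, the `CTRD` magic, and the function names are all hypothetical; nothing here reflects the existing `.sidx` format.

```rust
/// Hypothetical sidecar layout (all integers little-endian):
/// magic "CTRD" | version u32 | dim u32 | count u32 | count * (shard_id u32, dim * f32)
const MAGIC: &[u8; 4] = b"CTRD";
const VERSION: u32 = 1;

/// Serializes `(shard_id, centroid)` pairs; panics if a centroid has the wrong dimension.
pub fn encode_sidecar(dim: u32, entries: &[(u32, Vec<f32>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&VERSION.to_le_bytes());
    out.extend_from_slice(&dim.to_le_bytes());
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (shard_id, centroid) in entries {
        assert_eq!(centroid.len(), dim as usize, "centroid dimension mismatch");
        out.extend_from_slice(&shard_id.to_le_bytes());
        for v in centroid {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Reads one little-endian u32 at `pos` and advances `pos`, or reports truncation.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let end = *pos + 4;
    let chunk = bytes.get(*pos..end).ok_or("truncated sidecar")?;
    *pos = end;
    Ok(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Parses a sidecar, rejecting unknown magic bytes, unknown versions, and truncated input.
pub fn decode_sidecar(bytes: &[u8]) -> Result<Vec<(u32, Vec<f32>)>, String> {
    if bytes.get(..4) != Some(&MAGIC[..]) {
        return Err("bad magic".to_string());
    }
    let mut pos = 4;
    let version = read_u32(bytes, &mut pos)?;
    if version != VERSION {
        return Err(format!("unsupported sidecar version {}", version));
    }
    let dim = read_u32(bytes, &mut pos)? as usize;
    let count = read_u32(bytes, &mut pos)? as usize;
    let mut entries = Vec::new();
    for _ in 0..count {
        let shard_id = read_u32(bytes, &mut pos)?;
        let mut centroid = Vec::new();
        for _ in 0..dim {
            // Each f32 is stored as its little-endian bit pattern.
            centroid.push(f32::from_bits(read_u32(bytes, &mut pos)?));
        }
        entries.push((shard_id, centroid));
    }
    Ok(entries)
}
```

The explicit version field is there so a later layout change can be detected and rejected rather than misread. The same idea would carry over to directions 1 and 3.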

## Acceptance criteria

- `IndexSearcher` can choose probe shards without calling `load_shard()` on every shard
- the first query only loads shard bodies for the selected probe set
- tests verify that non-probed shards are not deserialized during routing
- docs are updated if manifest or artifact formats change
- any schema or binary format change is versioned explicitly
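
A test for the "non-probed shards are not deserialized" criterion could be built around a loader that records what it loads. The sketch below uses a hypothetical `CountingShardCache` and `ShardBody`; the real cache and loader types are not shown in this issue, so treat the names and structure as assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a deserialized shard body.
pub struct ShardBody {
    pub shard_id: u32,
}

/// Hypothetical lazy cache that records which shard bodies were loaded.
#[derive(Default)]
pub struct CountingShardCache {
    cache: HashMap<u32, ShardBody>,
    /// Shard ids in the order their bodies were first loaded.
    pub loads: Vec<u32>,
    /// Number of lookups served from the cache.
    pub cache_hits: usize,
}

impl CountingShardCache {
    /// Returns the cached body, "deserializing" it on first access only.
    pub fn get_or_load(&mut self, shard_id: u32) -> &ShardBody {
        if self.cache.contains_key(&shard_id) {
            self.cache_hits += 1;
        } else {
            // A real implementation would read and deserialize the `.sidx` payload here.
            self.loads.push(shard_id);
            self.cache.insert(shard_id, ShardBody { shard_id });
        }
        &self.cache[&shard_id]
    }
}

/// Loads bodies for the probe set only; shards outside `probe` are never touched.
pub fn load_probe_set(cache: &mut CountingShardCache, probe: &[u32]) {
    for &id in probe {
        cache.get_or_load(id);
    }
}
```

A test can then run one query, assert that `loads` equals the probe set, and assert that a repeated query only increments `cache_hits`. The same counters would also serve the instrumentation item under nice-to-have.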

## Nice-to-have

- add instrumentation for shard loads and cache hits
- compare cold-start latency before and after the change

