Summary
IndexSearcher::search() currently gathers centroids by calling load_shard() for every shard. Because load_shard() deserializes the full .sidx payload, the first query effectively loads the entire index into RAM before nprobe selection.
Why this is a problem
- it breaks the documented lazy shard loading behavior
- cold-start query latency includes full-index deserialization work
- peak memory scales with total index size rather than the probed working set
- nprobe still limits scoring work, but not the up-front shard loading cost
Desired outcome
Persist centroid routing metadata separately from full shard bodies so shard selection can happen without deserializing every shard. Full shard payloads should remain lazily loaded only for the selected probe set and then cached normally.
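A minimal Rust sketch of the desired flow. It assumes a Rust codebase, a per-shard centroid list available without opening shard bodies, and squared L2 as the routing metric. CentroidEntry, select_probes, ShardCache and the injected load_shard closure are hypothetical stand-ins, not the project's real types. Routing reads only the centroid list, and a shard body is deserialized the first time it is probed and then reused from the cache.

```rust
use std::collections::HashMap;

// Hypothetical routing entry: one centroid per shard, persisted apart from the
// shard body (manifest, sidecar, or a header section).
struct CentroidEntry {
    shard_id: usize,
    centroid: Vec<f32>,
}

// Squared L2 distance; the real index may route with a different metric.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Rank shards by centroid distance using routing metadata only; no shard
// body is touched here.
fn select_probes(routing: &[CentroidEntry], query: &[f32], nprobe: usize) -> Vec<usize> {
    let mut ranked: Vec<(f32, usize)> = routing
        .iter()
        .map(|e| (squared_l2(&e.centroid, query), e.shard_id))
        .collect();
    ranked.sort_by(|a, b| a.0.total_cmp(&b.0));
    ranked.into_iter().take(nprobe).map(|(_, id)| id).collect()
}

// Hypothetical lazy cache: the `load_shard` closure stands in for full
// deserialization and runs at most once per shard.
struct ShardCache<F: FnMut(usize) -> Vec<Vec<f32>>> {
    load_shard: F,
    bodies: HashMap<usize, Vec<Vec<f32>>>,
}

impl<F: FnMut(usize) -> Vec<Vec<f32>>> ShardCache<F> {
    fn new(load_shard: F) -> Self {
        ShardCache { load_shard, bodies: HashMap::new() }
    }

    fn get(&mut self, shard_id: usize) -> &[Vec<f32>] {
        if !self.bodies.contains_key(&shard_id) {
            let body = (self.load_shard)(shard_id);
            self.bodies.insert(shard_id, body);
        }
        &self.bodies[&shard_id]
    }
}

// Route first, then load (or reuse) only the probed shards and score them.
// Returns the closest record as (shard_id, record_index, squared distance).
fn search<F: FnMut(usize) -> Vec<Vec<f32>>>(
    routing: &[CentroidEntry],
    cache: &mut ShardCache<F>,
    query: &[f32],
    nprobe: usize,
) -> Option<(usize, usize, f32)> {
    let mut best: Option<(usize, usize, f32)> = None;
    for shard_id in select_probes(routing, query, nprobe) {
        for (i, record) in cache.get(shard_id).iter().enumerate() {
            let d = squared_l2(record, query);
            if best.map_or(true, |(_, _, bd)| d < bd) {
                best = Some((shard_id, i, d));
            }
        }
    }
    best
}
```

With this split, a cold query deserializes at most nprobe shard bodies instead of every shard, and later queries that probe the same shards are served from the cache.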
Acceptable implementation directions
- store centroids in the manifest
- write a dedicated centroid sidecar artifact
- extend the shard format and reader so headers or centroid sections can be read without loading records
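For the third direction, a sketch of a shard layout whose header and centroid section can be read without touching the records. The byte layout, the "SIDX" magic, SHARD_FORMAT_VERSION and the helper functions are all hypothetical; the real .sidx format is not described in this issue. The reader rejects unknown versions, which is one way to keep the format change explicitly versioned.

```rust
use std::io::{self, Read, Write};

// Hypothetical layout for illustration only (not the current .sidx format):
// magic "SIDX" | format version u32 | dim u32 | centroid [f32; dim]
// | record_count u64 | records [f32; dim] * record_count   (all little-endian)
const MAGIC: &[u8; 4] = b"SIDX";
const SHARD_FORMAT_VERSION: u32 = 1; // hypothetical value; bump on any layout change

#[derive(Debug, PartialEq)]
struct ShardHeader {
    version: u32,
    centroid: Vec<f32>,
    record_count: u64,
}

fn write_shard<W: Write>(w: &mut W, centroid: &[f32], records: &[Vec<f32>]) -> io::Result<()> {
    w.write_all(MAGIC)?;
    w.write_all(&SHARD_FORMAT_VERSION.to_le_bytes())?;
    w.write_all(&(centroid.len() as u32).to_le_bytes())?;
    for x in centroid {
        w.write_all(&x.to_le_bytes())?;
    }
    w.write_all(&(records.len() as u64).to_le_bytes())?;
    for record in records {
        for x in record {
            w.write_all(&x.to_le_bytes())?;
        }
    }
    Ok(())
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

// Reads magic, version and the centroid section, then stops before the
// records, so routing never pays for record deserialization.
fn read_header<R: Read>(r: &mut R) -> io::Result<ShardHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a shard file"));
    }
    let version = read_u32(r)?;
    if version != SHARD_FORMAT_VERSION {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported shard format version {version}"),
        ));
    }
    let dim = read_u32(r)? as usize;
    let mut centroid = Vec::with_capacity(dim);
    for _ in 0..dim {
        // f32 was written as its little-endian bit pattern.
        centroid.push(f32::from_bits(read_u32(r)?));
    }
    let mut count = [0u8; 8];
    r.read_exact(&mut count)?;
    Ok(ShardHeader { version, centroid, record_count: u64::from_le_bytes(count) })
}
```

For a shard with a 2-dimensional centroid, read_header consumes only the first 28 bytes (4 + 4 + 4 + 8 + 8), regardless of how many records follow.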
Acceptance criteria
- IndexSearcher can choose probe shards without calling load_shard() on every shard
- the first query only loads shard bodies for the selected probe set
- tests verify that non-probed shards are not deserialized during routing
- docs are updated if manifest or artifact formats change
- any schema or binary format change is versioned explicitly
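One possible shape for the "non-probed shards are not deserialized" test, assuming the reader can be put behind a trait that separates cheap centroid reads from full shard loads. ShardStore, RecordingStore and probe_and_load are hypothetical; the point is that a test double records every full load, so the test can assert that only the probed shards appear.

```rust
use std::cell::RefCell;

// Hypothetical storage seam: routing metadata and full shard bodies are read
// through separate methods so a test can see which one routing calls.
trait ShardStore {
    fn shard_count(&self) -> usize;
    fn centroid(&self, shard: usize) -> Vec<f32>;
    fn load_shard(&self, shard: usize) -> Vec<Vec<f32>>;
}

// Test double that records every full-body deserialization.
struct RecordingStore {
    centroids: Vec<Vec<f32>>,
    loads: RefCell<Vec<usize>>,
}

impl ShardStore for RecordingStore {
    fn shard_count(&self) -> usize {
        self.centroids.len()
    }
    fn centroid(&self, shard: usize) -> Vec<f32> {
        self.centroids[shard].clone()
    }
    fn load_shard(&self, shard: usize) -> Vec<Vec<f32>> {
        self.loads.borrow_mut().push(shard);
        vec![self.centroids[shard].clone()] // one record per shard is enough here
    }
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Hypothetical router: rank by centroid, then load only the top `nprobe`.
fn probe_and_load(store: &dyn ShardStore, query: &[f32], nprobe: usize) -> Vec<Vec<Vec<f32>>> {
    let mut ranked: Vec<(f32, usize)> = (0..store.shard_count())
        .map(|i| (squared_l2(&store.centroid(i), query), i))
        .collect();
    ranked.sort_by(|a, b| a.0.total_cmp(&b.0));
    ranked.into_iter().take(nprobe).map(|(_, i)| store.load_shard(i)).collect()
}

#[test]
fn routing_does_not_load_unprobed_shards() {
    let store = RecordingStore {
        centroids: vec![vec![0.0, 0.0], vec![10.0, 10.0], vec![-10.0, 5.0], vec![9.0, 8.0]],
        loads: RefCell::new(Vec::new()),
    };
    let bodies = probe_and_load(&store, &[9.0, 9.0], 2);
    assert_eq!(bodies.len(), 2);
    let mut loaded = store.loads.borrow().clone();
    loaded.sort();
    assert_eq!(loaded, vec![1, 3]); // shards 0 and 2 were never deserialized
}
```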
Nice-to-have
- add instrumentation for shard loads and cache hits
- compare cold-start latency before and after the change
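A small sketch of both nice-to-haves, assuming plain counters are enough to start with. ShardStats, InstrumentedCache and timed are hypothetical names; a real build might route these numbers through whatever metrics layer the project already uses. The counters let a test assert that a cold query performs exactly nprobe loads, and timed gives a rough wall-clock number for the first query before and after the change.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical counters for shard loading behavior.
#[derive(Debug, Default, PartialEq)]
struct ShardStats {
    loads: u64,      // full shard deserializations
    cache_hits: u64, // probes served from already-loaded bodies
}

// Hypothetical cache wrapper that counts loads vs. hits.
struct InstrumentedCache {
    bodies: HashMap<usize, Vec<Vec<f32>>>,
    stats: ShardStats,
}

impl InstrumentedCache {
    fn new() -> Self {
        InstrumentedCache { bodies: HashMap::new(), stats: ShardStats::default() }
    }

    fn get_or_load<F: FnOnce(usize) -> Vec<Vec<f32>>>(&mut self, shard: usize, load_shard: F) -> &[Vec<f32>] {
        if self.bodies.contains_key(&shard) {
            self.stats.cache_hits += 1;
        } else {
            self.stats.loads += 1;
            let body = load_shard(shard);
            self.bodies.insert(shard, body);
        }
        &self.bodies[&shard]
    }
}

// Wall-clock timing for one call, e.g. the first query after opening the index.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Probing shards 2, 5 and 2 through one cache yields two loads and one cache hit; wrapping the first query in timed on both the old and new code paths gives the before/after cold-start comparison.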