[ENH] Add maxscore index to metadata segment #6880
Sicheng-Pan wants to merge 2 commits into hammad/maxscore_schema_gating

Description of changes
This is PR 7 in the MaxScore stack. It wires `MaxScoreWriter`/`MaxScoreReader` into the metadata segment so that collections with `algorithm: "max_score"` in their schema use the new index format on both the write and read paths. The query pipeline is not yet connected; that follows in PR 8.

- `MaxScoreReader::count_postings()` (`maxscore.rs`): New method that counts total posting entries for a dimension by summing `len()` across posting blocks. O(n_blocks). Used by the IDF operator in PR 8 to compute document frequency for BM25 scoring.
- Writer (`blockfile_metadata.rs`):
  - Added `maxscore_index_writer: Option<MaxScoreWriter>` field to `MetadataSegmentWriterShard`.
  - Added `schema: Option<&Schema>` parameter to both `MetadataSegmentWriter::from_segment()` and `MetadataSegmentWriterShard::from_segment()`.
  - 3-way branch in `from_segment()`:
    - `SPARSE_POSTING` in file_path → fork MaxScore index (open reader + forked writer)
    - `SPARSE_MAX` in file_path → fork existing WAND index (unchanged)
    - Neither key present → use `schema.is_maxscore_enabled()` to decide which writer to create
  - Only one of `sparse_index_writer`/`maxscore_index_writer` is `Some` at a time.
  - `set_metadata`/`delete_metadata` `SparseVector` arms: check `maxscore_index_writer` first, fall back to `sparse_index_writer`.
- Flusher (`blockfile_metadata.rs`):
  - Changed `sparse_index_flusher: SparseFlusher` to `Option<SparseFlusher>` + `Option<MaxScoreFlusher>`.
  - `commit()` handles both writer paths; `flush()` conditionally inserts `SPARSE_POSTING` or `SPARSE_MAX` + `SPARSE_OFFSET_VALUE` into the flushed file_path map.
- Reader (`blockfile_metadata.rs`):
  - Added `maxscore_index_reader: Option<MaxScoreReader>` field to `MetadataSegmentReaderShard`.
  - `SPARSE_POSTING` blockfile loaded concurrently in the existing `tokio::join!`. If present, `maxscore_index_reader` is populated and the old WAND reader is skipped.
- Callers (`log_fetch_orchestrator.rs`, `attached_function_orchestrator.rs`) pass `collection.schema.as_ref()`.
  - `create_new_shard` extracts schema from `&Collection`.
  - Passing `None` is backward-compatible and takes the default WAND path.

Test plan
- `None` for schema, so the WAND writer is created as before).
- `chroma-segment` and `worker` crates.

Migration plan
No migration needed. Existing collections with `SPARSE_MAX` + `SPARSE_OFFSET_VALUE` in their file_path continue to use the WAND reader/writer. New collections only get the MaxScore index if the schema has `algorithm: "max_score"` (set by the frontend gating in PR 6). The segment reader auto-detects which format is present based on file_path keys.

Observability plan
No new metrics or spans. The 3-way branch in `from_segment()` is logged implicitly through existing tracing on blockfile open/create operations.

Documentation Changes
None.
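
Reviewer aid: a minimal sketch of what `MaxScoreReader::count_postings()` is described as doing, namely summing `len()` over a dimension's posting blocks. The `PostingBlock` type and the free-function signature here are hypothetical stand-ins for illustration, not the actual types in `maxscore.rs`.

```rust
/// Hypothetical posting block: a run of (document offset, weight) entries
/// for a single sparse dimension. The real block layout may differ.
struct PostingBlock {
    entries: Vec<(u32, f32)>,
}

impl PostingBlock {
    /// Number of posting entries stored in this block.
    fn len(&self) -> usize {
        self.entries.len()
    }
}

/// Counts the total posting entries for one dimension by summing the
/// per-block lengths. Cost is linear in the number of blocks, not in the
/// number of entries, because each block already knows its own length.
fn count_postings(blocks: &[PostingBlock]) -> usize {
    blocks.iter().map(PostingBlock::len).sum()
}
```

Under this reading, the result is the document frequency of the dimension, provided each document contributes at most one posting entry per dimension.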
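
Reviewer aid: the 3-way branch in `from_segment()` and the matching key selection in `flush()` can be sketched roughly as below. This is an illustration under assumptions, not the code in `blockfile_metadata.rs`: the key string values, the `SparseWriterKind` enum, the `choose_writer` and `flushed_keys` helpers, and the simplified `Schema` struct are all hypothetical. Only the key names and `is_maxscore_enabled()` come from the description above.

```rust
use std::collections::HashMap;

// Hypothetical string values; the real constants are defined in the segment crate.
const SPARSE_POSTING: &str = "sparse_posting";
const SPARSE_MAX: &str = "sparse_max";
const SPARSE_OFFSET_VALUE: &str = "sparse_offset_value";

/// Hypothetical stand-in for the collection schema.
struct Schema {
    maxscore: bool,
}

impl Schema {
    fn is_maxscore_enabled(&self) -> bool {
        self.maxscore
    }
}

/// Which sparse index writer the shard ends up with. Modelling this as an
/// enum makes "only one writer is `Some` at a time" hold by construction.
#[derive(Debug, PartialEq)]
enum SparseWriterKind {
    ForkMaxScore, // existing MaxScore files: open reader + forked writer
    ForkWand,     // existing WAND files: unchanged path
    NewMaxScore,  // no sparse files yet, schema opts in to MaxScore
    NewWand,      // no sparse files yet, default WAND path
}

/// Existing files win over the schema; the schema is only consulted when
/// the segment has no sparse index files at all.
fn choose_writer(
    file_path: &HashMap<String, Vec<String>>,
    schema: Option<&Schema>,
) -> SparseWriterKind {
    if file_path.contains_key(SPARSE_POSTING) {
        SparseWriterKind::ForkMaxScore
    } else if file_path.contains_key(SPARSE_MAX) {
        SparseWriterKind::ForkWand
    } else if schema.map_or(false, |s| s.is_maxscore_enabled()) {
        SparseWriterKind::NewMaxScore
    } else {
        SparseWriterKind::NewWand
    }
}

impl SparseWriterKind {
    /// Keys a flush would insert into the flushed file_path map.
    fn flushed_keys(&self) -> Vec<&'static str> {
        match self {
            Self::ForkMaxScore | Self::NewMaxScore => vec![SPARSE_POSTING],
            Self::ForkWand | Self::NewWand => vec![SPARSE_MAX, SPARSE_OFFSET_VALUE],
        }
    }
}
```

Because the flushed keys are the same ones the branch inspects on the next open, a segment stays on whichever format it was first written with, which is consistent with the "no migration needed" claim.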