[ENH] Add maxscore index to metadata segment by Sicheng-Pan · Pull Request #6880 · chroma-core/chroma

Sicheng-Pan · 2026-04-10T20:36:52Z

Description of changes

This is PR 7 in the MaxScore stack. It wires MaxScoreWriter/MaxScoreReader into the metadata segment so that collections with algorithm: "max_score" in their schema use the new index format on both the write and read paths. The query pipeline is not yet connected — that follows in PR 8.

MaxScoreReader::count_postings() (maxscore.rs): New method that counts the total posting entries for a dimension by summing len() across its posting blocks. Runs in O(n_blocks) time. Used by the IDF operator in PR 8 to compute document frequency for BM25 scoring.
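
As a rough illustration of the counting described above, here is a minimal sketch. `PostingBlock` and the free function `count_postings` are hypothetical stand-ins, not the actual `SparsePostingBlock` or `MaxScoreReader` API; the sketch only shows that the count is a sum of per-block lengths, so the cost scales with the number of blocks rather than the number of entries.

```rust
/// Hypothetical stand-in for a posting block; the real block type in
/// maxscore.rs is not reproduced here.
struct PostingBlock {
    /// Document offsets stored in this block (assumed layout).
    offsets: Vec<u32>,
}

impl PostingBlock {
    /// Number of posting entries held by this block.
    fn len(&self) -> usize {
        self.offsets.len()
    }
}

/// Sum the per-block lengths for one dimension. Each block contributes in
/// constant time, so the whole call is linear in the number of blocks.
fn count_postings(blocks: &[PostingBlock]) -> usize {
    blocks.iter().map(PostingBlock::len).sum()
}
```

For a dimension stored as two blocks holding three and two entries, this sketch would return five, which is the document frequency that a BM25 IDF term would consume.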
Metadata segment writer (blockfile_metadata.rs):
- Added maxscore_index_writer: Option<MaxScoreWriter> field to MetadataSegmentWriterShard.
- Added schema: Option<&Schema> parameter to both MetadataSegmentWriter::from_segment() and MetadataSegmentWriterShard::from_segment().
- 3-way branch in writer construction:
  1. SPARSE_POSTING in file_path → fork MaxScore index (open reader + forked writer)
  2. SPARSE_MAX in file_path → fork existing WAND index (unchanged)
  3. Neither (fresh collection) → check schema.is_maxscore_enabled() to decide which writer to create
- Only one of sparse_index_writer / maxscore_index_writer is Some at a time.
- Dual dispatch in set_metadata/delete_metadata SparseVector arms — checks maxscore_index_writer first, falls back to sparse_index_writer.
Metadata segment flusher (blockfile_metadata.rs):
- Changed sparse_index_flusher from SparseFlusher to Option<SparseFlusher> and added a separate Option<MaxScoreFlusher>.
- commit() handles both writer paths; flush() conditionally inserts SPARSE_POSTING or SPARSE_MAX+SPARSE_OFFSET_VALUE into the flushed file_path map.
Metadata segment reader (blockfile_metadata.rs):
- Added maxscore_index_reader: Option<MaxScoreReader> field to MetadataSegmentReaderShard.
- SPARSE_POSTING blockfile loaded concurrently in the existing tokio::join!. If present, maxscore_index_reader is populated and the old WAND reader is skipped.
Call site updates (~27 sites):
- 2 production orchestrators (log_fetch_orchestrator.rs, attached_function_orchestrator.rs) pass collection.schema.as_ref().
- create_new_shard extracts schema from &Collection.
- ~24 test sites pass None (backward-compatible — default WAND path).
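
The 3-way branch in writer construction can be sketched as a pure decision function. This is an illustration under stated assumptions, not the code in blockfile_metadata.rs: the `Schema` struct, the `SparseWriterChoice` enum, the `choose_sparse_writer` function, the `HashMap<String, Vec<String>>` shape of file_path, and the string values of the key constants are all hypothetical. Only the ordering of the checks follows the description above.

```rust
use std::collections::HashMap;

// Key names follow the PR description; the string values are assumed.
const SPARSE_MAX: &str = "sparse_max";
const SPARSE_POSTING: &str = "sparse_posting";

/// Hypothetical stand-in for the collection schema.
struct Schema {
    maxscore: bool,
}

impl Schema {
    fn is_maxscore_enabled(&self) -> bool {
        self.maxscore
    }
}

/// Which sparse writer to construct. Exactly one variant is chosen, which
/// mirrors the rule that only one of the two writers is Some at a time.
#[derive(Debug, PartialEq)]
enum SparseWriterChoice {
    ForkMaxScore,
    ForkWand,
    NewMaxScore,
    NewWand,
}

fn choose_sparse_writer(
    file_path: &HashMap<String, Vec<String>>,
    schema: Option<&Schema>,
) -> SparseWriterChoice {
    if file_path.contains_key(SPARSE_POSTING) {
        // 1. Persisted MaxScore files exist: fork them.
        SparseWriterChoice::ForkMaxScore
    } else if file_path.contains_key(SPARSE_MAX) {
        // 2. Persisted WAND files exist: fork them, regardless of schema.
        SparseWriterChoice::ForkWand
    } else if schema.map_or(false, |s| s.is_maxscore_enabled()) {
        // 3a. Fresh collection whose schema opts in to MaxScore.
        SparseWriterChoice::NewMaxScore
    } else {
        // 3b. Fresh collection with no schema or no opt-in: default WAND.
        SparseWriterChoice::NewWand
    }
}
```

Passing `None` for the schema on a fresh segment falls through to the WAND default in this sketch, which matches why the test call sites can pass None without changing behavior.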

Test plan

All existing metadata segment tests pass unchanged (they pass None for schema, so the WAND writer is created as before).
Compilation verified for chroma-segment and worker crates.
Integration tests for the MaxScore write-then-read path will be added in PR 8 alongside the operator and orchestrator routing.

Migration plan

No migration needed. Existing collections with SPARSE_MAX+SPARSE_OFFSET_VALUE in their file_path continue to use the WAND reader/writer. New collections only get the MaxScore index if the schema has algorithm: "max_score" (set by the frontend gating in PR 6). The segment reader auto-detects which format is present based on file_path keys.
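
To make the no-migration argument concrete, the sketch below pairs the flusher's key emission with the reader's detection. It is a simplified model, not the actual implementation: `SparseFormat`, `flushed_file_path`, `detect_format`, the placeholder blockfile IDs, and the string values of the key constants are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Key names follow the PR description; the string values are assumed.
const SPARSE_MAX: &str = "sparse_max";
const SPARSE_OFFSET_VALUE: &str = "sparse_offset_value";
const SPARSE_POSTING: &str = "sparse_posting";

#[derive(Debug, Clone, Copy, PartialEq)]
enum SparseFormat {
    Wand,
    MaxScore,
}

/// Flusher side: emit only the keys that belong to the format written.
/// The blockfile IDs are placeholders.
fn flushed_file_path(format: SparseFormat) -> HashMap<String, Vec<String>> {
    let mut file_path = HashMap::new();
    match format {
        SparseFormat::MaxScore => {
            file_path.insert(SPARSE_POSTING.to_string(), vec!["posting-id".to_string()]);
        }
        SparseFormat::Wand => {
            file_path.insert(SPARSE_MAX.to_string(), vec!["max-id".to_string()]);
            file_path.insert(
                SPARSE_OFFSET_VALUE.to_string(),
                vec!["offset-value-id".to_string()],
            );
        }
    }
    file_path
}

/// Reader side: infer the format from the keys alone, with no schema lookup.
fn detect_format(file_path: &HashMap<String, Vec<String>>) -> Option<SparseFormat> {
    if file_path.contains_key(SPARSE_POSTING) {
        Some(SparseFormat::MaxScore)
    } else if file_path.contains_key(SPARSE_MAX) && file_path.contains_key(SPARSE_OFFSET_VALUE) {
        Some(SparseFormat::Wand)
    } else {
        // No sparse index has been flushed for this segment yet.
        None
    }
}
```

Because the two key sets are disjoint in this model, a segment flushed before this PR is still read as WAND, and a segment flushed with the MaxScore writer is never mistaken for it.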

Observability plan

No new metrics or spans. The 3-way branch in from_segment() is logged implicitly through existing tracing on blockfile open/create operations.

Documentation Changes

None.

github-actions · 2026-04-10T20:37:01Z

Sicheng-Pan · 2026-04-10T20:37:12Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

[ENH] Add maxscore index to metadata segment #6880 👈 (View in Graphite)
[ENH] Add maxscore option in schema #6878
[ENH] Benchmark maxscore #6866
[ENH] Add SIMD for maxscore #6865
[ENH] Add maxscore lazy cursor #6829
[ENH] Add basic maxscore writer/reader #6825
[ENH] Add SparsePostingBlock #6823
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

propel-code-bot · 2026-04-10T20:38:00Z

Wire MaxScore sparse index into metadata segment read/write/flush paths

This PR adds end-to-end metadata segment support for the new MaxScore sparse index format and selects it based on persisted segment files or collection schema. The metadata writer now supports both legacy WAND (SPARSE_MAX + SPARSE_OFFSET_VALUE) and MaxScore (SPARSE_POSTING) with mutually exclusive writer state, and the reader auto-detects which sparse format to load. Flushing and commit logic were updated to emit the correct file path keys for the chosen sparse index type.

The change also introduces MaxScoreReader::count_postings() for per-dimension posting counts, updates orchestrator call sites to pass collection.schema.as_ref() into metadata writer construction, and adds a substantial multi-commit consistency test that validates MaxScore behavior across add/delete/update cycles and query recall checks.

This summary was automatically generated by @propel-code-bot

propel-code-bot

Review found no issues; changes cleanly integrate MaxScore metadata indexing with backward-compatible reader/writer routing.

Status: No Issues Found | Risk: Low

Review Details

📁 7 files reviewed | 💬 0 comments

Sicheng-Pan mentioned this pull request Apr 10, 2026

[ENH] Add maxscore option in schema #6878 (Open)

This was referenced Apr 10, 2026

[ENH] Add SparsePostingBlock #6823 (Open)
[ENH] Add basic maxscore writer/reader #6825 (Open)
[ENH] Add maxscore lazy cursor #6829 (Open)
[ENH] Add SIMD for maxscore #6865 (Open)
[ENH] Benchmark maxscore #6866 (Open)

Sicheng-Pan changed the title ~~Wire MaxScore index into metadata segment writer, reader, and flusher~~ [ENH] Add maxscore index to metadata segment Apr 10, 2026

Sicheng-Pan marked this pull request as ready for review April 10, 2026 20:37

propel-code-bot bot reviewed Apr 10, 2026

Wire MaxScore index into metadata segment writer, reader, and flusher

294856f

Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from b8b2470 to 294856f on April 10, 2026 23:27

Test

3f7caf7

Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from ed6eeec to 3f7caf7 on April 11, 2026 00:41

Sicheng-Pan wants to merge 2 commits into hammad/maxscore_schema_gating from hammad/maxscore_segment_wiring
