[ENH] Benchmark maxscore#6866

Open
Sicheng-Pan wants to merge 2 commits intohammad/maxscore_simdfrom
hammad/maxscore_benchmark
Conversation

Contributor

@Sicheng-Pan Sicheng-Pan commented Apr 10, 2026

Description of changes

This is PR #5 of the BlockMaxMaxScore series, stacked on hammad/maxscore_simd. It extends the existing Wikipedia SPLADE sparse vector benchmark to support BlockMaxMaxScore as an alternative to the existing Block-Max WAND algorithm.

  • New functionality
    • --block-maxscore flag: Builds and searches using the BlockMaxMaxScore index instead of WAND.
    • --sweep-terms mode: Builds both WAND and MaxScore indices, then runs queries at 5, 10, 15, ..., 40 max terms, printing a side-by-side latency comparison table with speedup ratios.
    • --max-terms <N>: Truncates each query to its top-N highest-weight terms before searching. Useful for studying the relationship between query complexity and latency.
    • --batch-size <N>: Configurable commit/flush batch size during indexing (default 65536).
    • --block-size default changed from 128 to 256 entries per posting block. Shared between WAND and MaxScore paths.
    • build_block_maxscore_index: Parallel document ingestion into BlockMaxMaxScore index with incremental fork-commit-flush loop, progress bar, and storage size measurement.
    • search_with_block_maxscore: Query loop with per-query timing, iteration support, and progress bar.
    • Dataset recycling: When --num-documents exceeds the dataset size, documents are recycled with unique IDs to reach the requested count.
    • Query term statistics: Prints min/median/avg/max query term counts on startup.
    • Storage size reporting: Both WAND and MaxScore index builds report on-disk storage size.
    • run_brute_force helper: Extracted from duplicated inline brute-force code for reuse by both algorithm paths.
  • Dataset change
    • wikipedia_splade.rs: Downloads all 7 train shards (~1M documents) instead of just the first shard (~142K), enabling benchmarks at larger scale.
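The --max-terms truncation described above can be sketched as follows. This is an illustrative assumption, not code from the PR: `truncate_query` and the `(term_id, weight)` pair representation are hypothetical names.

```rust
/// Hypothetical sketch of --max-terms: keep only the N highest-weight
/// terms of a sparse query before searching.
fn truncate_query(mut terms: Vec<(u32, f32)>, max_terms: usize) -> Vec<(u32, f32)> {
    // Sort by weight descending; ties broken arbitrarily.
    terms.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    terms.truncate(max_terms);
    terms
}
```

With this shape, the --sweep-terms mode amounts to calling the search path repeatedly with `max_terms` set to 5, 10, ..., 40 and comparing per-step latencies.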

Test plan

This is a benchmark binary, not a library — no unit tests. Verified manually:

  • --block-maxscore -n 65536 -m 256 -k 128: 100% recall, ~20x speedup over brute force
  • Default WAND mode still works unchanged
  • --sweep-terms prints comparison table
  • --max-terms 20 truncates queries correctly
  • --wand-only and --block-maxscore --wand-only profiling modes work

Migration plan

No migration needed. This only changes a benchmark example binary and a benchmark dataset loader.

Observability plan

No instrumentation changes. The benchmark itself prints detailed timing and recall metrics.

Documentation changes

No user-facing API changes. CLI usage is documented in the file's module-level doc comment.

Contributor Author

Sicheng-Pan commented Apr 10, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?

@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review April 10, 2026 03:12
@Sicheng-Pan Sicheng-Pan changed the title Add BlockMaxMaxScore mode to sparse vector benchmark [ENH] Benchmark maxscore Apr 10, 2026
@propel-code-bot
Contributor

propel-code-bot bot commented Apr 10, 2026

Add BlockMaxMaxScore benchmarking mode and term-sweep analysis to sparse benchmark

This PR significantly expands the benchmark example at rust/index/examples/sparse_vector_benchmark.rs to support comparing two sparse retrieval algorithms: the existing Block-Max WAND and the new BlockMaxMaxScore. It introduces new CLI options for algorithm selection, query truncation (--max-terms), cross-algorithm term sweep (--sweep-terms), and configurable indexing batch size (--batch-size), while preserving the existing WAND workflow.

It also scales the dataset loader in rust/benchmark/src/datasets/wikipedia_splade.rs from a single train shard to all 7 train shards, enabling larger benchmark runs. Additional benchmark ergonomics include storage size reporting, query-term statistics, reusable brute-force baseline logic, and dataset recycling when requested document count exceeds available data.

This summary was automatically generated by @propel-code-bot

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_benchmark branch from e874e94 to 460583d Compare April 10, 2026 03:13

@propel-code-bot propel-code-bot bot left a comment

One important logic issue was found that can crash benchmark execution on empty datasets and should be guarded before release.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Prevent divide-by-zero panic on empty documents; add early error guard: rust/index/examples/sparse_vector_benchmark.rs
Review Details

📁 2 files reviewed | 💬 1 comment

} else {
// Dataset has fewer documents than requested — recycle with unique IDs.
let base_len = raw_documents.len();
let repeats = args.num_documents.div_ceil(base_len);
Important

[Logic] This can panic when no documents are returned: base_len can be 0, then args.num_documents.div_ceil(base_len) divides by zero. That makes the new document-recycling path crash on empty/failed datasets.

Add an explicit guard for raw_documents.is_empty() and return an error with context before computing repeats.

File: rust/index/examples/sparse_vector_benchmark.rs, line 1014
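A minimal sketch of the suggested guard, with the error type simplified to String for illustration (the codebase would presumably use anyhow; `compute_repeats` is a hypothetical helper name):

```rust
/// Hypothetical helper: compute how many times the dataset must be
/// repeated to reach the requested document count, guarding against
/// base_len == 0 before div_ceil divides by it.
fn compute_repeats(base_len: usize, num_documents: usize) -> Result<usize, String> {
    if base_len == 0 {
        return Err("dataset returned no documents; cannot recycle to reach requested count".to_string());
    }
    Ok(num_documents.div_ceil(base_len))
}
```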

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch from d47e07c to 6920d76 Compare April 10, 2026 03:26
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_benchmark branch from 460583d to 1888806 Compare April 10, 2026 03:26
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_benchmark branch from 1888806 to 5bc0451 Compare April 10, 2026 17:18
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch from 6920d76 to 0e9eaa3 Compare April 10, 2026 17:18
@propel-code-bot propel-code-bot bot left a comment
Important reliability and logic issues were found that can cause runtime panics in new benchmark paths and should be fixed before merge.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Validate batch size >0 to prevent divide/chunk panics: rust/index/examples/sparse_vector_benchmark.rs
  • Replace task join unwraps with contextual error propagation: rust/index/examples/sparse_vector_benchmark.rs
Review Details

📁 2 files reviewed | 💬 2 comments



// Process documents in batches with write-commit-flush loop
let batch_size = 65536;
let batch_size = batch_size_override;
Important

[Logic] batch_size_override is used directly in div_ceil and chunks. If --batch-size 0 is passed, this will panic at runtime (division by zero / invalid chunk size).

Add CLI validation (value_parser with range 1..) or an explicit guard returning an error when batch_size_override == 0.

File: rust/index/examples/sparse_vector_benchmark.rs, line 308
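One possible guard, sketched with a plain String error for illustration (the clap `value_parser` range suggested above would achieve the same thing at parse time; `validate_batch_size` is a hypothetical name):

```rust
/// Hypothetical validation for the --batch-size flag: reject 0 before it
/// reaches div_ceil / chunks, both of which panic on a zero chunk size.
fn validate_batch_size(batch_size: usize) -> Result<usize, String> {
    if batch_size == 0 {
        return Err("--batch-size must be at least 1".to_string());
    }
    Ok(batch_size)
}
```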

})
.collect::<Vec<_>>();
for handle in handles {
handle.await.unwrap();
Important

[Reliability] handle.await.unwrap() will panic if any spawned indexing task panics/cancels, aborting the entire benchmark without context. This is now on the newly added MaxScore path.

Propagate join failures as anyhow errors with context instead of unwrapping, so failures are diagnosable and don't crash abruptly.

File: rust/index/examples/sparse_vector_benchmark.rs, line 669
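The idea can be sketched with std::thread as a stand-in. The PR actually uses tokio tasks, whose JoinHandle likewise yields a Result that can be propagated instead of unwrapped; the String error here stands in for anyhow, and `join_all` is a hypothetical name.

```rust
use std::thread;

/// Hypothetical replacement for `handle.await.unwrap()`: propagate join
/// failures as errors with context instead of panicking.
fn join_all(handles: Vec<thread::JoinHandle<()>>) -> Result<(), String> {
    for (i, handle) in handles.into_iter().enumerate() {
        handle
            .join()
            .map_err(|_| format!("indexing task {i} panicked or was cancelled"))?;
    }
    Ok(())
}
```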


@propel-code-bot propel-code-bot bot left a comment

One important logic issue was identified: an edge-case panic should be replaced with recoverable error handling before approval.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Replace empty-documents panic with anyhow recoverable error path: rust/index/examples/sparse_vector_benchmark.rs
Review Details

📁 2 files reviewed | 💬 1 comment

Ok((
temp_dir,
provider,
posting_writer_id.expect("Should have created at least one writer"),
Important

[Logic] posting_writer_id.expect("Should have created at least one writer") will panic when documents is empty (no chunks processed). That's a realistic edge case for small/failed dataset loads.

Return a proper anyhow error when no writer is created instead of panicking, so callers get a recoverable failure path.

File: rust/index/examples/sparse_vector_benchmark.rs, line 711
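A sketch of the recoverable path (String error for illustration; `require_writer_id` is a hypothetical name for wherever the Option is unwrapped):

```rust
/// Hypothetical replacement for the expect(): turn the missing-writer case
/// into a recoverable error instead of a panic.
fn require_writer_id(posting_writer_id: Option<u64>) -> Result<u64, String> {
    posting_writer_id.ok_or_else(|| {
        "no posting writer was created; the document set may be empty".to_string()
    })
}
```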

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_benchmark branch from 946139b to c800ddb Compare April 10, 2026 20:36
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch from ac08b0c to 014e4ed Compare April 10, 2026 20:36

@propel-code-bot propel-code-bot bot left a comment

Review found no issues; changes appear well-scoped and benchmark-focused with clear manual validation.

Status: No Issues Found | Risk: Low

Review Details

📁 2 files reviewed | 💬 0 comments
