[ENH] Add SIMD for maxscore #6865

Open
Sicheng-Pan wants to merge 1 commit into hammad/maxscore_lazy_cursor from hammad/maxscore_simd

@Sicheng-Pan Sicheng-Pan commented Apr 10, 2026

Description of changes

This is PR #4 of the BlockMaxMaxScore series, stacked on hammad/maxscore_lazy_cursor. It adds SIMD acceleration for the two hottest scalar paths in the query engine.

  • New functionality
    • SIMD f16→f32 bulk conversion (sparse_posting_block.rs): Replaces the scalar convert_f16_to_f32 with a platform-dispatched implementation:
      • aarch64 NEON: Bit-manipulation approach (shift+mask+bias) processing 8 values per iteration via vld1q_u16 / vmovl_u16 / vreinterpretq_f32_u32. Handles all normal f16 values; subnormals map to tiny positive values (acceptable for SPLADE/BM25 weights).
      • x86_64 F16C: _mm256_cvtph_ps intrinsic, 8 values per iteration. Runtime-detected via is_x86_feature_detected!("f16c").
      • Scalar fallback: Unchanged half crate conversion for unsupported platforms.
      • Used by decompress_values_into() — the bulk value decompression path for Eager cursors and ensure_forward_block.
    • SIMD filter_competitive (maxscore.rs): Replaces the scalar budget-pruning compaction with SIMD-accelerated 4-wide comparison:
      • aarch64 NEON: vcgtq_f32 comparison, per-lane mask extraction, branchless scatter of survivors.
      • x86_64 SSE2: _mm_cmpgt_ps + _mm_movemask_ps, bit-scan scatter. Runtime-detected.
      • Scalar fallback: Unchanged loop for unsupported platforms.
      • Both SIMD paths handle remainder elements (lengths that are not a multiple of 4) with a scalar tail.
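
As a rough illustration of the compare, movemask, and bit-scan scatter described for the x86_64 path, the sketch below compacts parallel `ids`/`scores` slices against a threshold. The function names, the slice-based signature, and the `score > threshold` survival rule are assumptions made for this example and are not taken from maxscore.rs; only the intrinsics `_mm_cmpgt_ps` and `_mm_movemask_ps` come from the description above.

```rust
/// Hypothetical scalar reference: keep entries whose score exceeds `threshold`,
/// compacting survivors to the front. Returns the number of survivors.
fn filter_scalar_sketch(ids: &mut [u32], scores: &mut [f32], threshold: f32) -> usize {
    let n = ids.len().min(scores.len());
    let mut w = 0;
    for r in 0..n {
        if scores[r] > threshold {
            ids[w] = ids[r];
            scores[w] = scores[r];
            w += 1;
        }
    }
    w
}

/// Hypothetical 4-wide variant: compare four scores at once, then walk the
/// set bits of the lane mask to scatter survivors.
#[cfg(target_arch = "x86_64")]
fn filter_simd_sketch(ids: &mut [u32], scores: &mut [f32], threshold: f32) -> usize {
    use std::arch::x86_64::{_mm_cmpgt_ps, _mm_loadu_ps, _mm_movemask_ps, _mm_set1_ps};
    let n = ids.len().min(scores.len());
    let (mut w, mut r) = (0, 0);
    while r + 4 <= n {
        // SAFETY: these intrinsics are in the x86_64 baseline feature set, and
        // r + 4 <= n <= scores.len() keeps the unaligned 4-float load in bounds.
        let mut mask = unsafe {
            let v = _mm_loadu_ps(scores.as_ptr().add(r));
            _mm_movemask_ps(_mm_cmpgt_ps(v, _mm_set1_ps(threshold))) as u32
        };
        // The write index never passes the lane being read, so in-place is safe.
        while mask != 0 {
            let lane = mask.trailing_zeros() as usize;
            ids[w] = ids[r + lane];
            scores[w] = scores[r + lane];
            w += 1;
            mask &= mask - 1; // clear the lowest set bit
        }
        r += 4;
    }
    // Scalar tail for the last n % 4 elements.
    for i in r..n {
        if scores[i] > threshold {
            ids[w] = ids[i];
            scores[w] = scores[i];
            w += 1;
        }
    }
    w
}

/// On other architectures this sketch simply falls back to the scalar loop.
#[cfg(not(target_arch = "x86_64"))]
fn filter_simd_sketch(ids: &mut [u32], scores: &mut [f32], threshold: f32) -> usize {
    filter_scalar_sketch(ids, scores, threshold)
}
```

Calling `filter_simd_sketch(&mut ids, &mut scores, 0.5)` returns the survivor count `k`; the first `k` entries of both slices then hold the survivors in their original order. No runtime detection is needed in this sketch because the intrinsics it uses are always available on x86_64.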

Test plan

  • convert_f16_simd_matches_scalar — Verifies SIMD f16→f32 output matches scalar at 14 different sizes (1, 3, 7, 8, 9, 15, 16, 17, 31, 63, 64, 100, 256, 1000) including remainder paths.
  • filter_competitive_simd_matches_scalar — Verifies SIMD filter output matches scalar at 11 sizes including remainder paths.
  • filter_competitive_all_pass / filter_competitive_none_pass — Edge cases for budget pruning.
  • All existing roundtrip and recall tests exercise the SIMD paths transparently (dispatch is automatic).
  • Tests pass locally with cargo test
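
The shift+mask+bias idea behind the NEON conversion can be modeled one value at a time, which also shows why normal values convert exactly while zero-exponent inputs do not. The sketch below is a scalar model written for this illustration, not the code in sparse_posting_block.rs; the function names are hypothetical, and how the PR treats zero, infinity, and NaN is not stated in the description, so this model makes no attempt to special-case them.

```rust
/// Hypothetical scalar model of a shift+mask+bias f16 -> f32 conversion.
/// Exact for normal f16 values only.
fn f16_bits_to_f32_fast(h: u16) -> f32 {
    let sign = ((h & 0x8000) as u32) << 16; // move the sign to bit 31
    let exp_mant = ((h & 0x7FFF) as u32) << 13; // align exponent and mantissa
    let rebias = (127u32 - 15) << 23; // f32 bias minus f16 bias, in exponent position
    f32::from_bits(sign | (exp_mant + rebias))
}

/// Reference value for a normal f16, computed from the format definition:
/// (-1)^s * 2^(e - 15) * (1 + m / 1024).
fn f16_normal_reference(h: u16) -> f32 {
    let e = ((h >> 10) & 0x1F) as i32;
    let m = (h & 0x3FF) as f32;
    let mut scale = 1.0f32;
    for _ in 0..(e - 15).abs() {
        scale *= if e > 15 { 2.0 } else { 0.5 }; // exact power-of-two scaling
    }
    let magnitude = (1.0 + m / 1024.0) * scale;
    if h & 0x8000 != 0 { -magnitude } else { magnitude }
}

/// Counts normal f16 bit patterns where the fast model and the reference differ.
fn mismatches_on_normals() -> usize {
    (0..=u16::MAX)
        .filter(|&h| {
            let e = (h >> 10) & 0x1F;
            e != 0 && e != 31 // skip zero/subnormal and inf/NaN encodings
        })
        .filter(|&h| f16_bits_to_f32_fast(h).to_bits() != f16_normal_reference(h).to_bits())
        .count()
}
```

In this model `mismatches_on_normals()` is 0, while any input with a zero exponent field (including 0x0000) comes out in the range [2^-15, 2^-14) rather than at its true value. That matches the "tiny positive values" caveat for non-negative subnormals, and suggests a size-sweep equivalence test should either avoid such inputs or compare with a tolerance.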

Migration plan

No migration needed. This is a drop-in performance optimization with no format or API changes.

Observability plan

No new instrumentation needed.

Documentation Changes

No user-facing API changes. SAFETY comments added to all unsafe SIMD blocks.

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@Sicheng-Pan Sicheng-Pan mentioned this pull request Apr 10, 2026
2 tasks
@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review April 10, 2026 01:44

Sicheng-Pan commented Apr 10, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.

propel-code-bot bot commented Apr 10, 2026

Add SIMD Acceleration for maxscore Candidate Filtering and f16 Value Decompression

This PR introduces SIMD-accelerated implementations for two hot paths in the Rust sparse query engine: candidate budget pruning in maxscore and bulk f16→f32 conversion in sparse posting block decompression. The changes add architecture-specific fast paths for x86_64 and aarch64, with runtime feature detection on x86_64 and scalar fallbacks preserved for unsupported platforms.

In rust/index/src/sparse/maxscore.rs, filter_competitive now dispatches to SIMD implementations (SSE2 on x86_64, NEON on aarch64) while retaining the original scalar logic in filter_competitive_scalar. In rust/types/src/sparse_posting_block.rs, convert_f16_to_f32 now dispatches to F16C (x86_64) or NEON (aarch64) conversion routines, with remainder handling and scalar fallback maintained. New tests validate SIMD/scalar equivalence across multiple non-aligned sizes and edge cases.
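
The dispatch shape summarized above (runtime feature detection on x86_64, a SIMD body with a scalar remainder, and a scalar fallback) might look roughly like the following. This is a sketch under assumptions: the function names and slice signature are hypothetical, and the scalar decoder is written out by hand so the example is self-contained, whereas the PR description says the real fallback uses the half crate.

```rust
/// Hand-written scalar f16 -> f32 decoder (stand-in for the half crate).
fn f16_to_f32_scalar(h: u16) -> f32 {
    let sign = ((h & 0x8000) as u32) << 16;
    let exp = ((h >> 10) & 0x1F) as u32;
    let man = (h & 0x3FF) as u32;
    match exp {
        // Zero and subnormals: value is man * 2^-24 (0x3380_0000 is 2^-24).
        0 => {
            let v = man as f32 * f32::from_bits(0x3380_0000);
            if sign != 0 { -v } else { v }
        }
        // Infinity and NaN keep an all-ones exponent.
        31 => f32::from_bits(sign | 0x7F80_0000 | (man << 13)),
        // Normal values: rebias the exponent by 127 - 15 = 112.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (man << 13)),
    }
}

/// Hypothetical F16C body: 8 values per iteration, scalar remainder.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,f16c")]
unsafe fn convert_f16c_sketch(src: &[u16], dst: &mut [f32]) {
    use std::arch::x86_64::{__m128i, _mm256_cvtph_ps, _mm256_storeu_ps, _mm_loadu_si128};
    let n = src.len().min(dst.len());
    let mut i = 0;
    while i + 8 <= n {
        // SAFETY: i + 8 <= n bounds both the 8 x u16 load and the 8 x f32 store;
        // both accesses are unaligned-tolerant.
        unsafe {
            let halves = _mm_loadu_si128(src.as_ptr().add(i) as *const __m128i);
            _mm256_storeu_ps(dst.as_mut_ptr().add(i), _mm256_cvtph_ps(halves));
        }
        i += 8;
    }
    for j in i..n {
        dst[j] = f16_to_f32_scalar(src[j]);
    }
}

/// Hypothetical entry point: pick the F16C path when the CPU reports it,
/// otherwise convert element by element.
fn convert_f16_to_f32_sketch(src: &[u16], dst: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("f16c") {
            // SAFETY: the required CPU features were just detected at runtime.
            unsafe { convert_f16c_sketch(src, dst) };
            return;
        }
    }
    for (d, &h) in dst.iter_mut().zip(src) {
        *d = f16_to_f32_scalar(h);
    }
}
```

Keeping the `#[target_feature]` function separate from the safe entry point confines the unsafe call to one place, directly after the detection check that justifies it. An aarch64 branch would slot into the entry point the same way and is omitted here.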

This summary was automatically generated by @propel-code-bot

@Sicheng-Pan Sicheng-Pan changed the title from "Add SIMD f16→f32 conversion (NEON/F16C) and SIMD filter_competitive (NEON/SSE2)" to "[ENH] Add SIMD for maxscore" Apr 10, 2026
@propel-code-bot propel-code-bot bot left a comment

No issues were found; the SIMD enhancements appear correct, well-tested, and low risk.

Status: No Issues Found | Risk: Low

Review Details

📁 2 files reviewed | 💬 0 comments

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch from 6bbf6e7 to d47e07c Compare April 10, 2026 03:11
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_lazy_cursor branch from b3e6c5a to fcca860 Compare April 10, 2026 03:11
@Sicheng-Pan Sicheng-Pan mentioned this pull request Apr 10, 2026
5 tasks
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_lazy_cursor branch from fcca860 to da68907 Compare April 10, 2026 03:26
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch 2 times, most recently from 6920d76 to 0e9eaa3 Compare April 10, 2026 17:18
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_lazy_cursor branch from da68907 to 59904b4 Compare April 10, 2026 17:18
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_lazy_cursor branch from 59904b4 to c76ce23 Compare April 10, 2026 20:11
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_simd branch from 0e9eaa3 to ac08b0c Compare April 10, 2026 20:11