[ENH] Add SIMD for maxscore #6865
Sicheng-Pan wants to merge 1 commit into hammad/maxscore_lazy_cursor
This PR introduces SIMD-accelerated implementations for two hot paths in the Rust sparse query engine: candidate budget pruning in `filter_competitive` and f16→f32 value conversion. (Summary automatically generated by @propel-code-bot.)

Description of changes
This is PR #4 of the BlockMaxMaxScore series, stacked on hammad/maxscore_lazy_cursor. It adds SIMD acceleration for the two hottest scalar paths in the query engine.

- f16→f32 conversion (`sparse_posting_block.rs`): Replaces the scalar `convert_f16_to_f32` with a platform-dispatched implementation:
  - NEON (aarch64): `vld1q_u16` / `vmovl_u16` / `vreinterpretq_f32_u32`. Handles all normal f16 values; subnormals map to tiny positive values (acceptable for SPLADE/BM25 weights).
  - F16C (x86_64): the `_mm256_cvtph_ps` intrinsic, 8 values per iteration. Runtime-detected via `is_x86_feature_detected!("f16c")`.
  - Fallback: `half` crate conversion for unsupported platforms.
  - Used by `decompress_values_into()`, the bulk value decompression path for Eager cursors and `ensure_forward_block`.
- `filter_competitive` (`maxscore.rs`): Replaces the scalar budget-pruning compaction with a SIMD-accelerated 4-wide comparison:
  - NEON: `vcgtq_f32` comparison, per-lane mask extraction, branchless scatter of survivors.
  - x86: `_mm_cmpgt_ps` + `_mm_movemask_ps`, bit-scan scatter. Runtime-detected.

Test plan
- `convert_f16_simd_matches_scalar`: Verifies SIMD f16→f32 output matches scalar at 14 different sizes (1, 3, 7, 8, 9, 15, 16, 17, 31, 63, 64, 100, 256, 1000), including remainder paths.
- `filter_competitive_simd_matches_scalar`: Verifies SIMD filter output matches scalar at 11 sizes, including remainder paths.
- `filter_competitive_all_pass` / `filter_competitive_none_pass`: Edge cases for budget pruning.
- `cargo test`

Migration plan
No migration needed. This is a drop-in performance optimization with no format or API changes.
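For a change to be drop-in, the SIMD filter has to return exactly the survivors the scalar filter returns, in the same order. The sketch below illustrates that contract for a 4-wide compare, movemask, and bit-scan scatter on x86_64. It is not the PR's code: the function names, the `(ids, scores, cutoff)` signature, and the strict `score > cutoff` rule are assumptions made for illustration, and it relies on SSE being part of the x86_64 baseline instead of reproducing the PR's runtime detection.

```rust
/// Scalar reference (hypothetical signature): keep entries with score > cutoff,
/// compacting both arrays in place. Returns the number of survivors.
fn keep_above_scalar(ids: &mut [u32], scores: &mut [f32], cutoff: f32) -> usize {
    assert_eq!(ids.len(), scores.len());
    let mut w = 0;
    for r in 0..scores.len() {
        if scores[r] > cutoff {
            ids[w] = ids[r];
            scores[w] = scores[r];
            w += 1;
        }
    }
    w
}

/// SSE variant: compare 4 scores at once, then scatter survivors by scanning mask bits.
#[cfg(target_arch = "x86_64")]
fn keep_above_sse(ids: &mut [u32], scores: &mut [f32], cutoff: f32) -> usize {
    use std::arch::x86_64::{_mm_cmpgt_ps, _mm_loadu_ps, _mm_movemask_ps, _mm_set1_ps};
    assert_eq!(ids.len(), scores.len());
    let n = scores.len();
    let (mut r, mut w) = (0usize, 0usize);
    while r + 4 <= n {
        // SAFETY: SSE is always available on x86_64, and r + 4 <= n keeps the
        // unaligned 4-lane load inside `scores`.
        let mut mask = unsafe {
            let v = _mm_loadu_ps(scores.as_ptr().add(r));
            _mm_movemask_ps(_mm_cmpgt_ps(v, _mm_set1_ps(cutoff))) as u32
        };
        // Each set bit is a surviving lane; w never passes r + lane, so the
        // in-place writes cannot clobber a lane that is still unread.
        while mask != 0 {
            let lane = mask.trailing_zeros() as usize;
            ids[w] = ids[r + lane];
            scores[w] = scores[r + lane];
            w += 1;
            mask &= mask - 1; // clear the lowest set bit
        }
        r += 4;
    }
    // Remainder path: fewer than 4 entries left.
    for i in r..n {
        if scores[i] > cutoff {
            ids[w] = ids[i];
            scores[w] = scores[i];
            w += 1;
        }
    }
    w
}

/// Entry point: SSE on x86_64, scalar everywhere else.
pub fn keep_above(ids: &mut [u32], scores: &mut [f32], cutoff: f32) -> usize {
    #[cfg(target_arch = "x86_64")]
    return keep_above_sse(ids, scores, cutoff);
    #[cfg(not(target_arch = "x86_64"))]
    return keep_above_scalar(ids, scores, cutoff);
}
```

Callers only see `keep_above`, so swapping the implementation behind it changes no signatures or data formats. Running both variants on copies of the same input and comparing the surviving prefixes is the kind of check the `*_simd_matches_scalar` tests describe.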
Observability plan
No new instrumentation needed.
Documentation Changes
No user-facing API changes. SAFETY comments added to all `unsafe` SIMD blocks.
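The runtime-dispatch pattern and the SAFETY-comment convention mentioned in this description can be illustrated with a minimal sketch of an f16→f32 bulk conversion. This is not the PR's implementation: the function names and the `&[u16]` to `&mut [f32]` signature are hypothetical, the scalar routine is a hand-written stand-in for the `half` crate fallback, the additional `avx` check is an assumption added for the 256-bit store, and the NEON path is left out.

```rust
/// Software f16 -> f32 on raw bit patterns (stand-in for the `half` crate fallback).
fn f16_bits_to_f32_scalar(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let man = (h & 0x03ff) as u32;
    let bits = match (exp, man) {
        (0, 0) => sign, // signed zero
        (0, _) => {
            // Subnormal: magnitude is man * 2^-24, exactly representable in f32.
            let mag = man as f32 * f32::from_bits(0x3380_0000);
            mag.to_bits() | sign
        }
        (0x1f, 0) => sign | 0x7f80_0000,               // infinity
        (0x1f, _) => sign | 0x7fc0_0000 | (man << 13), // NaN, quieted
        _ => sign | ((exp + 112) << 23) | (man << 13), // normal: rebias 15 -> 127
    };
    f32::from_bits(bits)
}

/// F16C path: 8 values per iteration, scalar loop for the remainder.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "f16c,avx")]
unsafe fn f16_to_f32_f16c(src: &[u16], dst: &mut [f32]) {
    use std::arch::x86_64::{__m128i, _mm256_cvtph_ps, _mm256_storeu_ps, _mm_loadu_si128};
    let chunks = src.len() / 8;
    for i in 0..chunks {
        // SAFETY: i * 8 + 8 <= src.len() == dst.len(), so the unaligned 128-bit
        // load and 256-bit store both stay in bounds; the caller has verified
        // that the CPU supports F16C and AVX.
        unsafe {
            let halves = _mm_loadu_si128(src.as_ptr().add(i * 8).cast::<__m128i>());
            _mm256_storeu_ps(dst.as_mut_ptr().add(i * 8), _mm256_cvtph_ps(halves));
        }
    }
    for i in chunks * 8..src.len() {
        dst[i] = f16_bits_to_f32_scalar(src[i]);
    }
}

/// Hypothetical dispatcher: picks the F16C path at runtime when available.
pub fn f16_to_f32_dispatch(src: &[u16], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("f16c") && is_x86_feature_detected!("avx") {
            // SAFETY: both target features were just detected on this CPU, and
            // the length equality required by the callee is asserted above.
            unsafe { f16_to_f32_f16c(src, dst) };
            return;
        }
    }
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = f16_bits_to_f32_scalar(s);
    }
}
```

Each `unsafe` block states which invariant makes it sound (bounds, detected CPU features), which is the role the SAFETY comments play. Keeping the scalar routine as both fallback and remainder handler also gives tests a single reference to compare the SIMD output against.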