Variant of PR6672 (Use avx512 in move rank) by mstembera · Pull Request #6678 · official-stockfish/Stockfish

mstembera · 2026-03-19T19:09:14Z

https://tests.stockfishchess.org/tests/view/69b822411860ed703cccef9b
STC: LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 45440 W: 11919 L: 11601 D: 21920
Ptnml(0-2): 148, 4839, 12422, 5169, 142

Precompute a combined history score for every (piece, square) pair.

Variant of #6672 trying to make the intrinsics portion simpler and less co-mingled with the rest of the code. May address some of the concerns/discussion in PR6672. I leave it to the maintainers to decide if it's a worthwhile tradeoff. All credit to @AliceRoselia.

No functional change
bench: 2559757

github-actions · 2026-03-19T19:14:55Z

clang-format 20 needs to be run on this PR.
If you do not have clang-format installed, the maintainer will run it when merging.
For the exact version please see https://packages.ubuntu.com/plucky/clang-format-20.

(execution 24213874657 / attempt 1)

anematode · 2026-03-19T19:29:01Z

Honestly, it might make sense to tune it and use the scalar version when there's a small number of pieces on the board? If we're in an endgame we will be wasting a whole ton of effort

Still a bit concerned about usability for other folks... e.g. an experiment like https://tests.stockfishchess.org/tests/live_elo/69bbeee1d7d60419badf331e would require knowing how to implement the vectorized multiplication, or a custom base branch which is annoying

That said, it could also make such experiments easier to pass, depending on the worker mix...

mstembera · 2026-03-19T20:26:44Z

Agree. I don't want to spend too much effort and resources just yet till we know what the maintainers think about this patch.
[Edit] ok https://tests.stockfishchess.org/tests/view/69bd0a548c3d089c399d1af0 scheduled

coderabbitai · 2026-04-09T21:19:20Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a1f3bd3f-f287-482e-bc56-9eb950e75220

📥 Commits

Reviewing files that changed from the base of the PR and between d05b916 and bf3adec.

📒 Files selected for processing (2)

src/misc.h
src/movepick.cpp

✅ Files skipped from review due to trivial changes (1)

src/misc.h

🚧 Files skipped from review as they are similar to previous changes (1)

src/movepick.cpp

📝 Walkthrough

MultiArray class declaration in src/misc.h now includes alignas(32). In src/movepick.cpp an AVX-512-only helper init_quiet_hist_buffer() was added to precompute combined pawn and continuation history scores into an aligned histBuffer; MovePicker::score() allocates and fills this buffer under USE_AVX512 and uses histBuffer[pt][to] when scoring quiet moves, leaving the non-AVX512 scalar path unchanged.
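The precompute idea described in the walkthrough can be sketched in portable scalar C++. This is only an illustration under simplified assumptions: `Table`, `HistBuffer`, `scalar_score` and `fill_hist_buffer` are hypothetical stand-ins, not the engine's types or the PR's `init_quiet_hist_buffer()`, and the five continuation tables stand for indices 0, 1, 2, 3 and 5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the engine's history tables.
constexpr int PIECE_TYPES = 6;
constexpr int SQUARES     = 64;
using Table      = std::array<std::array<std::int16_t, SQUARES>, PIECE_TYPES>;
using HistBuffer = std::array<std::array<int, SQUARES>, PIECE_TYPES>;

// Per-move scalar sum: 2 * pawn history plus five continuation histories.
inline int scalar_score(const Table& pawn, const std::array<const Table*, 5>& cont,
                        int pt, int to) {
    assert(pt >= 0 && pt < PIECE_TYPES && to >= 0 && to < SQUARES);
    int v = 2 * pawn[pt][to];
    for (const Table* t : cont)
        v += (*t)[pt][to];
    return v;
}

// Precompute the same sum once for every (piece type, square) pair,
// so the move loop only needs a single lookup per move.
inline void fill_hist_buffer(HistBuffer& buf, const Table& pawn,
                             const std::array<const Table*, 5>& cont) {
    for (int pt = 0; pt < PIECE_TYPES; ++pt)
        for (int sq = 0; sq < SQUARES; ++sq)
            buf[pt][sq] = scalar_score(pawn, cont, pt, sq);
}
```

In the move loop the lookup `buf[pt][to]` would then replace the six separate table reads, which is the part the PR vectorizes.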

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately references PR6672 and describes the main change—a variant approach to using AVX512 in move ranking with simplified intrinsics.
Description check	✅ Passed	The description clearly explains the purpose of the PR as a variant of PR6672, mentions precomputing history scores, notes no functional change, and provides test results and benchmarks.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/movepick.cpp (2)

164-179: ⚠️ Potential issue | 🟡 Minor

Fix clang-format violations flagged by CI.

The pipeline is failing due to clang-format violations in this block. The #if/#endif preprocessor directives and the indented code within the if constexpr block need to follow the project's formatting conventions.

Suggested fix for formatting

Run clang-format on this file or manually ensure:

Preprocessor directives (#if, #else, #endif) are not indented within function scope
Code within conditional blocks follows consistent indentation

     [[maybe_unused]] Bitboard threatByLesser[KING + 1];
-    #if defined(USE_AVX512)
-    [[maybe_unused]] alignas(64) int histBuffer[KING + 1][SQUARE_NB];
-    #endif
+#if defined(USE_AVX512)
+    [[maybe_unused]] alignas(64) int histBuffer[KING + 1][SQUARE_NB];
+#endif
     if constexpr (Type == QUIETS)
     {
         threatByLesser[PAWN]   = 0;
         threatByLesser[KNIGHT] = threatByLesser[BISHOP] = pos.attacks_by<PAWN>(~us);
         threatByLesser[ROOK] =
           pos.attacks_by<KNIGHT>(~us) | pos.attacks_by<BISHOP>(~us) | threatByLesser[KNIGHT];
         threatByLesser[QUEEN] = pos.attacks_by<ROOK>(~us) | threatByLesser[ROOK];
         threatByLesser[KING]  = 0;

-        #if defined(USE_AVX512)
-            init_quiet_hist_buffer(histBuffer, pos, continuationHistory, sharedHistory);
-        #endif
+#if defined(USE_AVX512)
+        init_quiet_hist_buffer(histBuffer, pos, continuationHistory, sharedHistory);
+#endif
     }

198-212: ⚠️ Potential issue | 🟡 Minor

Logic is correct; fix formatting for CI.

The AVX-512 and scalar paths are functionally equivalent:

Both compute 2 * mainHistory + 2 * pawnHistory + continuationHistory[0,1,2,3,5]
The precomputed histBuffer[pt][to] correctly maps piece types to values computed with the proper Piece index

The same clang-format issues apply here—preprocessor directives should not be indented:

Suggested formatting fix

             m.value = 2 * (*mainHistory)[us][m.raw()];

-            #if defined(USE_AVX512)
-                m.value += histBuffer[pt][to];
-            #else
-                m.value += 2 * sharedHistory->pawn_entry(pos)[pc][to];
-                m.value += (*continuationHistory[0])[pc][to];
-                m.value += (*continuationHistory[1])[pc][to];
-                m.value += (*continuationHistory[2])[pc][to];
-                m.value += (*continuationHistory[3])[pc][to];
-                m.value += (*continuationHistory[5])[pc][to];
-            #endif
+#if defined(USE_AVX512)
+            m.value += histBuffer[pt][to];
+#else
+            m.value += 2 * sharedHistory->pawn_entry(pos)[pc][to];
+            m.value += (*continuationHistory[0])[pc][to];
+            m.value += (*continuationHistory[1])[pc][to];
+            m.value += (*continuationHistory[2])[pc][to];
+            m.value += (*continuationHistory[3])[pc][to];
+            m.value += (*continuationHistory[5])[pc][to];
+#endif

🧹 Nitpick comments (1)

src/movepick.cpp (1)

135-150: Consider endgame fallback (echoing PR discussion).

As noted in the PR comments, the vectorized approach precomputes scores for all piece types regardless of how many pieces remain. In endgames with few pieces, this could waste work compared to the scalar path. A future optimization might conditionally bypass the vectorized path when the piece count is below a threshold.

This is not a blocking concern for this PR—just noting the architectural consideration that was already raised.
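A minimal sketch of what such a fallback might look like, assuming a single history row per table for brevity. The cutoff `kMinPiecesForPrecompute` and the helpers `use_precomputed_path` and `score_targets` are hypothetical and untuned; nothing here is taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int SQUARES = 64;
using Row = std::array<std::int16_t, SQUARES>;

// Hypothetical, untuned cutoff: below this piece count, skip the precompute.
constexpr int kMinPiecesForPrecompute = 8;

inline bool use_precomputed_path(int pieceCount) {
    return pieceCount >= kMinPiecesForPrecompute;
}

// Scores each destination square, choosing the path by piece count.
// Both paths must produce identical values (no functional change).
inline std::vector<int> score_targets(const Row& pawn, const Row& cont,
                                      const std::vector<int>& targets, int pieceCount) {
    std::vector<int> scores;
    scores.reserve(targets.size());
    if (use_precomputed_path(pieceCount)) {
        // Precompute all 64 squares up front, then look up per move.
        std::array<int, SQUARES> buf{};
        for (int sq = 0; sq < SQUARES; ++sq)
            buf[sq] = 2 * pawn[sq] + cont[sq];
        for (int to : targets) {
            assert(to >= 0 && to < SQUARES);
            scores.push_back(buf[to]);
        }
    } else {
        // Few pieces: only touch the squares that are actually needed.
        for (int to : targets) {
            assert(to >= 0 && to < SQUARES);
            scores.push_back(2 * pawn[to] + cont[to]);
        }
    }
    return scores;
}
```

Whether a cutoff like this gains anything would have to be measured; the threshold is a tuning parameter, not a known good value.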

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69e6a567-6721-4912-93ed-c95b73f03ce2

📥 Commits

Reviewing files that changed from the base of the PR and between bb4eb04 and c06bb62.

📒 Files selected for processing (2)

src/misc.h
src/movepick.cpp

github-actions · 2026-05-06T19:07:28Z

clang-format 20 needs to be run on this PR.
If you do not have clang-format installed, the maintainer will run it when merging.
For the exact version please see https://packages.ubuntu.com/plucky/clang-format-20.

(execution 25455458115 / attempt 1)

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/movepick.cpp (1)

234-282: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address the clang-format pipeline failures.

The Clang-Format CI job fails on lines 234, 236, 245, 247–248, and 271, 273–281 (the #if defined(USE_AVX512) guarded blocks and the surrounding code). Please run the project's clang-format target so the new preprocessor blocks match the existing style for indentation/spacing of #if/#else/#endif directives inside function bodies.

🧹 Nitpick comments (2)

src/movepick.cpp (1)

191-222: ⚡ Quick win

Document or refactor alignment assumptions in init_quiet_hist_buffer.

This function relies on three implicit alignment guarantees:

buf must be 64-byte aligned for _mm512_store_epi32 (line 218). The parameter type int buf[][SQUARE_NB] decays to a plain pointer, making this contract invisible.

&ph[sq] and &(*continuationHistory[j])[pc][sq] must be 32-byte aligned for _mm256_load_si256 (lines 213, 216), currently relying on alignas(32) in MultiArray (misc.h).

The code currently works correctly—histBuffer is declared alignas(64) at the call site—but the alignment requirements are implicit. If MultiArray's alignment is narrowed or removed, the loads silently become undefined behavior.

Two options:

Use unaligned intrinsics (_mm256_loadu_si256 / _mm512_storeu_epi32) to make the function independent of external alignment. On modern x86 CPUs the unaligned forms incur no penalty when the data is in fact aligned, so this removes the cross-file coupling with misc.h and reduces fragility.

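Take buf as a reference-to-array (int (&buf)[KING + 1][SQUARE_NB]) so the array extents are preserved in the type, and document the alignment requirement. Note that alignas on a variable is not part of its type, so the alignment itself would still need a comment or an assertion.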
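Both options can be combined in a small sketch. It assumes a single 64-entry row per table; `combine_row` and `is_aligned` are hypothetical names, not functions from the PR. The intrinsics path is compiled only when `__AVX512F__` is defined and uses unaligned loads and stores, with a scalar fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
    #include <immintrin.h>
#endif

constexpr int SQUARES = 64;

// Runtime check a caller could use to document an alignment expectation.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Writes 2 * pawnRow + contRow into out, widening int16 to int32.
// Reference-to-array parameters keep the extent in the type; alignment
// is NOT encoded there, so unaligned loads/stores are used.
inline void combine_row(int (&out)[SQUARES],
                        const std::int16_t (&pawnRow)[SQUARES],
                        const std::int16_t (&contRow)[SQUARES]) {
#if defined(__AVX512F__)
    for (int sq = 0; sq < SQUARES; sq += 16)
    {
        // Load 16 int16 values and sign-extend them to 16 int32 lanes.
        __m512i p = _mm512_cvtepi16_epi32(
          _mm256_loadu_si256(reinterpret_cast<const __m256i*>(&pawnRow[sq])));
        __m512i c = _mm512_cvtepi16_epi32(
          _mm256_loadu_si256(reinterpret_cast<const __m256i*>(&contRow[sq])));
        // Shift left by one doubles the pawn term; the result fits in int32.
        __m512i sum = _mm512_add_epi32(_mm512_slli_epi32(p, 1), c);
        _mm512_storeu_si512(&out[sq], sum);
    }
#else
    for (int sq = 0; sq < SQUARES; ++sq)
        out[sq] = 2 * pawnRow[sq] + contRow[sq];
#endif
}
```

A caller that still wants aligned storage can declare the buffer with `alignas(64)` and, in debug builds, check it with `assert(is_aligned(out, 64))`.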

src/misc.h (1)

250-251: 🏗️ Heavy lift

Consider narrowing the scope of alignas(32) or using unaligned SIMD loads instead.

Adding alignas(32) to the MultiArray class template propagates to every instantiation in the codebase, at every nesting level. For nested arrays whose natural size isn't a multiple of 32 (e.g., inner MultiArray<int16_t, N> rows < 32 bytes), the compiler must pad each element so sizeof is a multiple of 32, which can noticeably grow memory for tables that have nothing to do with the init_quiet_hist_buffer AVX-512 optimization.

Two less invasive alternatives are worth considering:

Apply alignment surgically — only the histories actually consumed by init_quiet_hist_buffer (PawnHistory and the PieceToHistory rows reached via ContinuationHistory) need 32-byte alignment. A typed alias or wrapper for those, or explicit alignas on the specific field/typedef, avoids touching all MultiArray users.

Drop the alignment requirement entirely and use _mm256_loadu_si256 (and _mm512_storeu_epi32) in movepick.cpp. On Skylake and later, vmovdqu has identical performance to vmovdqa on naturally aligned addresses, so the SIMD fast path is preserved without paying a memory cost for unrelated tables.
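The padding effect is easy to check with a toy template. The types below are hypothetical stand-ins for MultiArray with made-up dimensions (7 entries per row, 100 rows), chosen only to show how `alignas(32)` on the class template rounds every nested instantiation up to a multiple of 32 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in without extra alignment.
template<typename T, std::size_t N>
struct PlainArray {
    T data[N];
};

// Same layout, but alignas(32) applies to every instantiation.
template<typename T, std::size_t N>
struct alignas(32) AlignedArray {
    T data[N];
};

using PlainRow   = PlainArray<std::int16_t, 7>;    // 7 * 2 = 14 bytes
using AlignedRow = AlignedArray<std::int16_t, 7>;  // padded from 14 to 32 bytes

static_assert(sizeof(PlainRow) == 14, "no padding without alignas");
static_assert(sizeof(AlignedRow) == 32, "row is padded to the alignment");
static_assert(alignof(AlignedRow) == 32, "alignment propagates to the row type");

// Nesting multiplies the overhead: 100 rows grow from 1400 to 3200 bytes.
static_assert(sizeof(PlainArray<PlainRow, 100>) == 1400, "unpadded table");
static_assert(sizeof(AlignedArray<AlignedRow, 100>) == 3200, "padded table");
```

Rows whose size is already a multiple of 32 bytes (for example 64 `int16_t` entries) are unaffected, so the cost depends on the inner dimensions of each table.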

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 21d224e8-5b0e-4cde-a41b-3cd4c8174ef2

📥 Commits

Reviewing files that changed from the base of the PR and between c06bb62 and 6d07f13.

📒 Files selected for processing (2)

src/misc.h
src/movepick.cpp

github-actions · 2026-05-07T20:37:07Z

clang-format 20 needs to be run on this PR.
If you do not have clang-format installed, the maintainer will run it when merging.
For the exact version please see https://packages.ubuntu.com/plucky/clang-format-20.

(execution 25520695615 / attempt 1)

mstembera · 2026-05-07T20:44:28Z

@vondele What do you think about this, especially in light of the #6768 commit? I can also retest, since it has been a while since the original test.

github-actions · 2026-05-07T20:47:39Z

clang-format 20 needs to be run on this PR.
If you do not have clang-format installed, the maintainer will run it when merging.
For the exact version please see https://packages.ubuntu.com/plucky/clang-format-20.

(execution 25521121574 / attempt 1)

anematode · 2026-05-08T03:55:59Z

Hm. Maybe there's some stuff we can do here with templates or operator overloading to make it really easy for folks to work on this part of the code. The painful part is that the avx512 code path hoists everything out of the loop, so it'll probably require some ceremony...

Like we could have something looking like

PtToHistories pcToHists = 2 * PtToHistory(sharedEntry->pawn_entry(pos)) + PtToHistory(continuationHistory[0]) + ...;

and then in the loop we'd just do

m.value += pcToHists[pc][to];

The implementation would be a bit thorny, but manageable I think
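One way the operator-overloading idea might look, as a scalar sketch only. `PtToHistories`, `PtToHistory` and `Table` mirror the names in the snippet above but are hypothetical here; the element-wise loops are plain C++ that a compiler may or may not vectorize, and nothing below comes from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int PIECE_TYPES = 6;
constexpr int SQUARES     = 64;
using Table = std::array<std::array<std::int16_t, SQUARES>, PIECE_TYPES>;

// Hypothetical value type holding widened, combined scores per (pt, square).
struct PtToHistories {
    std::array<std::array<int, SQUARES>, PIECE_TYPES> v{};

    const std::array<int, SQUARES>& operator[](int pt) const {
        assert(pt >= 0 && pt < PIECE_TYPES);
        return v[pt];
    }
};

// Widen one int16 history table into the combined representation.
inline PtToHistories PtToHistory(const Table& t) {
    PtToHistories r;
    for (int pt = 0; pt < PIECE_TYPES; ++pt)
        for (int sq = 0; sq < SQUARES; ++sq)
            r.v[pt][sq] = t[pt][sq];
    return r;
}

// Element-wise sum of two combined tables.
inline PtToHistories operator+(PtToHistories a, const PtToHistories& b) {
    for (int pt = 0; pt < PIECE_TYPES; ++pt)
        for (int sq = 0; sq < SQUARES; ++sq)
            a.v[pt][sq] += b.v[pt][sq];
    return a;
}

// Scale every entry by an integer weight.
inline PtToHistories operator*(int k, PtToHistories a) {
    for (auto& row : a.v)
        for (int& x : row)
            x *= k;
    return a;
}
```

With this, an expression such as `2 * PtToHistory(pawn) + PtToHistory(cont0)` builds the table once before the loop, and the loop body reduces to a single indexed lookup.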

github-actions · 2026-05-17T19:52:19Z

clang-format 20 needs to be run on this PR.
If you do not have clang-format installed, the maintainer will run it when merging.
For the exact version please see https://packages.ubuntu.com/plucky/clang-format-20.

(execution 26001017050 / attempt 1)

mstembera changed the title ~~Variant of PR6672~~ Variant of PR6672 (Use avx512 in move rank) Mar 19, 2026

mstembera force-pushed the PR6672_Variant branch 2 times, most recently from d9a669a to c06bb62 Compare April 9, 2026 21:19

coderabbitai Bot reviewed Apr 9, 2026

mstembera force-pushed the PR6672_Variant branch from c06bb62 to 6d07f13 Compare May 6, 2026 19:07

coderabbitai Bot reviewed May 6, 2026

mstembera force-pushed the PR6672_Variant branch from 6d07f13 to 8ada77a Compare May 7, 2026 20:36

mstembera force-pushed the PR6672_Variant branch from 8ada77a to d05b916 Compare May 7, 2026 20:45

Variant of PR6672

bf3adec

No functional change bench: 2559757

mstembera force-pushed the PR6672_Variant branch from d05b916 to bf3adec Compare May 17, 2026 19:52
