perf: Eliminate stack operations in division/remainder handlers #490
Closed
bubblepipe wants to merge 1 commit into nervosnetwork:develop
Pull Request Overview
This PR optimizes the division and remainder operations in the x64 assembly implementation by replacing stack-based register preservation with register-to-register moves. Instead of push/pop pairs, the changes use TEMP3 (%r11) as a scratch register that temporarily holds the value of the RD register (%rax).
- Replaced `push RD`/`pop RD` with `movq RD, TEMP3`/`movq TEMP3, RD` across all division and remainder operations
- Removed redundant `PUSH_RD_IF_RAX`/`PUSH_RD_IF_RDX`/`POP_RD_IF_RAX`/`POP_RD_IF_RDX` macro calls in division operations
- Applied changes consistently to DIV, DIVU, DIVUW, DIVW, REM, REMU, REMUW, REMW, WIDE_DIV, and WIDE_DIVU operations
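The save/restore pattern described above can be sketched in isolation. The following is a minimal illustration, not the code from `execute_x64.S`: the function name `divu_keep_rax` and the `live` parameter are hypothetical, the handler macros (`RD`, `TEMP3`) are written out as the raw registers the overview names (`%rax`, `%r11`), and only the unsigned case is shown. It assumes a GCC- or Clang-compatible compiler; on other targets it falls back to plain C so the file still builds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from execute_x64.S.
 * `*live` stands in for the value the handler keeps in RD (%rax).
 * Returns dividend / divisor and leaves `*live` unchanged.
 */
uint64_t divu_keep_rax(uint64_t *live, uint64_t dividend, uint64_t divisor)
{
    /* divq faults on a zero divisor; this sketch leaves that case to the caller. */
    assert(divisor != 0);
    uint64_t quotient;
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t rax_value = *live;
    __asm__ volatile(
        "movq %%rax, %%r11\n\t"  /* save the live value in a scratch register (instead of pushq %rax) */
        "movq %[n], %%rax\n\t"   /* divq takes its dividend from %rdx:%rax */
        "xorl %%edx, %%edx\n\t"  /* clear the high half of the dividend */
        "divq %[d]\n\t"          /* quotient -> %rax, remainder -> %rdx */
        "movq %%rax, %[q]\n\t"   /* copy the quotient out before restoring %rax */
        "movq %%r11, %%rax\n\t"  /* restore the live value (instead of popq %rax) */
        : "+a"(rax_value), [q] "=&r"(quotient)
        : [n] "r"(dividend), [d] "r"(divisor)
        : "rdx", "r11", "cc");
    *live = rax_value;
#else
    /* Portable fallback so the sketch compiles on non-x86-64 targets. */
    quotient = dividend / divisor;
#endif
    return quotient;
}
```

Calling `divu_keep_rax(&live, 100, 7)` returns 14 and leaves `live` as it was. The two `movq` instructions stay entirely in registers, whereas a `pushq`/`popq` pair would store to and reload from the stack on every executed division.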
Comments suppressed due to low confidence (6)
src/machine/asm/execute_x64.S:900
- The MULH operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros for register preservation, while the division/remainder operations have been optimized to use TEMP3. For consistency and performance, consider replacing these with `movq RD, TEMP3` before line 896 and `movq TEMP3, RD` after line 898, similar to the division operations.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
movq REGISTER_ADDRESS(RS1), %rax
imulq REGISTER_ADDRESS(RS2r)
MOV_RDX_TO_RS2r
POP_RD_IF_RDX
POP_RD_IF_RAX
src/machine/asm/execute_x64.S:907
- The MULHSU operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with `movq RD, TEMP3` and the corresponding `movq TEMP3, RD` restoration.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
src/machine/asm/execute_x64.S:942
- The MULHU operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with `movq RD, TEMP3` and the corresponding `movq TEMP3, RD` restoration.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
src/machine/asm/execute_x64.S:2170
- The WIDE_MUL operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with `movq RD, TEMP3` and the corresponding `movq TEMP3, RD` restoration.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
src/machine/asm/execute_x64.S:2184
- The WIDE_MULU operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with `movq RD, TEMP3` and the corresponding `movq TEMP3, RD` restoration.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
src/machine/asm/execute_x64.S:2198
- The WIDE_MULSU operation still uses the `PUSH_RD_IF_RAX`/`POP_RD_IF_RAX` macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with `movq RD, TEMP3` and the corresponding `movq TEMP3, RD` restoration.
PUSH_RD_IF_RAX
PUSH_RD_IF_RDX
This PR optimizes CKB-VM's x86-64 assembly implementation of division and remainder instructions by replacing stack operations (PUSH/POP) with register moves.
The improvement does not show up in the built-in `cargo bench` because it is smaller than the measurement noise. However, static analysis with OSACA indicates a 10x reduction in loop-carried dependency, and profiling with uProf on an AMD 5950X reports 95% less memory subsystem workload.