perf: Optimize NEXT_INST macro by reordering memory loads for better performance by quake · Pull Request #465 · nervosnetwork/ckb-vm

quake · 2025-03-10T12:23:44Z

A simple instruction reordering in the NEXT_INST macro may reduce memory access latency and improve instruction-level parallelism. This change improves performance by approximately 2% on my env (2 diferenet spec cpu) because of hot path.

CPU1

interpret secp256k1_bench via assembly
                        time:   [4.6390 ms 4.6434 ms 4.6492 ms]
                        change: [-2.7022% -2.5134% -2.3102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

interpret secp256k1_bench via assembly mop
                        time:   [4.7361 ms 4.7442 ms 4.7534 ms]
                        change: [-2.0556% -1.7441% -1.4555%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

interpret secp256k1_bench via assembly mop (memoized decoder)
                        time:   [3.8854 ms 3.9218 ms 3.9775 ms]
                        change: [-1.7528% -0.8066% +0.7752%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

interpret secp256k1_bench via assembly mop (memoized dynamic length decoder)
                        time:   [3.2486 ms 3.2510 ms 3.2537 ms]
                        change: [-2.6563% -2.4866% -2.3305%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

CPU2

interpret secp256k1_bench via assembly
                        time:   [3.4991 ms 3.5017 ms 3.5046 ms]
                        change: [-2.0495% -1.8489% -1.6756%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe

interpret secp256k1_bench via assembly mop
                        time:   [3.4559 ms 3.4607 ms 3.4677 ms]
                        change: [-1.5877% -1.4303% -1.2230%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

interpret secp256k1_bench via assembly mop (memoized decoder)
                        time:   [2.9029 ms 2.9051 ms 2.9077 ms]
                        change: [-1.6066% -1.5125% -1.4196%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

interpret secp256k1_bench via assembly mop (memoized dynamic length decoder)
                        time:   [2.4625 ms 2.4633 ms 2.4642 ms]
                        change: [-2.3027% -2.2052% -2.1223%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

…performance

xxuejie

I remembered the code was written as the changes after this PR in the first place. Later in profiling, it was revealed that memory load from INST_ARGS and INST_PC cause enough cache misses, so the instructions are shuffled. Not sure if it is a good idea to move those instructions back again. I can only confirm that the correctness of the code has no issues.

mohanson · 2025-03-12T06:14:11Z

Confirmed that the performance has been significantly improved.

interpret secp256k1_bench via assembly
                        time:   [4.5208 ms 4.5343 ms 4.5484 ms]
                        change: [-1.9489% -1.4873% -1.0056%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

interpret secp256k1_bench via assembly mop
                        time:   [4.4476 ms 4.4651 ms 4.4878 ms]
                        change: [-2.4842% -1.9465% -1.3296%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

interpret secp256k1_bench via assembly mop (memoized decoder)
                        time:   [3.7501 ms 3.7695 ms 3.7943 ms]
                        change: [-4.0137% -3.3920% -2.7504%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

interpret secp256k1_bench via assembly mop (memoized dynamic length decoder)
                        time:   [3.1980 ms 3.2061 ms 3.2149 ms]
                        change: [-2.9800% -2.3961% -1.8728%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

perf: Optimize NEXT_INST macro by reordering memory loads for better …

79cc5e7

…performance

quake mentioned this pull request Mar 10, 2025

perf: Optimize NEXT_INST macro by reordering memory loads for better performance #466

Merged

quake requested review from mohanson and xxuejie March 12, 2025 05:35

xxuejie approved these changes Mar 12, 2025

View reviewed changes

mohanson approved these changes Mar 12, 2025

View reviewed changes

quake merged commit a1d4a9e into nervosnetwork:develop Mar 14, 2025
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Optimize NEXT_INST macro by reordering memory loads for better performance#465

perf: Optimize NEXT_INST macro by reordering memory loads for better performance#465
quake merged 1 commit intonervosnetwork:developfrom
quake:quake/next-inst-opt

quake commented Mar 10, 2025

Uh oh!

xxuejie left a comment

Uh oh!

mohanson commented Mar 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

quake commented Mar 10, 2025

Uh oh!

xxuejie left a comment

Choose a reason for hiding this comment

Uh oh!

mohanson commented Mar 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants