@qwang98
Collaborator

@qwang98 qwang98 commented Nov 19, 2025

Powdr part ready for review (though OVM is not ready as I still need to consolidate APIs).

Depends on OVM branch: powdr-labs/openvm#50
Depends on Stark-Backend branch: powdr-labs/stark-backend#18

@qwang98
Collaborator Author

qwang98 commented Dec 4, 2025

Direct to APC Fibo:

                      filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
/home/steve/powdr/metrics.json             1          8009664            1221                   63                 63                              2                            0                    280                   0                              0                   0.00124                 0.986359     0.012401        1024

Direct to APC Keccak:

                      filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
/home/steve/powdr/metrics.json             1          8449287            6704                  139                139                              3                            0                    274                   0                              0                   0.01803                 0.961151     0.020819          32

@qwang98
Collaborator Author

qwang98 commented Dec 5, 2025

Using the 10,000 Keccak pass:

  • Original: (1608 + 1689 + 1623 + 1653 + 1616 + 1647) / 6 = 1639.33
  • Direct to APC: (1583 + 1561 + 1615 + 1579 + 1608 + 1546) / 6 = 1582
  • Roughly a savings of 3.5%
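The averages above can be reproduced with a short script. This is only a sketch: the sample lists are the `app_trace_gen_time_ms` values of the six `keccak_10000` runs from the tables in this comment, and the helper name `summarize` is made up for illustration. It also prints the sample standard deviation, which helps judge how a 3.5% difference compares with run-to-run noise.

```python
from statistics import mean, stdev

# app_trace_gen_time_ms for the six keccak_10000 runs, copied from the tables in this comment.
original = [1608, 1689, 1623, 1653, 1616, 1647]
direct_to_apc = [1583, 1561, 1615, 1579, 1608, 1546]


def summarize(samples):
    """Return (mean, sample standard deviation) in milliseconds. Hypothetical helper."""
    return mean(samples), stdev(samples)


orig_mean, orig_sd = summarize(original)
dta_mean, dta_sd = summarize(direct_to_apc)

# Savings of direct-to-APC relative to the original trace gen time.
saved_ms = orig_mean - dta_mean
saved_pct = 100 * saved_ms / orig_mean

print(f"original:      {orig_mean:.2f} ms (sd {orig_sd:.1f})")
print(f"direct to APC: {dta_mean:.2f} ms (sd {dta_sd:.1f})")
print(f"savings:       {saved_ms:.2f} ms ({saved_pct:.1f}%)")
```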

Some thoughts:

  • The 3.5% savings comes from the ALU chip alone, which accounts for 318 of the 676 instructions in the sampled APC, so I'd expect the current version to save ~7% once all chips are implemented.
  • That 7% is measured over the whole trace gen. The savings on APC trace gen per se will be greater, because non-APC trace gen isn't changed at all. That said, Keccak is a somewhat special example: I'd assume its APC (the compression function) takes up a larger portion of the execution trace than APCs do in a more "realistic" example like Reth, so the savings on overall trace gen for a real workload could also be smaller than 7%.
  • The gain might be larger once we imitate the "compile time constants"?
  • Trace gen also doesn't affect overall proving time that much, at least on the Hetzner machine (it's only about 1/5 of the app proof time). But given its sequential nature, it might account for a greater share of proving on a beefier machine, where the proving parts are more parallelized.
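The ~7% projection can be written out as simple arithmetic. The sketch below assumes that savings scale linearly with the share of APC instructions handled by direct-to-APC chips, which is an assumption of this estimate rather than something measured here. The variable names are illustrative.

```python
# Figures quoted in the comment above.
alu_instructions = 318      # ALU instructions in the sampled APC
total_instructions = 676    # all instructions in the sampled APC
measured_savings_pct = 3.5  # trace gen savings with only the ALU chip converted

# Share of the APC's instructions already covered by the ALU chip.
alu_share = alu_instructions / total_instructions

# Assumption: savings grow linearly with the share of instructions covered,
# so covering 100% of the instructions scales the measured savings by 1 / alu_share.
projected_savings_pct = measured_savings_pct / alu_share

print(f"ALU share of APC instructions: {alu_share:.1%}")
print(f"projected savings with all chips: {projected_savings_pct:.1f}%")
```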

Direct to APC numbers:

                             filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
  /home/steve/powdr/keccak_100_1.json             1         40875092            6704                  206                206                             55                            6                    228                   0                              0                  0.250479                 0.198680     0.550842        4096
  /home/steve/powdr/keccak_100_2.json             1         40875092            6704                  205                205                             54                            6                    234                   0                              0                  0.250479                 0.198680     0.550842        4096
  /home/steve/powdr/keccak_100_3.json             1         40875092            6704                  227                227                             53                            6                    233                   0                              0                  0.250479                 0.198680     0.550842        4096
 /home/steve/powdr/keccak_1000_1.json             1        341980884            6704                  685                685                            510                           63                    152                   0                              0                  0.449540                 0.023747     0.526713       32768
 /home/steve/powdr/keccak_1000_2.json             1        341980884            6704                  678                678                            512                           63                    149                   0                              0                  0.449540                 0.023747     0.526713       32768
 /home/steve/powdr/keccak_1000_3.json             1        341980884            6704                  684                684                            511                           64                    151                   0                              0                  0.449540                 0.023747     0.526713       32768
/home/steve/powdr/keccak_10000_1.json             4       2760620370           26534                 5311               5311                           5035                          636                   1583                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_2.json             4       2760620370           26534                 5345               5345                           4977                          647                   1561                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_3.json             4       2760620370           26534                 5333               5333                           5062                          632                   1615                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_4.json             4       2760620370           26534                 5304               5304                           5049                          634                   1579                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_5.json             4       2760620370           26534                 5324               5324                           5069                          632                   1608                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_6.json             4       2760620370           26534                 5314               5314                           5017                          641                   1546                   0                              0                  0.466226                 0.011788     0.521986      262144

Original numbers:

                                      filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
  /home/steve/powdr/keccak_original_100_1.json             1         40875092            6704                  196                196                             56                            6                    230                   0                              0                  0.250479                 0.198680     0.550842        4096
  /home/steve/powdr/keccak_original_100_2.json             1         40875092            6704                  199                199                             55                            6                    246                   0                              0                  0.250479                 0.198680     0.550842        4096
  /home/steve/powdr/keccak_original_100_3.json             1         40875092            6704                  195                195                             54                            6                    243                   0                              0                  0.250479                 0.198680     0.550842        4096
 /home/steve/powdr/keccak_original_1000_1.json             1        341980884            6704                  686                686                            518                           63                    149                   0                              0                  0.449540                 0.023747     0.526713       32768
 /home/steve/powdr/keccak_original_1000_2.json             1        341980884            6704                  675                675                            521                           63                    158                   0                              0                  0.449540                 0.023747     0.526713       32768
 /home/steve/powdr/keccak_original_1000_3.json             1        341980884            6704                  673                673                            520                           64                    153                   0                              0                  0.449540                 0.023747     0.526713       32768
/home/steve/powdr/keccak_original_10000_1.json             4       2760620370           26534                 5336               5336                           5164                          633                   1608                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_original_10000_2.json             4       2760620370           26534                 5358               5358                           5181                          643                   1689                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_original_10000_3.json             4       2760620370           26534                 5312               5312                           5152                          646                   1623                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_original_10000_4.json             4       2760620370           26534                 5314               5314                           5163                          645                   1653                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_original_10000_5.json             4       2760620370           26534                 5323               5323                           5185                          648                   1616                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_original_10000_6.json             4       2760620370           26534                 5312               5312                           5583                          642                   1647                   0                              0                  0.466226                 0.011788     0.521986      262144

Sample command run: cargo run --bin powdr_openvm -r --features metrics,cuda prove "guest-keccak" --input 10000 --autoprecompiles 1 --skip 0 --metrics "keccak_original_10000_3.json"
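For repeated runs, the command line can be generated per run instead of edited by hand. This is a sketch only: `prove_command` is a hypothetical helper, it uses just the flags shown in the sample command above, and it prints the commands rather than executing them.

```python
import shlex


def prove_command(guest, input_size, metrics_file, autoprecompiles=1, skip=0):
    """Build the argument list for one benchmark run (hypothetical helper).

    Only flags that appear in the sample command are used.
    """
    return [
        "cargo", "run", "--bin", "powdr_openvm", "-r",
        "--features", "metrics,cuda",
        "prove", guest,
        "--input", str(input_size),
        "--autoprecompiles", str(autoprecompiles),
        "--skip", str(skip),
        "--metrics", metrics_file,
    ]


# Six runs of the 10,000 Keccak benchmark, one metrics file per run.
commands = [
    prove_command("guest-keccak", 10000, f"keccak_original_10000_{run}.json")
    for run in range(1, 7)
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to execute the run from the repository root.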

@qwang98
Collaborator Author

qwang98 commented Dec 15, 2025

                                         filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
  /home/steve/powdr/keccak_10000_skip_record.json             4       2760620370           26534                 5335               5335                           5207                          648                   1359                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_1.json             4       2760620370           26534                 5322               5322                           5246                          653                   1376                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_2.json             4       2760620370           26534                 5320               5320                           5252                          659                   1358                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_3.json             4       2760620370           26534                 5324               5324                           5172                          645                   1336                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_4.json             4       2760620370           26534                 5306               5306                           5159                          641                   1349                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_5.json             4       2760620370           26534                 5293               5293                           5186                          642                   1367                   0                              0                  0.466226                 0.011788     0.521986      262144

@qwang98
Collaborator Author

qwang98 commented Dec 22, 2025

Skip record only: (1359 + 1376 + 1358 + 1336 + 1349 + 1367) / 6 = 1357.5

                                         filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
  /home/steve/powdr/keccak_10000_skip_record.json             4       2760620370           26534                 5335               5335                           5207                          648                   1359                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_1.json             4       2760620370           26534                 5322               5322                           5246                          653                   1376                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_2.json             4       2760620370           26534                 5320               5320                           5252                          659                   1358                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_3.json             4       2760620370           26534                 5324               5324                           5172                          645                   1336                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_4.json             4       2760620370           26534                 5306               5306                           5159                          641                   1349                   0                              0                  0.466226                 0.011788     0.521986      262144
/home/steve/powdr/keccak_10000_skip_record_5.json             4       2760620370           26534                 5293               5293                           5186                          642                   1367                   0                              0                  0.466226                 0.011788     0.521986      262144

Skip record + direct to APC for all Keccak chips (4 chips): (1242+1200+1211+1235+1215) / 5 = 1220.6

                                 filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
  /home/steve/powdr/keccak_dta_10000.json             4       2759833938           26522                 5531               5531                           5190                          602                   1242                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_dta_10000_1.json             4       2759833938           26522                 5498               5498                           5208                          597                   1200                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_dta_10000_2.json             4       2759833938           26522                 5570               5570                           5228                          619                   1211                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_dta_10000_3.json             4       2759833938           26522                 5579               5579                           5198                          600                   1235                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_dta_10000_4.json             4       2759833938           26522                 5549               5549                           5214                          598                   1215                   0                              0                  0.466358                 0.011792      0.52185      262144

Prior savings:
No skipping record: (1608 + 1689 + 1623 + 1653 + 1616 + 1647) / 6 = 1639.33
No skipping record + direct to APC for ALU: (1583 + 1561 + 1615 + 1579 + 1608 + 1546) / 6 = 1582
A savings of 3.5% and 57.33 ms

New savings:
Skipping record: (1359 + 1376 + 1358 + 1336 + 1349 + 1367) / 6 = 1357.5
Skip record + direct to APC for all Keccak chips (4 chips): (1242+1200+1211+1235+1215) / 5 = 1220.6
Roughly a savings of 10.1% and 136.9 ms

@Schaeff
Collaborator

Schaeff commented Dec 22, 2025

The diff is showing a lot of unrelated things. Would be good to clean this up so it's clearer what the changes are here.

…2LoadStoreAdapterAir, LoadSignExtendCoreAir<4, 8>
…racegen from powdr gpu trace gen entirely; all 13 chips work for reth prove-app with 10 APC
@qwang98 qwang98 changed the title from "[WIP] Direct to APC trace gen" to "Direct to APC trace gen" Dec 24, 2025
@qwang98 qwang98 marked this pull request as ready for review December 25, 2025 01:41
@qwang98 qwang98 force-pushed the direct-to-apc branch 3 times, most recently from 3751367 to 7de4b0c December 26, 2025 05:50
@qwang98
Collaborator Author

qwang98 commented Dec 30, 2025

Another benchmarking pass after we minimized diffs in OVM. The following are all TOTAL tracegen numbers for 10K Keccak including both APC and non-APC, both record passing and tracegen per se:

  1. BEFORE minimizing diffs: (1242+1200+1211+1235+1215) / 5 = 1220.6
  2. AFTER minimizing diffs: (1159+1167+1195+1210+1192+1218)/6 = 1190.2
  3. BASELINE before any DTA: (1359 + 1376 + 1358 + 1336 + 1349 + 1367) / 6 = 1357.5
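To put the three configurations side by side, the sketch below computes each mean and its savings relative to the baseline. The dictionary keys are labels chosen for this example, and the samples are the `app_trace_gen_time_ms` values quoted in the list above. Note that the "before" configuration has five runs while the other two have six.

```python
from statistics import mean

# app_trace_gen_time_ms samples for 10K Keccak, as listed above.
trace_gen_ms = {
    "baseline, before any DTA": [1359, 1376, 1358, 1336, 1349, 1367],
    "DTA, before minimizing diffs": [1242, 1200, 1211, 1235, 1215],
    "DTA, after minimizing diffs": [1159, 1167, 1195, 1210, 1192, 1218],
}

baseline = mean(trace_gen_ms["baseline, before any DTA"])

# name -> (mean in ms, savings in ms vs baseline, savings in percent vs baseline)
savings = {}
for name, samples in trace_gen_ms.items():
    avg = mean(samples)
    saved_ms = baseline - avg
    savings[name] = (avg, saved_ms, 100 * saved_ms / baseline)
    print(
        f"{name:<30} n={len(samples)}  mean={avg:7.1f} ms  "
        f"saved={saved_ms:6.1f} ms ({savings[name][2]:4.1f}%)"
    )
```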

So we somehow got even faster, although I was expecting the reverse because we removed many manual if(row.is_apc) optimizations (but maybe running this if/else block slows down the non-APC path?). Another theory is that the new macros (for example, skipping the dummy row at the start of a thread) might have sped up trace gen. Regardless, this is further evidence that memory writes (which our RowSlice infrastructure skips anyway) take much longer than regular GPU computations.

                                  filename  num_segments  app_proof_cells  app_proof_cols  total_proof_time_ms  app_proof_time_ms  app_execute_preflight_time_ms  app_execute_metered_time_ms  app_trace_gen_time_ms  leaf_proof_time_ms  inner_recursion_proof_time_ms  normal_instruction_ratio  openvm_precompile_ratio  powdr_ratio  powdr_rows
/home/steve/powdr/keccak_10000_auto_1.json             4       2759833938           26522                 5501               5501                           5284                          659                   1159                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_10000_auto_2.json             4       2759833938           26522                 5512               5512                           5474                          641                   1167                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_10000_auto_3.json             4       2759833938           26522                 5553               5553                           5277                          638                   1195                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_10000_auto_4.json             4       2759833938           26522                 5557               5557                           5318                          617                   1210                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_10000_auto_5.json             4       2759833938           26522                 5534               5534                           5352                          610                   1192                   0                              0                  0.466358                 0.011792      0.52185      262144
/home/steve/powdr/keccak_10000_auto_6.json             4       2759833938           26522                 5544               5544                           5304                          640                   1218                   0                              0                  0.466358                 0.011792      0.52185      262144

@leonardoalt
Member

@qwang98 is this still going in?

4 participants