
Optimize PR test workflow #2580

Draft · wants to merge 1 commit into base: main

Conversation

qwang98 (Collaborator) commented Mar 24, 2025

This PR splits the test suite across 16 GitHub Workflow jobs, each running cargo nextest with 4 threads (the maximum for GitHub-hosted ubuntu runners), so that each of the 55 tests can run on its own thread (16 × 4 = 64 slots).

We are using two workflow runners:

  1. One is from the provider WarpBuild, used for the build, bench, and run_examples jobs. I assume this is because these jobs require compiling for the x86 architecture, which might not be supported by the GitHub Workflow runners? These runners seem to have an associated cost, so I think we should be careful when optimizing the run_examples job, which is currently a bottleneck of 38 minutes. It could be optimized by creating more runners to run the 5 examples in parallel, depending on the timing of each. (Need to calculate the associated cost; I'm not sure whether it's charged per runner or per running minute.)
  2. The other is the GitHub Workflow runner, whose ubuntu instances appear to have 4 cores. They are also free for public repos. I'm not sure how many concurrent runners are actually supported, since it's unlimited and free in theory, so I'm allocating 16 concurrent runners for the slow tests. The test_quick job uses 1 runner.

We have 55 slow tests, with the following timing distribution:

PASS [ 21.329s] powdr-pipeline::powdr_std poseidon2_gl_test
PASS [ 25.665s] powdr-pipeline::powdr_std rotate_large_test
PASS [ 4.357s] powdr-pipeline::powdr_std split_bb_test
PASS [ 11.462s] powdr-pipeline::powdr_std split_gl_test
PASS [ 7.930s] powdr-pipeline::powdr_std rotate_small_test
PASS [ 510.098s] powdr-riscv::riscv halt
PASS [ 52.306s] powdr-riscv::riscv many_chunks_dry
PASS [ 637.928s] powdr-riscv::riscv keccak
PASS [ 332.218s] powdr-riscv::riscv vec_median
PASS [ 6.723s] powdr-pipeline::pil block_lookup_or_permutation
PASS [ 19.041s] powdr-pipeline::powdr_std arith256_small_test
PASS [ 471.428s] powdr-pipeline::powdr_std arith256_memory_large_test
PASS [ 3.878s] powdr-pipeline::powdr_std arith_small_test
PASS [ 88.260s] powdr-riscv::riscv output_syscall
PASS [ 564.755s] powdr-pipeline::powdr_std arith_large_test
PASS [ 552.897s] powdr-riscv::riscv std_hello_world
PASS [ 531.372s] powdr-riscv::riscv sum_serde
PASS [ 14.463s] powdr-pipeline::asm dynamic_vadcop
PASS [ 23.791s] powdr-pipeline::powdr_std memory_large_test
PASS [ 19.680s] powdr-pipeline::powdr_std poseidon_bn254_test
PASS [ 373.685s] powdr-riscv::instructions instruction_tests::srli
PASS [ 552.562s] powdr-riscv::riscv double_word
PASS [ 281.095s] powdr-riscv::riscv exported_csv_as_external_witness
PASS [ 255.865s] powdr-riscv::riscv many_chunks
PASS [ 123.384s] powdr-riscv::riscv read_slice
PASS [ 478.133s] powdr-riscv::riscv keccak_powdr
PASS [ 33.031s] powdr-pipeline::powdr_std binary_small_8_test
PASS [ 38.397s] powdr-pipeline::powdr_std poseidon_gl_memory_test
PASS [ 243.517s] powdr-riscv::instructions instruction_tests::bgeu
PASS [ 454.000s] powdr-pipeline::powdr_std keccakf16_test
PASS [ 255.832s] powdr-riscv::instructions instruction_tests::divu
PASS [ 488.605s] powdr-riscv::riscv runtime_ec_add
PASS [ 1.949s] powdr-pipeline::asm vm_to_vm_dynamic_trace_length
PASS [ 5.057s] powdr-pipeline::powdr_std add_sub_small_test
PASS [ 30.576s] powdr-pipeline::asm vm_args
PASS [ 3.776s] powdr-pipeline::powdr_std memory_large_test_parallel_accesses
PASS [ 4.262s] powdr-pipeline::powdr_std memory_large_with_bootloader_write_test
PASS [ 371.261s] powdr-riscv::riscv vec_median_estark_polygon
PASS [ 697.111s] powdr-pipeline::powdr_std keccakf16_memory_test
PASS [ 5.632s] powdr-pipeline::powdr_std memory_small_test
PASS [ 13.504s] powdr-pipeline::powdr_std poseidon2_bb_test
PASS [ 8.284s] powdr-pipeline::powdr_std poseidon_bb_test
PASS [ 7.283s] powdr-pipeline::powdr_std shift_small_test
PASS [ 1.958s] powdr-pipeline::powdr_std split_bn254_test
PASS [ 13.151s] powdr-pipeline::powdr_std reparse::keccakf16_test
PASS [ 368.083s] powdr-riscv::riscv inverse_gl
PASS [ 417.684s] powdr-riscv::riscv runtime_poseidon2_gl
PASS [ 90.563s] powdr-riscv::riscv sum_serde_in_mem
PASS [ 20.122s] powdr-pipeline::pil vm_to_block_dynamic_length
PASS [ 56.939s] powdr-pipeline::powdr_std binary_large_test
PASS [ 42.384s] powdr-pipeline::powdr_std binary_small_test
PASS [ 19.619s] powdr-pipeline::powdr_std shift_large_test
PASS [ 72.619s] powdr-pipeline::powdr_std split_gl_vec_test
PASS [ 246.394s] powdr-riscv::instructions instruction_tests::mulh
PASS [ 437.248s] powdr-pipeline::powdr_std keccakf32_memory_test
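To see why one test per thread is near-optimal, a greedy longest-processing-time (LPT) scheduler over these measured durations is a useful model. The sketch below uses an illustrative subset of the timings above (not the full 55); with at least as many threads as tests, the wall time collapses to the single longest test (~697 s).

```python
# Greedy LPT scheduling sketch: assign tests (by measured duration in
# seconds) to threads so the most-loaded thread (the makespan) stays small.
# Durations are a subset of the nextest output above, for brevity.
import heapq

durations = {
    "keccakf16_memory_test": 697.111,
    "keccak": 637.928,
    "arith_large_test": 564.755,
    "std_hello_world": 552.897,
    "double_word": 552.562,
    "sum_serde": 531.372,
    "halt": 510.098,
    "runtime_ec_add": 488.605,
}

def lpt_makespan(durations, n_threads):
    # Min-heap of (current load, thread index); always hand the
    # next-longest test to the least-loaded thread.
    heap = [(0.0, i) for i in range(n_threads)]
    heapq.heapify(heap)
    for _, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        heapq.heappush(heap, (load + secs, idx))
    return max(load for load, _ in heap)

# With 8 threads for 8 tests, every test runs alone:
print(lpt_makespan(durations, 8))
# With only 4 threads, tests stack up and the makespan roughly doubles:
print(lpt_makespan(durations, 4))
```

The same model applied to all 55 tests is what motivates sizing the bin count so that jobs × threads ≥ 55.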

github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2025
This PR increases the number of `test_slow` PR test bins from 8 to 17,
thereby strictly reducing the time spent on `test_slow`.

We have three types of runners: 
- `warp-ubuntu-2404-x64-8x` for: build, run_examples, bench. I think
this is a paid service (https://www.warpbuild.com/), which @leonardoalt
last changed in #2187, so I have a question: why don't we use the free
GitHub Workflow runners? Besides, I think we can parallelize
`run_examples`, which is currently a 38-minute bottleneck running
single-threaded, so there could be an immediate time improvement if we
ran it on multiple runner instances. Is this service charged by the
number of runners or the number of minutes?
- `ubuntu-24.04` for: test_quick, test_estark_polygon, test_slow.
- `ubuntu-22.04` for: udeps.

**BEFORE Optimization**
<img width="346" alt="Screen Shot 2025-03-24 at 17 25 11"
src="https://github.com/user-attachments/assets/b197eca3-9994-4113-8f4d-1e7b106b064c"
/>

**AFTER Optimization**
See run time of this PR.

**Future Optimization**
In #2580, I'm working on creating 14 bins with 4 threads each: we have
55 tests in total, so each test can run on a separate thread (14 × 4 =
56 slots). This would reduce the `test_slow` run time to roughly that of
the longest test (keccakf16_memory_test at ~697 s, about 12 minutes).

This requires computing the bin-thread assignment because `nextest` only
supports the hash method for partitions, which isn't ideal in this case.
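Once an explicit assignment is computed, each CI job could run only its bin's tests via a nextest filter expression. The sketch below is hypothetical: the bin contents are placeholders, and it assumes nextest's filterset syntax `test(=name)` for exact-name matches combined with `|`.

```python
# Sketch: turn a precomputed test-to-bin assignment into per-job
# `cargo nextest` command lines using filter expressions (-E).
# Bin contents here are hypothetical placeholders, not the real assignment.
bins = {
    0: ["keccakf16_memory_test", "split_bb_test"],
    1: ["keccak", "rotate_small_test"],
}

def nextest_cmd(tests, threads=4):
    # test(=name) is an exact-name matcher in nextest's filterset DSL;
    # joining with | selects the union of the bin's tests.
    expr = " | ".join(f"test(={t})" for t in tests)
    return f"cargo nextest run --test-threads {threads} -E '{expr}'"

for bin_id, tests in bins.items():
    print(nextest_cmd(tests))
```

Each generated command would go into one matrix entry of the workflow, so every bin runs as its own job with its own 4 threads.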