CI: Add warmup requests before benchmark to avoid JIT compilation during timed run #302

Open
gyohuangxin wants to merge 2 commits into main from ci/add-benchmark-warmups

@gyohuangxin
Member

Summary

  • Add --num-warmups=$CONC to the benchmark_serving.py invocation in atom_test.sh
  • The flag sends warmup requests before the timed benchmark run, which triggers JIT compilation of prefill-path kernels (e.g. mha_varlen_fwd) ahead of the actual measurement
  • Without it, the first batch of benchmark requests triggers ~24s of aiter JIT compilation, which skews the TTFT and throughput numbers

Test plan

  • Run a benchmark CI job and verify that "Warming up with N requests..." appears before "Starting main benchmark run..."
  • Confirm that no "[aiter] start build" messages appear after "Starting main benchmark run..."
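
The two checks in the test plan could be automated against a saved job log. The following is a minimal sketch, not part of this PR: the function name check_no_jit_after_start is hypothetical, and it assumes the log contains the marker strings exactly as quoted in the test plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of atom_test.sh): scan a benchmark log and
# verify the ordering described in the test plan.
# Exit codes: 0 = ok, 1 = JIT build seen after the timed run started,
#             2 = main benchmark marker missing, 3 = no warmup before it.
check_no_jit_after_start() {
    local log_file="$1"
    awk '
        # A warmup line only counts if it precedes the main benchmark marker.
        index($0, "Warming up with") && !started { warmed = 1 }
        index($0, "Starting main benchmark run") { started = 1 }
        # Count aiter build messages that appear once the timed run has begun.
        started && index($0, "[aiter] start build") { late++ }
        END {
            if (!started) { print "main benchmark marker not found"; exit 2 }
            if (!warmed)  { print "no warmup line before benchmark start"; exit 3 }
            if (late > 0) { print late " JIT build message(s) after benchmark start"; exit 1 }
            print "ok"
        }
    ' "$log_file"
}
```

Calling check_no_jit_after_start on the captured CI log would then fail the step whenever a kernel build still lands inside the timed window.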

Copilot AI review requested due to automatic review settings March 11, 2026 06:29
Contributor

Copilot AI left a comment

Pull request overview

Updates the benchmark CI script to run warmup requests before the timed benchmark phase, aiming to avoid JIT compilation skewing TTFT/throughput measurements.

Changes:

  • Add --num-warmups=$CONC to the benchmark_serving.py invocation in the benchmark path of atom_test.sh.

Comment on lines 77 to 79
--trust-remote-code \
--num-warmups=$CONC \
--request-rate=inf --ignore-eos \

Copilot AI Mar 11, 2026

--num-warmups=$CONC relies on a CLI option that may not exist in the cloned https://github.com/kimbochen/bench_serving.git (the script is cloned from HEAD, unpinned). If upstream changes or lacks this flag, the benchmark step will fail at runtime. Consider pinning bench_serving to a known commit/tag that supports --num-warmups (or switching to the in-repo atom/benchmarks/benchmark_serving.py), and/or adding a quick guard that checks the flag is supported before invoking the benchmark. Also note this adds CONC extra requests per run (up to 256 in the nightly matrix); you may want to cap warmups to a smaller fixed number if the goal is only to trigger JIT once.
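
The guard and the cap suggested above could look roughly like the sketch below. It is an illustration only: the helper names supports_flag, capped_warmups and warmup_args are hypothetical, the default cap of 16 is an arbitrary placeholder, and searching the script source for the flag string is a heuristic rather than a definitive check of what the CLI accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the script's source mentions the flag.
# (Heuristic; it does not prove the argument parser accepts the flag.)
supports_flag() {
    local script="$1" flag="$2"
    # -F: fixed string; -e: required because the pattern starts with a dash.
    grep -q -F -e "$flag" "$script"
}

# Hypothetical helper: print min(conc, cap). The cap of 16 is an assumed
# placeholder, not a value taken from this PR.
capped_warmups() {
    local conc="$1" cap="${2:-16}"
    if [ "$conc" -lt "$cap" ]; then
        echo "$conc"
    else
        echo "$cap"
    fi
}

# Print the warmup argument if the flag appears supported, else print nothing
# so the benchmark invocation stays valid against older script versions.
warmup_args() {
    local script="$1" conc="$2"
    if supports_flag "$script" "--num-warmups"; then
        echo "--num-warmups=$(capped_warmups "$conc")"
    fi
}
```

In atom_test.sh, the output of warmup_args (given the path of the cloned benchmark_serving.py and "$CONC") could be spliced into the command line in place of the hard-coded --num-warmups=$CONC. Pinning the clone to a known commit remains the more robust fix, since the guard only degrades gracefully rather than guaranteeing that warmups run.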

@gyohuangxin gyohuangxin reopened this Mar 12, 2026

2 participants