CI: Add warmup requests before benchmark to avoid JIT compilation during timed run #302
gyohuangxin wants to merge 2 commits into main
Pull request overview
Updates the benchmark CI script to run warmup requests before the timed benchmark phase, aiming to avoid JIT compilation skewing TTFT/throughput measurements.
Changes:
- Add --num-warmups=$CONC to the benchmark_serving.py invocation in the benchmark path of atom_test.sh.
    --trust-remote-code \
    --num-warmups=$CONC \
    --request-rate=inf --ignore-eos \
--num-warmups=$CONC relies on a CLI option that may not exist in the cloned https://github.com/kimbochen/bench_serving.git (the script is cloned from HEAD, unpinned). If upstream changes or lacks this flag, the benchmark step will fail at runtime. Consider pinning bench_serving to a known commit/tag that supports --num-warmups (or switching to the in-repo atom/benchmarks/benchmark_serving.py), and/or adding a quick guard that checks the flag is supported before invoking the benchmark. Also note this adds CONC extra requests per run (up to 256 in the nightly matrix); you may want to cap warmups to a smaller fixed number if the goal is only to trigger JIT once.
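The guard and the cap suggested above could be sketched roughly as follows. This is an illustration, not the PR's code: the helper names (flag_supported, warmup_count, warmup_arg) and the MAX_WARMUPS default of 16 are hypothetical, and the sketch assumes the benchmark script lists its options in its --help output.

```shell
# Sketch only: helper names and the MAX_WARMUPS default are assumptions,
# not part of atom_test.sh or bench_serving.

MAX_WARMUPS="${MAX_WARMUPS:-16}"   # assumed cap; tune to what JIT warmup needs

# Succeeds if the help text read from stdin mentions the given flag.
flag_supported() {
    grep -qF -e "$1"
}

# Prints the smaller of the requested concurrency and MAX_WARMUPS.
warmup_count() {
    local conc="$1"
    if [ "$conc" -gt "$MAX_WARMUPS" ]; then
        echo "$MAX_WARMUPS"
    else
        echo "$conc"
    fi
}

# Prints "--num-warmups=N" if the help text advertises the flag,
# and prints nothing otherwise so the benchmark still runs.
warmup_arg() {
    local help_text="$1"
    local conc="$2"
    if printf '%s\n' "$help_text" | flag_supported "--num-warmups"; then
        echo "--num-warmups=$(warmup_count "$conc")"
    fi
}
```

In atom_test.sh this might be used as `WARMUP_ARG=$(warmup_arg "$(python benchmark_serving.py --help 2>&1)" "$CONC")`, with `$WARMUP_ARG` passed unquoted so that an empty value adds no argument. Falling back to no warmup keeps the job from failing, at the cost of silently measuring JIT compilation again; failing fast instead would be an equally reasonable choice.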
Summary
- Add --num-warmups=$CONC to benchmark_serving.py invocation in atom_test.sh
- Warmup requests trigger JIT compilation of kernels (e.g. mha_varlen_fwd) so they don't compile during the actual measurement

Test plan
- Warming up with N requests... appears before Starting main benchmark run...
- No [aiter] start build messages appear after Starting main benchmark run...
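The two log checks in the test plan could be automated along these lines. This is a minimal sketch: check_warmup_log is a hypothetical helper, and it assumes the benchmark output has been captured to a file and that the marker strings appear in the log exactly as quoted in the test plan.

```shell
# Sketch only: check_warmup_log is a hypothetical helper, not part of the PR.
# Returns 0 if the warmup marker precedes the main-run marker and no
# "[aiter] start build" line appears after the main-run marker.
check_warmup_log() {
    local log="$1"
    local warm_line main_line

    # Line numbers of the first occurrence of each marker (empty if absent).
    warm_line=$(grep -n -m1 -F 'Warming up with' "$log" | cut -d: -f1)
    main_line=$(grep -n -m1 -F 'Starting main benchmark run' "$log" | cut -d: -f1)

    if [ -z "$warm_line" ] || [ -z "$main_line" ]; then
        echo "missing warmup or main-run marker in $log" >&2
        return 1
    fi
    if [ "$warm_line" -ge "$main_line" ]; then
        echo "warmup marker does not precede main run in $log" >&2
        return 1
    fi

    # A build message after the main-run marker means JIT compilation
    # leaked into the timed phase.
    if tail -n "+$((main_line + 1))" "$log" | grep -qF '[aiter] start build'; then
        echo "JIT build detected during timed run in $log" >&2
        return 1
    fi
    return 0
}
```

A CI step could call `check_warmup_log benchmark.log` after the benchmark and fail the job on a nonzero exit; the file name benchmark.log is illustrative.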