
Commit 6c763c1

kashifulhaque, harvenstar, RuixiangMa, alisonshao, and mickqian authored
Utils refactor (#1)

* fix(ci): recover from corrupted MMMU parquet cache (sgl-project#17256)
* [diffusion] feat: support default 4-step inference for Flux2-Klein distilled models (sgl-project#17225)
  Signed-off-by: Lancer <maruixiang6688@gmail.com>
* Add runner utilization report workflow (sgl-project#17234)
* cli: support sglang version (sgl-project#17250)
* Use swa radix cache and memory pool for gpt-oss model (sgl-project#17261)
* [VLM][Reland] Refactor load_mm_data to improve performance (sgl-project#16152)
  Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
* [Tiny] Improve docs (sgl-project#17264)
* [diffusion] fix: set guidance_scale default to None (sgl-project#17182)
* Tiny fix comment typo (sgl-project#17287)
* [SPEC_V2] Enable cudagraph draft_extend for trtllm_mla_backend and Acclen Fix for DP under cudagraph mode (sgl-project#16974)
* Add kl test for swa radix cache (sgl-project#17281)
* fix: Handle multiple named chat templates in HuggingFace tokenizers (sgl-project#17236)
  Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
* Move radix cache related tests (sgl-project#17295)
* [Refactor] Add `-fp4-gemm-backend` to replace `SGLANG_FLASHINFER_FP4_GEMM_BACKEND` (sgl-project#16534)
  Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com>
* [Bugfix] Fix PD accuracy when MTP is not configured on the prefill node (sgl-project#17212)
  Co-authored-by: Shangming Cai <csmthu@gmail.com>
* [Diffusion] Apply jit qk_norm to flux1 (sgl-project#17296)
* [Refactor] Split out deepseek v2 weight loader function into mixin (sgl-project#16649)
* [NPU]Support GPT-OSS for NPU (sgl-project#14197)
* [jit-kernel] Add CuTe DSL GDN Decode Kernel (sgl-project#15631)
  Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
* [GLM 4.7] Add RTX 6000 Pro aka sm120 (sgl-project#17235)
  Co-authored-by: root <root@ubuntu-nvidia.localdomain>
* Update CODEOWNERS for multimodal_gen (sgl-project#17308)
  Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
* [Feature] overlap LoRA weight loading with compute (sgl-project#15512)
* [PD] Optimize MHA models pp util calculation logic (sgl-project#17306)
* [Minor] Correct sglang version when installing from source (sgl-project#17315)
* Use dsv3 optimized routing `fused_topk_deepseek` instead of `moe_fused_gate` (sgl-project#15347)
* [DeepSeek v3.2] Opt MTP decode cuda batch sizes and nsa implementation (sgl-project#16961)
* Update code sync scripts (sgl-project#17319)
* [Auto Sync] Update tokenizer_manager.py (20260119) (sgl-project#17317)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* support new qwen3_coder_detector (sgl-project#16744)
  Co-authored-by: liugaoji.lgj <liugaoji.lgj@alibaba-inc.com>
* Fix kernel selection in biased_grouped_topk_gpu (sgl-project#17325)
* KV Cache Events with Attention DP bug fix (sgl-project#16030) (sgl-project#16412)
* [Perf] fuse q, k norm for Flux2Attention (sgl-project#17241)
  Co-authored-by: Minglei Zhu <zminglei@linkedin.com>
* [CI] Add partition to stage-b-test-large-1-gpu (11->12) (sgl-project#17245)
* fix(ci): rate limit and permission errors in trace publishing (sgl-project#17238)
* Revert "[Perf] fuse q, k norm for Flux2Attention (sgl-project#17241)" (sgl-project#17332)
* Migrate performance, accuracy, and quantization tests to CI registry (sgl-project#17177)
  Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
* Inclusion of nvfp4 blockscale in EPLB Rebalance (sgl-project#17158)
* [Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-backend` with `flashinfer_` prefix (sgl-project#17309)
* [Diffusion] Apply qknorm to flux2 and apply lightx2v rms_norm_one_pass kernel(without residual) (sgl-project#17305)
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Fix v32 continue_final_message not work (sgl-project#16567)
* Evict swa kv cache during decoding (sgl-project#17220)
* [RadixTree][1/N Refactor]: Support unified match_prefix params (sgl-project#17142)
  Co-authored-by: yizhang2077 <1109276519@qq.com>
  Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
* [AMD CI] Migrate and Add More Testcases (sgl-project#17116)
  Co-authored-by: yctseng0211 <yctseng@amd.com>
* [AMD] CI - add partitions for stage-b-test-small-1-gpu-amd (sgl-project#17345)
* Restore deepseek_v2.py to main's code, except the utils
* Ran `pre-commit`

---------

Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Hudson Xing <1277646412@qq.com>
Co-authored-by: Lancer <402430575@qq.com>
Co-authored-by: Alison Shao <54658187+alisonshao@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Ke Bao <ispobaoke@gmail.com>
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
Co-authored-by: Changyi Yang <112288487+ChangyiYang@users.noreply.github.com>
Co-authored-by: YAMY <74099316+YAMY1234@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Ch3ngY1 <91232537+Ch3ngY1@users.noreply.github.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Jerry Ji <jerryjilol@gmail.com>
Co-authored-by: Todobe <43903496+Todobe@users.noreply.github.com>
Co-authored-by: Jinyan Chen <93358689+liz-badada@users.noreply.github.com>
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
Co-authored-by: Koushik Dutta <koush@koushikdutta.com>
Co-authored-by: root <root@ubuntu-nvidia.localdomain>
Co-authored-by: Glen Liu <62917497+glenliu21@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Lee Nau <lnau@nvidia.com>
Co-authored-by: Yongfei Xu <xuyongfei.xyf@antgroup.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gaoji Liu <34803073+attack204@users.noreply.github.com>
Co-authored-by: liugaoji.lgj <liugaoji.lgj@alibaba-inc.com>
Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com>
Co-authored-by: Kartik Ramesh <kartikx2000@gmail.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Minglei Zhu <zminglei@linkedin.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: zhangheng <hzh0425@apache.org>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
1 parent 294d6ff commit 6c763c1

132 files changed

Lines changed: 7457 additions & 2791 deletions


.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@
 /python/pyproject.toml @merrymercy @Fridge003 @ispobock
 /python/sglang/jit_kernel @DarkSharpness @BBuf
 /python/sglang/multimodal_gen @mickqian @yhyang201
+/python/sglang/multimodal_gen/runtime/layers @mickqian @yhyang201 @BBuf
+/python/sglang/multimodal_gen/runtime/models/dits @mickqian @yhyang201 @BBuf
 /python/sglang/srt/batch_invariant_ops @Fridge003 @hebiao064
 /python/sglang/srt/constrained @hnyls2002 @DarkSharpness
 /python/sglang/srt/compilation @hebiao064
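The two new CODEOWNERS entries are nested under the existing `/python/sglang/multimodal_gen` rule. GitHub documents that the last matching pattern in the file takes precedence, so files under `runtime/layers` and `runtime/models/dits` pick up @BBuf as an additional owner, while the rest of `multimodal_gen` keeps the original two owners. The Python sketch below models only that precedence rule, and only for root-anchored, wildcard-free patterns like the ones in this hunk (no globs, no `*` or `**`); the file paths it is queried with are hypothetical.

```python
# Simplified model of CODEOWNERS precedence: the last matching rule wins.
# Assumption: only root-anchored patterns without wildcards, as in the hunk above.

CODEOWNERS = """\
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
/python/sglang/jit_kernel @DarkSharpness @BBuf
/python/sglang/multimodal_gen @mickqian @yhyang201
/python/sglang/multimodal_gen/runtime/layers @mickqian @yhyang201 @BBuf
/python/sglang/multimodal_gen/runtime/models/dits @mickqian @yhyang201 @BBuf
/python/sglang/srt/batch_invariant_ops @Fridge003 @hebiao064
"""


def parse_rules(text):
    """Return (pattern, owners) pairs in file order, skipping blanks and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def pattern_matches(pattern, path):
    """A root-anchored pattern matches the path itself or anything beneath it."""
    prefix = pattern.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def owners_for(repo_path, rules):
    path = "/" + repo_path.lstrip("/")
    owners = []
    for pattern, rule_owners in rules:  # scan top to bottom...
        if pattern_matches(pattern, path):
            owners = rule_owners  # ...so a later match replaces an earlier one
    return owners


rules = parse_rules(CODEOWNERS)
print(owners_for("python/sglang/multimodal_gen/runtime/layers/norm.py", rules))
# ['@mickqian', '@yhyang201', '@BBuf']
print(owners_for("python/sglang/multimodal_gen/configs/pipeline.py", rules))
# ['@mickqian', '@yhyang201']
```

Matching on `prefix + "/"` rather than a bare string prefix keeps a hypothetical sibling such as `multimodal_generation/` from being swept up by the `multimodal_gen` rule.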

.github/workflows/open-pr-copy-from-oss.yml

Lines changed: 1 addition & 1 deletion
@@ -23,6 +23,6 @@ jobs:

 - name: Copy from OSS code
 env:
-GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
+GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_PRIVATE }}
 run: |
 python3 scripts/code_sync/copy_from_oss.py

.github/workflows/open-pr-copy-to-oss.yml

Lines changed: 1 addition & 1 deletion
@@ -26,6 +26,6 @@ jobs:

 - name: Copy to OSS code
 env:
-GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
+GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_OSS }}
 run: |
 python3 scripts/code_sync/copy_to_oss.py --commit ${{ github.event.inputs.commit_sha }}

.github/workflows/pr-test-amd.yml

Lines changed: 99 additions & 65 deletions
@@ -149,7 +149,10 @@ jobs:
 docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_topk.py
 docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_kvcacheio.py
 docker exec -w /sglang-checkout/sgl-kernel/tests/sgl_diffusion ci_sglang python3 -m pytest test_timestep_embedding.py
-
+docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_moe_topk_sigmoid.py
+docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_torch_defaults_reset.py
+docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_amd_deterministic_custom_allreduce.py
+docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_amd_nccl_allreduce_determinism.py
 # =============================================== primary ====================================================

 stage-a-test-1-amd:
@@ -190,7 +193,7 @@ jobs:
 - name: Run test
 timeout-minutes: 10
 run: |
-bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-a-test-1
+bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-a-test-1-amd

 stage-b-test-small-1-gpu-amd:
 needs: [check-changes, stage-a-test-1-amd]
@@ -208,7 +211,7 @@ jobs:
 fail-fast: false
 matrix:
 runner: [linux-mi325-gpu-1]
-part: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+part: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
 runs-on: ${{matrix.runner}}
 steps:
 - name: Checkout code
@@ -230,7 +233,7 @@ jobs:
 - name: Run test
 timeout-minutes: 30
 run: |
-bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-b-test-small-1-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 12 --timeout-per-file 1800
+bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-b-test-small-1-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 13 --timeout-per-file 1800

 stage-b-test-small-1-gpu-amd-mi35x:
 needs: [check-changes, stage-a-test-1-amd]
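This hunk and the `matrix.part` hunk for the same job must change together: the matrix now launches shards 0 through 12, and `--auto-partition-size 13` tells `run_suite.py` to split the suite into 13 pieces. If the two disagree, either a shard id falls outside the partition range or one slice of the suite is never scheduled. The sketch below is hypothetical: it deals files round-robin over a sorted list, which is not necessarily how `run_suite.py` balances partitions (this diff does not show that), but it demonstrates the coverage invariant both settings have to preserve.

```python
# Hypothetical model of --auto-partition-id / --auto-partition-size.
# Assumption: files are dealt round-robin over a sorted list; run_suite.py may
# balance partitions differently (for example by estimated runtime).


def partition(files, part_id, part_size):
    """Return the files assigned to shard `part_id` out of `part_size` shards."""
    if not 0 <= part_id < part_size:
        raise ValueError(f"part_id {part_id} is outside [0, {part_size})")
    ordered = sorted(files)
    return [f for i, f in enumerate(ordered) if i % part_size == part_id]


def matrix_covers_all_shards(matrix_parts, part_size):
    """True only if the CI matrix launches every shard id exactly once."""
    return sorted(matrix_parts) == list(range(part_size))


files = [f"test_case_{i:02d}.py" for i in range(40)]  # hypothetical suite
new_matrix = list(range(13))  # part: [0, 1, ..., 12]
old_matrix = list(range(12))  # part: [0, 1, ..., 11]

print(matrix_covers_all_shards(new_matrix, 13))  # True
print(matrix_covers_all_shards(old_matrix, 13))  # False: shard 12 would never run

shards = [partition(files, p, 13) for p in new_matrix]
assigned = sorted(f for shard in shards for f in shard)
print(assigned == sorted(files))  # True: every file runs exactly once
```

Whatever the real balancing strategy is, the invariant is the same: the matrix list and the partition size have to describe the same set of shard ids.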
@@ -548,52 +551,13 @@ jobs:
 echo "=== Post-test System Memory Status ==="
 free -h

-unit-test-backend-1-gpu-amd:
-needs: [check-changes, stage-a-test-1-amd]
-if: |
-always() &&
-(
-(inputs.target_stage == 'unit-test-backend-1-gpu-amd') ||
-(
-!inputs.target_stage &&
-(!failure() && !cancelled()) &&
-((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
-)
-)
-strategy:
-fail-fast: false
-matrix:
-runner: [linux-mi325-gpu-1]
-part: [0, 1]
-runs-on: ${{matrix.runner}}
-steps:
-- name: Checkout code
-uses: actions/checkout@v4
-with:
-ref: ${{ inputs.pr_head_sha || inputs.ref || github.sha }}
-
-- name: Ensure VRAM is clear
-run: bash scripts/ensure_vram_clear.sh rocm
-
-- name: Start CI container
-run: bash scripts/ci/amd_ci_start_container.sh
-env:
-GITHUB_WORKSPACE: ${{ github.workspace }}
-
-- name: Install dependencies
-run: bash scripts/ci/amd_ci_install_dependency.sh

-- name: Run test
-timeout-minutes: 30
-run: |
-bash scripts/ci/amd_ci_exec.sh python3 run_suite.py --suite per-commit-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
-
-unit-test-backend-8-gpu-amd:
-needs: [check-changes, stage-a-test-1-amd]
+stage-c-test-large-8-gpu-amd:
+needs: [check-changes, call-gate, stage-b-test-small-1-gpu-amd, stage-b-test-large-2-gpu-amd]
 if: |
 always() &&
 (
-(inputs.target_stage == 'unit-test-backend-8-gpu-amd') ||
+(inputs.target_stage == 'stage-c-test-large-8-gpu-amd') ||
 (
 !inputs.target_stage &&
 (!failure() && !cancelled()) &&
@@ -634,7 +598,7 @@ jobs:
 - name: Run test
 timeout-minutes: 60
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 run_suite.py --suite per-commit-8-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2 --timeout-per-file 3600
+bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-c-test-large-8-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2 --timeout-per-file 3600

 stage-c-test-large-8-gpu-amd-mi35x:
 needs: [check-changes, call-gate, stage-b-test-small-1-gpu-amd, stage-b-test-large-2-gpu-amd]
@@ -713,23 +677,29 @@ jobs:
 - name: Benchmark single latency
 timeout-minutes: 20
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_bs1_small
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_bs1_default
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_1gpu.TestBenchOneBatch1GPU.test_bs1_small
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_1gpu.TestBenchOneBatch1GPU.test_bs1_default

 - name: Benchmark online latency
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_online_latency_default
+
+- name: Benchmark online latency (LoRA)
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_lora_online_latency
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_lora_online_latency_with_concurrent_adapter_updates

 - name: Benchmark offline throughput
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_default

 - name: Benchmark offline throughput (Non-streaming, small batch size)
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_non_stream_small_batch_size
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_non_stream_small_batch_size

 performance-test-1-gpu-part-2-amd:
 needs: [check-changes, stage-a-test-1-amd]
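Most benchmark steps in this hunk now pass `-w /sglang-checkout/test/registered/perf` together with renamed modules such as `test_bench_one_batch_1gpu.TestBenchOneBatch1GPU`. Assuming `amd_ci_exec.sh -w` sets the working directory inside the container the way `docker exec -w` does, the flag matters because `python3 -m unittest module.Class.method` puts the current directory at the front of `sys.path` and resolves the dotted name from there. The sketch below reproduces that lookup with the standard library's `unittest.TestLoader.loadTestsFromName` against a throwaway stand-in module; the module body is hypothetical and only borrows the names from the diff.

```python
import importlib
import sys
import tempfile
import textwrap
import unittest
from pathlib import Path

# Hypothetical stand-in for test/registered/perf/test_bench_one_batch_1gpu.py.
MODULE_SOURCE = textwrap.dedent(
    """
    import unittest

    class TestBenchOneBatch1GPU(unittest.TestCase):
        def test_bs1_small(self):
            self.assertEqual(1 + 1, 2)
    """
)


def run_dotted_test(test_dir, dotted_name):
    """Mimic `cd test_dir && python3 -m unittest dotted_name`."""
    sys.path.insert(0, str(test_dir))  # `python3 -m` prepends the cwd to sys.path
    importlib.invalidate_caches()  # make the freshly written module visible
    try:
        suite = unittest.TestLoader().loadTestsFromName(dotted_name)
        return unittest.TextTestRunner(verbosity=0).run(suite)
    finally:
        sys.path.remove(str(test_dir))


with tempfile.TemporaryDirectory() as tmp:
    perf_dir = Path(tmp) / "registered" / "perf"
    perf_dir.mkdir(parents=True)
    (perf_dir / "test_bench_one_batch_1gpu.py").write_text(MODULE_SOURCE)
    result = run_dotted_test(
        perf_dir, "test_bench_one_batch_1gpu.TestBenchOneBatch1GPU.test_bs1_small"
    )

print(result.testsRun, result.wasSuccessful())  # 1 True
```

Run from any other directory, the same dotted name would not import, which is why the module rename and the working-directory flag travel together in these steps.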
@@ -768,17 +738,81 @@ jobs:
 - name: Benchmark offline throughput (w/o RadixAttention)
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_without_radix_cache

 - name: Benchmark offline throughput (w/ Triton)
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_with_triton_attention_backend

 - name: Benchmark offline throughput (w/ FP8)
 timeout-minutes: 15
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default_fp8
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_large.TestBenchServing1GPULarge.test_offline_throughput_default_fp8
+
+- name: Benchmark VLM offline throughput
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_vlm_offline_throughput
+
+- name: Benchmark VLM online latency
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_vlm_online_latency
+
+performance-test-1-gpu-part-3-amd:
+needs: [check-changes, stage-a-test-1-amd]
+if: |
+always() &&
+(
+(inputs.target_stage == 'performance-test-1-gpu-part-3-amd') ||
+(
+!inputs.target_stage &&
+(!failure() && !cancelled()) &&
+((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
+)
+)
+strategy:
+fail-fast: false
+matrix:
+runner: [linux-mi325-gpu-1]
+runs-on: ${{matrix.runner}}
+steps:
+- name: Checkout code
+uses: actions/checkout@v4
+with:
+ref: ${{ inputs.pr_head_sha || inputs.ref || github.sha }}
+
+- name: Ensure VRAM is clear
+run: bash scripts/ensure_vram_clear.sh rocm
+
+- name: Start CI container
+run: bash scripts/ci/amd_ci_start_container.sh
+env:
+GITHUB_WORKSPACE: ${{ github.workspace }}
+
+- name: Install dependencies
+run: bash scripts/ci/amd_ci_install_dependency.sh
+
+- name: Benchmark Scores online latency and throughput
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_score_api_latency_throughput
+
+- name: Benchmark Scores online latency and throughput (batch size scaling)
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_score_api_batch_scaling
+
+- name: Benchmark Embeddings online latency and throughput
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_embeddings_api_latency_throughput
+
+- name: Benchmark Embeddings online latency and throughput (batch size scaling)
+timeout-minutes: 10
+run: |
+bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_embeddings_api_batch_scaling

 performance-test-2-gpu-amd:
 needs: [check-changes, stage-a-test-1-amd]
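Every AMD job in this file, including the new `performance-test-1-gpu-part-3-amd`, repeats the same `if:` expression. Read as a boolean, it says: run when this job is explicitly selected through `inputs.target_stage`; otherwise, when no stage is selected, run only if no upstream job failed or was cancelled and `check-changes` reported changes to the main package or `sgl-kernel`. The leading `always()` is what lets the explicit-selection branch run even after an upstream failure. The Python function below is a hedged transliteration of that expression for reasoning about it; its parameter names are illustrative, and it treats `failure()` and `cancelled()` as plain flags rather than modeling how GitHub Actions computes them.

```python
from typing import Optional


def should_run(job: str, target_stage: Optional[str], upstream_failed: bool,
               cancelled: bool, main_package: str, sgl_kernel: str) -> bool:
    """Boolean transliteration of the workflow's `if:` expression (sketch)."""
    always = True  # always(): evaluate the condition even after upstream failures
    explicitly_targeted = target_stage == job
    normal_pr_run = (
        not target_stage  # !inputs.target_stage: input unset or empty
        and not upstream_failed and not cancelled  # !failure() && !cancelled()
        # job outputs are strings, hence the comparison with 'true'
        and (main_package == "true" or sgl_kernel == "true")
    )
    return always and (explicitly_targeted or normal_pr_run)
```

For example, `should_run("performance-test-1-gpu-part-3-amd", "performance-test-1-gpu-part-3-amd", True, False, "false", "false")` returns `True`: an explicitly targeted stage runs even when a dependency failed and nothing relevant changed.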
@@ -822,32 +856,32 @@ jobs:
 - name: Benchmark single latency (TP=2)
 timeout-minutes: 25
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_moe_tp2_bs1
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_2gpu.TestBenchOneBatch2GPU.test_moe_tp2_bs1

 - name: Benchmark single latency + torch.compile (TP=2)
 timeout-minutes: 25
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_torch_compile_tp2_bs1
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_2gpu.TestBenchOneBatch2GPU.test_torch_compile_tp2_bs1

 - name: Benchmark offline throughput (TP=2)
 timeout-minutes: 25
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_default
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_moe_offline_throughput_default

 - name: Benchmark offline throughput (w/o RadixAttention) (TP=2)
 timeout-minutes: 25
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_without_radix_cache
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_moe_offline_throughput_without_radix_cache

 - name: Benchmark offline PP decode throughput (PP=2)
 timeout-minutes: 10
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_pp_offline_throughput_default_decode
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_pp_offline_throughput_default_decode

 - name: Benchmark offline PP prefill throughput (PP=2)
 timeout-minutes: 10
 run: |
-bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_pp_long_context_prefill
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_pp_long_context_prefill

 accuracy-test-1-gpu-amd:
 needs: [check-changes, stage-a-test-1-amd]
@@ -886,7 +920,7 @@ jobs:
 - name: Evaluate Accuracy
 timeout-minutes: 30
 run: |
-bash scripts/ci/amd_ci_exec.sh -e SGLANG_USE_AITER=0 python3 test_eval_accuracy_large.py
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/eval -e SGLANG_USE_AITER=0 python3 test_eval_accuracy_large.py

 accuracy-test-2-gpu-amd:
 needs: [check-changes, accuracy-test-1-gpu-amd]
@@ -926,7 +960,7 @@ jobs:
 - name: Evaluate accuracy (TP=2)
 timeout-minutes: 30
 run: |
-bash scripts/ci/amd_ci_exec.sh -e SGLANG_USE_AITER_AR=0 -e SGLANG_USE_AITER=0 -e HF_HUB_ENABLE_HF_TRANSFER=0 python3 test_moe_eval_accuracy_large.py
+bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/eval -e SGLANG_USE_AITER_AR=0 -e SGLANG_USE_AITER=0 -e HF_HUB_ENABLE_HF_TRANSFER=0 python3 test_moe_eval_accuracy_large.py

 pr-test-amd-finish:
 needs:
@@ -942,11 +976,11 @@ jobs:
 stage-b-test-small-1-gpu-amd,
 stage-b-test-small-1-gpu-amd-mi35x,
 stage-b-test-large-2-gpu-amd,
-unit-test-backend-1-gpu-amd,
-unit-test-backend-8-gpu-amd,
+stage-c-test-large-8-gpu-amd,
 stage-c-test-large-8-gpu-amd-mi35x,
 performance-test-1-gpu-part-1-amd,
 performance-test-1-gpu-part-2-amd,
+performance-test-1-gpu-part-3-amd,
 performance-test-2-gpu-amd,
 accuracy-test-1-gpu-amd,
 accuracy-test-2-gpu-amd,
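The `pr-test-amd-finish` hunk swaps the two removed `unit-test-backend-*` jobs for `stage-c-test-large-8-gpu-amd` and adds `performance-test-1-gpu-part-3-amd`. An aggregator job like this only reflects the jobs listed in its `needs`, and GitHub Actions rejects a workflow whose `needs` names a job that does not exist, so job renames have to be mirrored here. The sketch below is an illustrative consistency check, not part of the repository: the job list is only the subset visible in this diff, and the real `needs` list begins above the lines shown.

```python
# Job names visible in this diff after the change (a subset of the real workflow).
# In this visible subset, the post-commit `needs` list is exactly this list.
DEFINED_JOBS = [
    "stage-b-test-small-1-gpu-amd",
    "stage-b-test-small-1-gpu-amd-mi35x",
    "stage-b-test-large-2-gpu-amd",
    "stage-c-test-large-8-gpu-amd",
    "stage-c-test-large-8-gpu-amd-mi35x",
    "performance-test-1-gpu-part-1-amd",
    "performance-test-1-gpu-part-2-amd",
    "performance-test-1-gpu-part-3-amd",
    "performance-test-2-gpu-amd",
    "accuracy-test-1-gpu-amd",
    "accuracy-test-2-gpu-amd",
]

# The visible tail of pr-test-amd-finish's `needs` before this commit.
FINISH_NEEDS_BEFORE = [
    "stage-b-test-small-1-gpu-amd",
    "stage-b-test-small-1-gpu-amd-mi35x",
    "stage-b-test-large-2-gpu-amd",
    "unit-test-backend-1-gpu-amd",
    "unit-test-backend-8-gpu-amd",
    "stage-c-test-large-8-gpu-amd-mi35x",
    "performance-test-1-gpu-part-1-amd",
    "performance-test-1-gpu-part-2-amd",
    "performance-test-2-gpu-amd",
    "accuracy-test-1-gpu-amd",
    "accuracy-test-2-gpu-amd",
]


def check_aggregator(defined_jobs, needs):
    """Return (jobs the aggregator ignores, needs entries that name no job)."""
    defined, needed = set(defined_jobs), set(needs)
    missing = sorted(defined - needed)
    dangling = sorted(needed - defined)
    return missing, dangling


print(check_aggregator(DEFINED_JOBS, DEFINED_JOBS))  # ([], [])
# Keeping the old list after the rename would leave two jobs unwatched and two
# references to jobs that no longer exist.
print(check_aggregator(DEFINED_JOBS, FINISH_NEEDS_BEFORE))
```

Checking both directions catches the two failure modes a rename can introduce: a new job whose result the finish job never sees, and a stale entry that makes the workflow invalid.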
