[AMD][MI35X]Update qwen3.5 perf by zhentaocc · Pull Request #1036 · SemiAnalysisAI/InferenceX

zhentaocc · 2026-04-16T02:03:35Z

update image, include pr changes:
update command args

* Added CONTEXT_LENGTH and MAX_PREFILL_TOKENS variables for better configuration.
* Updated launch_server command with new options: --tokenizer-worker-num, --enable-aiter-allreduce-fusion, --cuda-graph-max-bs, --context-length, --disable-radix-cache, --max-prefill-tokens, and --scheduler-recv-interval.
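
For illustration only, here is a minimal Python sketch of how a launch command using the options listed above might be assembled. The PR's actual benchmark scripts are shell scripts whose contents are not shown in this thread; the helper name `build_launch_args`, the model path, and every numeric value and default below are placeholders, not the values used in this PR.

```python
import os
import shlex


def build_launch_args(model_path, env=None):
    """Assemble an sglang.launch_server argv list (hypothetical helper).

    All numeric values here are placeholders chosen for illustration.
    """
    env = os.environ if env is None else env
    # CONTEXT_LENGTH and MAX_PREFILL_TOKENS are the variables named in the
    # PR description; the fallback defaults are assumptions.
    context_length = env.get("CONTEXT_LENGTH", "8192")
    max_prefill_tokens = env.get("MAX_PREFILL_TOKENS", "16384")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tokenizer-worker-num", "4",        # placeholder value
        "--enable-aiter-allreduce-fusion",
        "--cuda-graph-max-bs", "128",         # placeholder value
        "--context-length", context_length,
        "--disable-radix-cache",
        "--max-prefill-tokens", max_prefill_tokens,
        "--scheduler-recv-interval", "10",    # placeholder value
    ]


args = build_launch_args("path/to/model", env={"CONTEXT_LENGTH": "4096"})
print(shlex.join(args))
```

Reading the two variables from the environment keeps the tunable values in one place, which is presumably the motivation for introducing them.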

github-actions · 2026-04-16T02:03:44Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

functionstackx · 2026-04-16T02:10:42Z

@claude can u add these links to changelog yaml

update image, include pr changes:
sgl-project/sglang#21188
sgl-project/sglang#21421
sgl-project/sglang#20736

Klaud-Cold · 2026-04-16T02:11:08Z

Claude finished @functionstackx's task in 50s

Adding SGLang PR links to perf-changelog.yaml

Read current perf-changelog.yaml and PR diff
Add SGLang PR links to changelog entries
Commit and push changes

Added the three upstream SGLang PR links to the existing qwen3.5-fp8-mi355x-sglang / qwen3.5-bf16-mi355x-sglang changelog entry:

[AMD] Add fused GemmaRMSNorm forward_hip to use aiter/vllm kernels for qwen3.5 sgl-project/sglang#21188
[AMD]Integrate aiter's fused_topk for softmax scoring in topk function sgl-project/sglang#21421
[AMD] Enable share expert fusion with router experts for Qwen3.5 BF16 & FP8 sgl-project/sglang#20736

Commit: 8c51a630

claude

Additional findings (outside current diff — PR may have been updated during review):

🟡 perf-changelog.yaml:1362-1369 — The new perf-changelog entry has three copy-paste errors: (1) qwen3.5-fp8-mi355x-sglang is listed twice in config-keys (positions 0 and 2) while qwen3.5-bf16-mi355x-sglang appears only once — the correct list should be [qwen3.5-bf16-mi355x-sglang, qwen3.5-fp8-mi355x-sglang]; (2) pr-link points to PR #980 instead of the current PR #1036; (3) the description says lmsysorg/sglang:v0.5.10-rocm720-mi35x but the actual image in amd-master.yaml is lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415 (different repo, RC version, date suffix). Please fix all three before merging.
Extended reasoning...

Bug 1 — Duplicate config-key

The config-keys list in the new perf-changelog entry reads:
```
- qwen3.5-fp8-mi355x-sglang   # index 0
- qwen3.5-bf16-mi355x-sglang  # index 1
- qwen3.5-fp8-mi355x-sglang   # index 2  ← duplicate
```
The FP8 key is present at both index 0 and index 2; the BF16 key is present once. This is a copy-paste artifact: the entry was likely templated from the analogous MI300X/MI325X entry (PR #986) and a redundant FP8 line was left in. The correct list should be [qwen3.5-bf16-mi355x-sglang, qwen3.5-fp8-mi355x-sglang]. If any tooling processes config-keys to build a changelog database, verify coverage, or detect regressions, the duplicate could cause the FP8 config to be processed twice.
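
A duplicate like this could be caught mechanically. The following is a minimal sketch, not tooling that exists in this repository; the function name `duplicate_keys` is hypothetical, and the input is the list quoted above.

```python
from collections import Counter


def duplicate_keys(config_keys):
    """Return keys listed more than once, in first-seen order."""
    counts = Counter(config_keys)
    return [key for key in dict.fromkeys(config_keys) if counts[key] > 1]


# The config-keys list quoted in the review comment.
entry_keys = [
    "qwen3.5-fp8-mi355x-sglang",
    "qwen3.5-bf16-mi355x-sglang",
    "qwen3.5-fp8-mi355x-sglang",
]

dupes = duplicate_keys(entry_keys)
# dict.fromkeys keeps the first occurrence of each key and drops repeats.
deduped = list(dict.fromkeys(entry_keys))

print(dupes)
print(deduped)
```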

Bug 2 — Wrong PR link

The entry sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/980. This PR is #1036; PR #980 is a completely different, unrelated PR that predates this one by over 50 numbers. Looking at every other entry in perf-changelog.yaml, each pr-link references the specific PR that introduced those changes — the link is the primary audit trail connecting a config change back to a code review. The analogous MI300X/MI325X entry (PR #986) was clearly used as a template and the link was not updated. The correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1036.
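
As a sketch of how a stale link could be flagged automatically (an assumption about possible tooling, not something this repository is known to run), the PR number can be extracted from the pr-link value and compared with the PR under review. The names `pr_number` and `link_matches_pr` are hypothetical.

```python
import re

# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
PR_LINK_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


def pr_number(pr_link):
    """Extract the pull request number from a GitHub PR URL."""
    match = PR_LINK_RE.match(pr_link)
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {pr_link!r}")
    return int(match.group(3))


def link_matches_pr(pr_link, expected_number):
    """True when the link points at the expected pull request."""
    return pr_number(pr_link) == expected_number


stale = "https://github.com/SemiAnalysisAI/InferenceX/pull/980"
fixed = "https://github.com/SemiAnalysisAI/InferenceX/pull/1036"

print(link_matches_pr(stale, 1036))
print(link_matches_pr(fixed, 1036))
```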

Bug 3 — Inaccurate image tag in description

The description bullet reads "Use lmsysorg/sglang:v0.5.10-rocm720-mi35x" but the image actually configured in amd-master.yaml for both qwen3.5-bf16-mi355x-sglang and qwen3.5-fp8-mi355x-sglang is lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415. There are three discrepancies: (a) the Docker Hub repository is sglang-rocm not sglang; (b) the version is v0.5.10rc0 (a release candidate), not the stable v0.5.10; (c) the date suffix -20260415 is omitted. For comparison, the analogous MI300X/MI325X entry (PR #986) correctly documents lmsysorg/sglang:v0.5.10-rocm720-mi30x, which matches the actual image used there — the MI355X description appears to have been copied from that entry without updating for the different RC image. The PR description itself also contains this inaccuracy.
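
The three discrepancies can be made explicit by splitting each image reference into its parts. This is a minimal sketch under the assumption that tags follow the `v<major>.<minor>.<patch>[rcN]-<suffix>` shape seen in the two references above; `parse_image` and `image_diff` are hypothetical helpers.

```python
import re

# Assumed tag shape: v<version>[rcN]-<rest>, e.g. v0.5.10rc0-rocm720-mi35x-20260415
TAG_RE = re.compile(r"^v(?P<version>\d+\.\d+\.\d+)(?P<rc>rc\d+)?-(?P<rest>.+)$")


def parse_image(ref):
    """Split 'repo:tag' into repository, version, rc marker, and tag suffix."""
    repo, sep, tag = ref.rpartition(":")
    if not sep:
        raise ValueError(f"image reference has no tag: {ref!r}")
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return {
        "repo": repo,
        "version": match["version"],
        "rc": match["rc"],  # None for a non-RC tag
        "rest": match["rest"],
    }


def image_diff(described, configured):
    """Return {field: (described_value, configured_value)} for differing fields."""
    a, b = parse_image(described), parse_image(configured)
    return {field: (a[field], b[field]) for field in a if a[field] != b[field]}


described = "lmsysorg/sglang:v0.5.10-rocm720-mi35x"
configured = "lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415"

diff = image_diff(described, configured)
for field, (old, new) in diff.items():
    print(f"{field}: {old!r} != {new!r}")
```

Run on the two references from the review, the differing fields are the repository, the RC marker, and the tag suffix, which correspond to discrepancies (a), (b), and (c).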

Step-by-step proof
1. Open perf-changelog.yaml, lines 1362–1369 (new entry added by this PR).
2. config-keys list: fp8, bf16, fp8 — FP8 duplicated, BF16 appears once. ✗
3. pr-link: .../pull/980 — current PR is #1036. ✗
4. Description image: lmsysorg/sglang:v0.5.10-rocm720-mi35x.
5. Open amd-master.yaml, qwen3.5-bf16-mi355x-sglang entry: image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415. Mismatch on repo name, RC suffix, and date suffix. ✗
6. Same image is set for qwen3.5-fp8-mi355x-sglang in the same file. ✗
All three errors are documentation-only and have no impact on runtime behavior, but they create a misleading changelog entry that should be corrected before merge.

seungrokj

lgtm

chunfangamd

lgtm

[skip-sweep] This reverts commit 20073ba, except for changes to benchmarks/single_node/qwen3.5_{bf16,fp8}_mi355x.sh, which have been preserved to retain PR #1036's subsequent fixes.

Chen, Todd and others added 27 commits April 15, 2026 03:48

Update perf-changelog.yaml to include new Qwen3.5 FP8 and BF16 SGLang benchmark configurations for MI355X, enhancing performance with updated CLI arguments.

b22010e

Update SGLang image versions for Qwen3.5 configurations in amd-master.yaml to v0.5.9, ensuring compatibility with recent changes.

22d9500

use 0327 build

6261169

Update perf-changelog.yaml to reflect the new PR link for Qwen3.5 FP8 and BF16 SGLang benchmarks on MI355X, ensuring accurate tracking of performance enhancements.

559daa3

Update Qwen3.5 image tags in amd-master.yaml to v0.5.10rc0 for MI355X configurations and adjust perf-changelog.yaml to reflect the changes, ensuring accurate performance tracking and compatibility.

8be5c3b

Update Qwen3.5 FP8 and BF16 SGLang benchmark descriptions in perf-changelog.yaml to reflect improved CLI arguments for MI355X, ensuring better performance tracking.

c4120a8

Enhance Qwen3.5 benchmark scripts for MI355X by adding EP_SIZE parameter and adjusting memory fraction. Updated launch_server command to include data-parallel-size and improved context length handling for better performance.

5760086

Update search-space configurations in amd-master.yaml for Qwen3.5 benchmarks, increasing conc-end values and adding new entries for improved performance tuning on MI355X and MI300X.

df4c673

Remove context length parameter from Qwen3.5 BF16 and FP8 benchmark scripts for MI355X to streamline configuration and improve performance.

4367ae0

update to 5.10 rocm for qwen35

37406ec

Update Qwen3.5 benchmark configurations in amd-master.yaml to include EP_SIZE parameter for search-space entries, enhancing performance tuning for MI355X and MI300X. Adjusted perf-changelog.yaml to reflect updated image tag for better performance tracking.

f5279e5

Update context length calculations in Qwen3.5 benchmark scripts for BF16 and FP8 to improve performance tuning. Adjusted search-space configurations in amd-master.yaml to increase conc-end values for MI355X and MI300X.

da6b5ac

Update image tags in amd-master.yaml for Qwen3.5 benchmarks to v0.5.10rc0-rocm700 for MI355X and MI300X configurations, ensuring compatibility and improved performance tracking.

b318fee

Update image tags in amd-master.yaml for Qwen3.5 benchmarks, changing MI355X to v0.5.10rc0-rocm700 and MI300X to v0.5.9-rocm720, ensuring compatibility and consistency across configurations.

54b94f1

Remove data-parallel-size parameter and increase mem-fraction-static from 0.75 to 0.8 in Qwen3.5 BF16 and FP8 benchmark scripts to enhance performance tuning.

0e0d51f

Update sglang image for qwen3.5 mi355x configs to fix shared memory crash

44cccfc

Refine search-space configurations in amd-master.yaml for Qwen3.5 benchmarks, adjusting parameters to optimize performance for MI355X. Update perf-changelog.yaml to remove an outdated entry.

ce131cc

Update image tags in amd-master.yaml and perf-changelog.yaml for Qwen3.5 benchmarks, replacing outdated sglang image references with the latest version to ensure consistency and improved performance.

f9f7f29

Remove aiter allreduce fusion option from Qwen3.5 FP8 MI355X benchmark script to simplify configuration and enhance performance tuning.

0fc9c12

optimize the search space

fe7672b

Upgrade image to 20260413

7776a07

Update sglang image tags for Qwen3.5 benchmarks in amd-master.yaml to the latest version (20260414) for improved consistency and performance.

99b69b3

Remove redundant search-space entry for qwen3.5-bf16-mi355x-sglang in amd-master.yaml to streamline configuration.

0a35926

Remove duplicate search-space entry for qwen3.5-bf16-mi355x-sglang in amd-master.yaml to enhance configuration clarity.

f7a59b6

Refine search-space configurations for qwen3.5-fp8-mi355x-sglang in amd-master.yaml by adjusting parameters to improve performance tuning.

b1bd701

Update sglang image tags for qwen3.5 configurations in amd-master.yaml to the latest version (20260415) for improved consistency and performance.

2915be4

zhentaocc requested a review from a team April 16, 2026 02:03

zhentaocc requested review from billishyahao and chunfangamd as code owners April 16, 2026 02:03

github-project-automation bot added this to InferenceMAX Board Apr 16, 2026

zhentaocc assigned zhentaocc and chunfangamd Apr 16, 2026

Chen, Todd added 4 commits April 15, 2026 21:05

Update sglang image tag for Qwen3.5 FP8 and BF16 benchmarks in perf-changelog.yaml to the latest version (v0.5.10rc0-rocm720-mi35x-20260415) for enhanced performance.

81c8886

Add aiter allreduce fusion option to Qwen3.5 FP8 MI355X benchmark script for enhanced performance tuning.

1f12ab3

Adjust CONTEXT_LENGTH in Qwen3.5 benchmark scripts for BF16 and FP8 configurations to improve performance tuning.

319ed68

Remove duplicate entry for qwen3.5-fp8-mi355x-sglang in perf-changelog.yaml to enhance configuration clarity and maintain consistency.

ee837ec

zhentaocc added AMD sweep-enabled labels Apr 16, 2026

claude bot reviewed Apr 16, 2026

Add upstream SGLang PR links to perf-changelog.yaml for qwen3.5 MI355X entry

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

8c51a63

seungrokj approved these changes Apr 16, 2026

seungrokj mentioned this pull request Apr 16, 2026

[AMD/ROCm] qwen3.5 fp8 mi355x SGL performance update #995

Closed

Downgrade the image for Qwen3.5-FP8-MI355X-SGLang to 20260414

f0a8e2d

lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260414

chunfangamd approved these changes Apr 16, 2026

chunfangamd enabled auto-merge (squash) April 16, 2026 12:42

cquil11 approved these changes Apr 16, 2026

chunfangamd merged commit 86ddaed into main Apr 16, 2026
50 checks passed

chunfangamd deleted the todd/qwen35-mi35x branch April 16, 2026 13:47

github-project-automation bot moved this to Done in InferenceMAX Board Apr 16, 2026

seungrokj mentioned this pull request Apr 17, 2026

[Bug] ROCm MI355 Qwen 3.5 397B FP8 Perf Worse than BF16 sgl-project/sglang#19633

Open

5 tasks

Klaud-Cold mentioned this pull request Apr 17, 2026

Add B300 config: qwen3.5-fp8-sglang-mtp #1035

Merged

cquil11 mentioned this pull request Apr 17, 2026

Revert "[AMD][MI30X]Update Qwen3.5 perf (#986)" [skip-sweep] #1062

Merged

cquil11 mentioned this pull request Apr 17, 2026

[AMD][MI30X]Update Qwen3.5 perf #1063

Merged

chunfangamd merged 33 commits into main from todd/qwen35-mi35x

6 participants