
[AMD][MI35X] Update qwen3.5 perf #1036

Merged
chunfangamd merged 33 commits into main from todd/qwen35-mi35x
Apr 16, 2026

Conversation

@zhentaocc
Collaborator

@zhentaocc zhentaocc commented Apr 16, 2026

Chen, Todd and others added 27 commits April 15, 2026 03:48
* Added CONTEXT_LENGTH and MAX_PREFILL_TOKENS variables for better configuration.
* Updated launch_server command with new options: --tokenizer-worker-num, --enable-aiter-allreduce-fusion, --cuda-graph-max-bs, --context-length, --disable-radix-cache, --max-prefill-tokens, and --scheduler-recv-interval.
… benchmark configurations for MI355X, enhancing performance with updated CLI arguments.
….yaml to v0.5.9, ensuring compatibility with recent changes.
… and BF16 SGLang benchmarks on MI355X, ensuring accurate tracking of performance enhancements.
… configurations and adjust perf-changelog.yaml to reflect the changes, ensuring accurate performance tracking and compatibility.
…ngelog.yaml to reflect improved CLI arguments for MI355X, ensuring better performance tracking.
…ter and adjusting memory fraction. Updated launch_server command to include data-parallel-size and improved context length handling for better performance.
…chmarks, increasing conc-end values and adding new entries for improved performance tuning on MI355X and MI300X.
…cripts for MI355X to streamline configuration and improve performance.
… EP_SIZE parameter for search-space entries, enhancing performance tuning for MI355X and MI300X. Adjusted perf-changelog.yaml to reflect updated image tag for better performance tracking.
…F16 and FP8 to improve performance tuning. Adjusted search-space configurations in amd-master.yaml to increase conc-end values for MI355X and MI300X.
…0rc0-rocm700 for MI355X and MI300X configurations, ensuring compatibility and improved performance tracking.
… MI355X to v0.5.10rc0-rocm700 and MI300X to v0.5.9-rocm720, ensuring compatibility and consistency across configurations.
…from 0.75 to 0.8 in Qwen3.5 BF16 and FP8 benchmark scripts to enhance performance tuning.
…chmarks, adjusting parameters to optimize performance for MI355X. Update perf-changelog.yaml to remove an outdated entry.
…3.5 benchmarks, replacing outdated sglang image references with the latest version to ensure consistency and improved performance.
…k script to simplify configuration and enhance performance tuning.
… the latest version (20260414) for improved consistency and performance.
… amd-master.yaml to streamline configuration.
… amd-master.yaml to enhance configuration clarity.
…md-master.yaml by adjusting parameters to improve performance tuning.
…l to the latest version (20260415) for improved consistency and performance.
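
The launch_server options named in the first commits above can be sketched as an argument list. This is only an illustration: the flag names and the CONTEXT_LENGTH / MAX_PREFILL_TOKENS variables come from the commit messages, while the `python3 -m sglang.launch_server` invocation, the `--model-path` flag, the model path, and every numeric value are assumptions chosen for the example, not the values used in this PR's benchmark scripts.

```python
import os
import shlex


def build_launch_args(model_path, env=None):
    """Assemble a launch_server argument list from the flags listed in the commits.

    All numeric values below are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    # Variable names are from the commit message; the defaults are assumed.
    context_length = env.get("CONTEXT_LENGTH", "8192")
    max_prefill_tokens = env.get("MAX_PREFILL_TOKENS", "8192")
    return [
        "python3", "-m", "sglang.launch_server",  # assumed invocation
        "--model-path", model_path,               # assumed flag and value
        "--tokenizer-worker-num", "4",            # placeholder value
        "--enable-aiter-allreduce-fusion",
        "--cuda-graph-max-bs", "128",             # placeholder value
        "--context-length", context_length,
        "--disable-radix-cache",
        "--max-prefill-tokens", max_prefill_tokens,
        "--scheduler-recv-interval", "10",        # placeholder value
    ]


args = build_launch_args("path/to/model", {"CONTEXT_LENGTH": "4096"})
print(shlex.join(args))
```

Keeping the two sizes in environment variables lets a sweep override them per run without editing the script; the real values live in the benchmark scripts touched by this PR.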
@zhentaocc zhentaocc requested a review from a team April 16, 2026 02:03
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you


Chen, Todd added 4 commits April 15, 2026 21:05
…hangelog.yaml to the latest version (v0.5.10rc0-rocm720-mi35x-20260415) for enhanced performance.
…onfigurations to improve performance tuning.
…g.yaml to enhance configuration clarity and maintain consistency.
@functionstackx
Contributor

@claude can u add these links to changelog yaml

Update image, include PR changes:
sgl-project/sglang#21188
sgl-project/sglang#21421
sgl-project/sglang#20736

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 16, 2026

Claude finished @functionstackx's task in 50s


Adding SGLang PR links to perf-changelog.yaml

  • Read current perf-changelog.yaml and PR diff
  • Add SGLang PR links to changelog entries
  • Commit and push changes

Added the three upstream SGLang PR links to the existing qwen3.5-fp8-mi355x-sglang / qwen3.5-bf16-mi355x-sglang changelog entry:

Commit: 8c51a630
Contributor

@claude claude bot left a comment



Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 perf-changelog.yaml:1362-1369 — The new perf-changelog entry has three copy-paste errors: (1) qwen3.5-fp8-mi355x-sglang is listed twice in config-keys (positions 0 and 2) while qwen3.5-bf16-mi355x-sglang appears only once — the correct list should be [qwen3.5-bf16-mi355x-sglang, qwen3.5-fp8-mi355x-sglang]; (2) pr-link points to PR #980 instead of the current PR #1036; (3) the description says lmsysorg/sglang:v0.5.10-rocm720-mi35x but the actual image in amd-master.yaml is lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415 (different repo, RC version, date suffix). Please fix all three before merging.

    Extended reasoning...

    Bug 1 — Duplicate config-key

    The config-keys list in the new perf-changelog entry reads:

    - qwen3.5-fp8-mi355x-sglang   # index 0
    - qwen3.5-bf16-mi355x-sglang  # index 1
    - qwen3.5-fp8-mi355x-sglang   # index 2  ← duplicate

    The FP8 key is present at both index 0 and index 2; the BF16 key is present once. This is a copy-paste artifact — the entry was likely templated from the analogous MI300X/MI325X entry (PR #986) and the first or last line was not changed from FP8 to BF16. The correct list should be [qwen3.5-bf16-mi355x-sglang, qwen3.5-fp8-mi355x-sglang]. If any tooling processes config-keys to build a changelog database, verify coverage, or detect regressions, the duplicate will cause the FP8 config to be processed twice while the BF16 config is silently missing from the record.

    Bug 2 — Wrong PR link

    The entry sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/980. This PR is #1036; PR #980 is a completely different, unrelated PR that predates this one by over 50 numbers. Looking at every other entry in perf-changelog.yaml, each pr-link references the specific PR that introduced those changes — the link is the primary audit trail connecting a config change back to a code review. The analogous MI300X/MI325X entry (PR #986) was clearly used as a template and the link was not updated. The correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1036.

    Bug 3 — Inaccurate image tag in description

    The description bullet reads "Use lmsysorg/sglang:v0.5.10-rocm720-mi35x" but the image actually configured in amd-master.yaml for both qwen3.5-bf16-mi355x-sglang and qwen3.5-fp8-mi355x-sglang is lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415. There are three discrepancies: (a) the Docker Hub repository is sglang-rocm not sglang; (b) the version is v0.5.10rc0 (a release candidate), not the stable v0.5.10; (c) the date suffix -20260415 is omitted. For comparison, the analogous MI300X/MI325X entry (PR #986) correctly documents lmsysorg/sglang:v0.5.10-rocm720-mi30x, which matches the actual image used there — the MI355X description appears to have been copied from that entry without updating for the different RC image. The PR description itself also contains this inaccuracy.

    Step-by-step proof

    1. Open perf-changelog.yaml, lines 1362–1369 (new entry added by this PR).
    2. config-keys list: fp8, bf16, fp8 — FP8 duplicated, BF16 appears once. ✗
    3. pr-link: .../pull/980 — current PR is #1036. ✗
    4. Description image: lmsysorg/sglang:v0.5.10-rocm720-mi35x.
    5. Open amd-master.yaml, qwen3.5-bf16-mi355x-sglang entry: image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415. Mismatch on repo name, RC suffix, and date suffix. ✗
    6. Same image is set for qwen3.5-fp8-mi355x-sglang in the same file. ✗

    All three errors are documentation-only and have no impact on runtime behavior, but they create a misleading changelog entry that should be corrected before merge.
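
The three checks in the review above could be automated along these lines. This is a minimal sketch, not part of the repository's tooling: the `config-keys` and `pr-link` field names are taken from the review, while the function names, the dictionary layout of an entry, and passing the documented image as a separate argument are assumptions made for illustration. The image split is deliberately simple and ignores registry ports and digests.

```python
import re
from collections import Counter


def duplicate_keys(config_keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(config_keys).items() if n > 1)


def pr_number_from_link(link):
    """Extract the trailing PR number from a .../pull/<n> URL, or None."""
    match = re.search(r"/pull/(\d+)/?$", link)
    return int(match.group(1)) if match else None


def split_image(ref):
    """Split 'repo:tag' into (repo, tag); a reference without a tag gets ''."""
    repo, sep, tag = ref.rpartition(":")
    return (repo, tag) if sep else (ref, "")


def check_entry(entry, pr_number, documented_image, configured_image):
    """Collect human-readable problems for one changelog entry (hypothetical layout)."""
    problems = []
    dups = duplicate_keys(entry["config-keys"])
    if dups:
        problems.append(f"duplicate config-keys: {dups}")
    if pr_number_from_link(entry["pr-link"]) != pr_number:
        problems.append(f"pr-link does not point at PR #{pr_number}")
    if split_image(documented_image) != split_image(configured_image):
        problems.append("documented image differs from configured image")
    return problems


# Values copied from the review; the entry structure itself is assumed.
entry = {
    "config-keys": [
        "qwen3.5-fp8-mi355x-sglang",
        "qwen3.5-bf16-mi355x-sglang",
        "qwen3.5-fp8-mi355x-sglang",
    ],
    "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/980",
}
problems = check_entry(
    entry,
    1036,
    "lmsysorg/sglang:v0.5.10-rocm720-mi35x",
    "lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415",
)
for problem in problems:
    print(problem)
```

Run against the entry as described in the review, the sketch reports all three problems; with the keys deduplicated, the link set to pull/1036, and the description matching the configured image, it reports none.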

…X entry

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Collaborator

@seungrokj seungrokj left a comment


lgtm

lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260414
Collaborator

@chunfangamd chunfangamd left a comment


lgtm

@chunfangamd chunfangamd enabled auto-merge (squash) April 16, 2026 12:42
@chunfangamd chunfangamd merged commit 86ddaed into main Apr 16, 2026
50 checks passed
@chunfangamd chunfangamd deleted the todd/qwen35-mi35x branch April 16, 2026 13:47
cquil11 added a commit that referenced this pull request Apr 17, 2026
This reverts commit 20073ba, except
for changes to benchmarks/single_node/qwen3.5_{bf16,fp8}_mi355x.sh,
which have been preserved to retain PR #1036's subsequent fixes.
cquil11 added a commit that referenced this pull request Apr 17, 2026
[skip-sweep]

This reverts commit 20073ba, except for changes to benchmarks/single_node/qwen3.5_{bf16,fp8}_mi355x.sh, which have been preserved to retain PR #1036's subsequent fixes.

6 participants