Skip to content

[Model][Bugfix] v0.15.1-dev Remove model and bugfix#340

Closed
Joeegin wants to merge 2 commits into
baidu:mainfrom
Joeegin:main
Closed

[Model][Bugfix] v0.15.1-dev Remove model and bugfix#340
Joeegin wants to merge 2 commits into
baidu:mainfrom
Joeegin:main

Conversation

@Joeegin

@Joeegin Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

PR Description


Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

  • All code changes pass the pre-commit checks.
  • Commits are signed off using git commit -s.
  • The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

  • [Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
  • [Bugfix] – Bug fixes
  • [CI/Build] – CI, build system, or infrastructure improvements
  • [Doc] – Documentation updates or fixes
  • [Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.


Detailed Checklist (Click to Expand)

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

  • All linting and formatting checks pass (pre-commit).
  • The code is well-structured and sufficiently documented.
  • The change is designed with maintainability and readability in mind.

2. Testing

  • Relevant unit tests are added or updated.
  • Integration tests are included when applicable.
  • Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

  • All commits include a Signed-off-by: line.
  • Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

  • Request code refactoring or additional tests.
  • Ask for clarifications on design decisions.
  • Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

Joeegin added 2 commits April 27, 2026 15:35
Signed-off-by: Joeegin <3318329726@qq.com>
… switch

Signed-off-by: Joeegin <3318329726@qq.com>
@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

首先需要说明的一点是,v0.15.1版本vllm更新的时候还没有出Qwen3.5,所以transformers版本比较低,但是vllm-kunlun的0.15.1dev分支为了适配Qwen3.5,所以容器镜像环境是transofoerms==5.2.0,这里如果要运行这一系列老模型的话,建议降级transformers,这里我使用的是transformers==4.57.5

我会在0.15.1分支的文档中再单独说明


Qwen3-VL-32B(TP2)
需要添加TORCHDYNAMO_DISABLE=1,以绕过triton算子,后端可以正常走kunlun_graph,服务正常启动
image
客户端测试:
image
image

@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Qwen3-VL-30B-A3B(TP2)
同上,需要绕过triton
image
客户端测试
image
image

@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Qwen2.5-VL-32B(TP2)
绕过triton
image
客户端测试
image
image

@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

InternVL3_5-30B-A3B(TP2)
image
客户端测试
建议添加上采样参数
image
image

@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

InternVL3_5-241B-A28B(TP8)
需要绕过small_moe分支(运行Qwen3-235B-A22B同样需要开启)
image
客户端测试
image
image

@Joeegin

Joeegin commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Intern-S1(TP8)
image
客户端测试
建议添加采样参数
image
image

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes several Kunlun-local model implementations/registrations and adds a runtime switch to disable the “small MoE” fast-path in the Kunlun fused MoE op.

Changes:

  • Add KUNLUN_DISABLE_SMALL_MOE environment flag to bypass the small-token MoE optimization path in fused_moe.
  • Remove redundant model implementation files from vllm_kunlun/models/ (Qwen VL variants, Intern* variants, InternLM2).
  • Update vllm_kunlun/models/__init__.py to stop registering the removed models.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
vllm_kunlun/ops/_kunlun_ops.py Adds env-controlled gate for the small MoE pre-processing fast-path.
vllm_kunlun/models/init.py Removes registry entries for models whose implementations are being deleted.
vllm_kunlun/models/qwen3_vl_moe.py Deleted redundant Kunlun-local Qwen3-VL-MoE model implementation.
vllm_kunlun/models/qwen3_vl.py Deleted redundant Kunlun-local Qwen3-VL model implementation.
vllm_kunlun/models/qwen2_vl.py Deleted redundant Kunlun-local Qwen2-VL model implementation.
vllm_kunlun/models/qwen2_5_vl.py Deleted redundant Kunlun-local Qwen2.5-VL model implementation.
vllm_kunlun/models/internvl.py Deleted Kunlun-local InternVL implementation.
vllm_kunlun/models/interns1_vit.py Deleted Kunlun-local InternS1 ViT implementation.
vllm_kunlun/models/interns1.py Deleted Kunlun-local InternS1 multimodal model implementation.
vllm_kunlun/models/internlm2.py Deleted Kunlun-local InternLM2 implementation.
vllm_kunlun/models/intern_vit.py Deleted Kunlun-local Intern ViT implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread vllm_kunlun/ops/_kunlun_ops.py
Comment thread vllm_kunlun/models/__init__.py
@Joeegin

Joeegin commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by #371. Moving the head branch off Joeegin:main so Joeegin:main can be synchronized with upstream main without losing the unmerged changes.

@Joeegin Joeegin closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants