[Model][Bugfix] v0.15.1-dev Remove model and bugfix by Joeegin · Pull Request #340 · baidu/vLLM-Kunlun

Joeegin · 2026-04-27T07:42:24Z

PR Description

Remove redundant model
bugfix AttributeError: 'Qwen3VLTextConfig' object has no attribute 'tie_word_embeddings' #306
Introduce KUNLUN_DISABLE_SMALL_MOE env to disable small MoE optimization

Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

All code changes pass the pre-commit checks.
Commits are signed off using git commit -s.
The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

[Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
[Bugfix] – Bug fixes
[CI/Build] – CI, build system, or infrastructure improvements
[Doc] – Documentation updates or fixes
[Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.

Detailed Checklist (Click to Expand)

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

All linting and formatting checks pass (pre-commit).
The code is well-structured and sufficiently documented.
The change is designed with maintainability and readability in mind.

2. Testing

Relevant unit tests are added or updated.
Integration tests are included when applicable.
Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

All commits include a Signed-off-by: line.
Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

Request code refactoring or additional tests.
Ask for clarifications on design decisions.
Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

Signed-off-by: Joeegin <3318329726@qq.com>

… switch Signed-off-by: Joeegin <3318329726@qq.com>

Joeegin · 2026-04-27T07:51:55Z

首先需要说明的一点是，v0.15.1版本vllm更新的时候还没有出Qwen3.5，所以transformers版本比较低，但是vllm-kunlun的0.15.1dev分支为了适配Qwen3.5,所以容器镜像环境是transofoerms==5.2.0，这里如果要运行这一系列老模型的话，建议降级transformers，这里我使用的是transformers==4.57.5

我会在0.15.1分支的文档中再单独说明

Qwen3-VL-32B(TP2)
需要添加TORCHDYNAMO_DISABLE=1，以绕过triton算子，后端可以正常走kunlun_graph，服务正常启动

客户端测试：

Joeegin · 2026-04-27T07:58:37Z

Qwen3-VL-30B-A3B(TP2)
同上，需要绕过triton

客户端测试

Joeegin · 2026-04-27T08:07:39Z

Qwen2.5-VL-32B(TP2)
绕过triton

客户端测试

Joeegin · 2026-04-27T08:37:10Z

InternVL3_5-30B-A3B(TP2)

客户端测试
建议添加上采样参数

Joeegin · 2026-04-27T08:42:26Z

InternVL3_5-241B-A28B(TP8)
需要绕过small_moe分支(运行Qwen3-235B-A22B同样需要开启)

客户端测试

Joeegin · 2026-04-27T08:53:17Z

Intern-S1(TP8)

客户端测试
建议添加采样参数

Copilot

Pull request overview

This PR removes several Kunlun-local model implementations/registrations and adds a runtime switch to disable the “small MoE” fast-path in the Kunlun fused MoE op.

Changes:

Add KUNLUN_DISABLE_SMALL_MOE environment flag to bypass the small-token MoE optimization path in fused_moe.
Remove redundant model implementation files from vllm_kunlun/models/ (Qwen VL variants, Intern* variants, InternLM2).
Update vllm_kunlun/models/__init__.py to stop registering the removed models.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
vllm_kunlun/ops/_kunlun_ops.py	Adds env-controlled gate for the small MoE pre-processing fast-path.
vllm_kunlun/models/init.py	Removes registry entries for models whose implementations are being deleted.
vllm_kunlun/models/qwen3_vl_moe.py	Deleted redundant Kunlun-local Qwen3-VL-MoE model implementation.
vllm_kunlun/models/qwen3_vl.py	Deleted redundant Kunlun-local Qwen3-VL model implementation.
vllm_kunlun/models/qwen2_vl.py	Deleted redundant Kunlun-local Qwen2-VL model implementation.
vllm_kunlun/models/qwen2_5_vl.py	Deleted redundant Kunlun-local Qwen2.5-VL model implementation.
vllm_kunlun/models/internvl.py	Deleted Kunlun-local InternVL implementation.
vllm_kunlun/models/interns1_vit.py	Deleted Kunlun-local InternS1 ViT implementation.
vllm_kunlun/models/interns1.py	Deleted Kunlun-local InternS1 multimodal model implementation.
vllm_kunlun/models/internlm2.py	Deleted Kunlun-local InternLM2 implementation.
vllm_kunlun/models/intern_vit.py	Deleted Kunlun-local Intern ViT implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Joeegin · 2026-06-01T07:44:58Z

Superseded by #371. Moving the head branch off Joeegin:main so Joeegin:main can be synchronized with upstream main without losing the unmerged changes.

Joeegin added 2 commits April 27, 2026 15:35

[Model] Remove redundant model

5cdbe4e

Signed-off-by: Joeegin <3318329726@qq.com>

[Bugfix] InternVL-241B-A28B(Moe):guard small MoE path and add disable…

a12e430

… switch Signed-off-by: Joeegin <3318329726@qq.com>

xyDong0223 requested a review from Copilot April 29, 2026 03:33

Copilot started reviewing on behalf of xyDong0223 April 29, 2026 03:33 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread vllm_kunlun/ops/_kunlun_ops.py

Comment thread vllm_kunlun/models/__init__.py

Joeegin mentioned this pull request Jun 1, 2026

[Model][Bugfix] v0.15.1-dev Remove model and bugfix #371

Open

3 tasks

Joeegin closed this Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Model][Bugfix] v0.15.1-dev Remove model and bugfix#340

[Model][Bugfix] v0.15.1-dev Remove model and bugfix#340
Joeegin wants to merge 2 commits into
baidu:mainfrom
Joeegin:main

Joeegin commented Apr 27, 2026 •

edited

Loading

Uh oh!

Joeegin commented Apr 27, 2026 •

edited

Loading

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Joeegin commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Joeegin commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Description

Checklist (Required)

PR Type

1. Code Quality

2. Testing

3. DCO Compliance

4. Review Expectations

Uh oh!

Joeegin commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026

Uh oh!

Joeegin commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Joeegin commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Joeegin commented Apr 27, 2026 •

edited

Loading

Joeegin commented Apr 27, 2026 •

edited

Loading

Joeegin commented Apr 27, 2026 •

edited

Loading