[Model] Support Step-3.5-Flash by Joeegin · Pull Request #305 · baidu/vLLM-Kunlun

Joeegin · 2026-04-07T12:01:08Z

PR Description

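* Improve support for the router scaling parameter.
* Improve Grouped TopK validation.
* Add an environment variable, KUNLUN_DISABLE_SMALL_MOE. Setting it manually to 1 disables the small-scale MoE branch optimization (without it, the run fails with RuntimeError: CUDA error: unspecified launch failure).
* Add an update_block_table method.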
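The environment-variable switch described above could be gated roughly as follows. This is a minimal sketch, not the PR's code: only the variable name KUNLUN_DISABLE_SMALL_MOE and the value 1 come from the description, while `small_moe_disabled`, `choose_moe_path`, and the `small_threshold` cutoff are hypothetical names and values chosen for illustration.

```python
import os


def small_moe_disabled(environ=None) -> bool:
    """Return True when KUNLUN_DISABLE_SMALL_MOE is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("KUNLUN_DISABLE_SMALL_MOE", "0") == "1"


def choose_moe_path(num_tokens: int, small_threshold: int = 32, environ=None) -> str:
    """Pick a MoE code path.

    Assumption: the small-MoE fast path is selected by token count.
    The real selection criterion is not stated in the PR description.
    """
    if not small_moe_disabled(environ) and num_tokens <= small_threshold:
        return "small"
    return "general"


if __name__ == "__main__":
    # With the variable unset, a small batch takes the fast path.
    print(choose_moe_path(4, environ={}))
    # With the variable set to 1, the fast path is skipped.
    print(choose_moe_path(4, environ={"KUNLUN_DISABLE_SMALL_MOE": "1"}))
```

Reading the variable at call time rather than import time keeps the switch effective even if it is set after the module has been loaded, at the cost of one dictionary lookup per call.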

Doc:

Add a Step-3.5-Flash tutorial document.

Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

All code changes pass the pre-commit checks.
Commits are signed off using git commit -s.
The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

[Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
[Bugfix] – Bug fixes
[CI/Build] – CI, build system, or infrastructure improvements
[Doc] – Documentation updates or fixes
[Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.

Detailed Checklist

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

All linting and formatting checks pass (pre-commit).
The code is well-structured and sufficiently documented.
The change is designed with maintainability and readability in mind.

2. Testing

Relevant unit tests are added or updated.
Integration tests are included when applicable.
Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

All commits include a Signed-off-by: line.
Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

Request code refactoring or additional tests.
Ask for clarifications on design decisions.
Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

Joeegin · 2026-04-07T12:07:11Z

Copilot

Pull request overview

Adds support and stability fixes for Step-3.5-Flash in the Kunlun backend, primarily around MoE routing/scaling and attention metadata handling.

Changes:

Add update_block_table to KunlunAttentionMetadataBuilder and enforce contiguous prefill Q/K/V slices.
Extend Kunlun fused MoE to support a router scaling factor and improve grouped-TopK parameter validation.
Add KUNLUN_DISABLE_SMALL_MOE=1 escape hatch to disable the small-MoE fast path.
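To make the router scaling factor and the grouped-TopK validation concrete, here is a minimal pure-Python sketch of sigmoid routing with grouped top-k. It is not the Kunlun kernel: the function names, the specific validation rules, and the choice of the per-group maximum as the group score are assumptions made for illustration. Only the parameter name `router_scaling_factor` is taken from the review summary.

```python
import math


def validate_grouped_topk(num_experts, num_expert_group, topk_group, top_k):
    """Reject grouped top-k parameters that cannot describe a valid routing."""
    if num_expert_group <= 0 or num_experts % num_expert_group != 0:
        raise ValueError("num_experts must be divisible by num_expert_group")
    if not 0 < topk_group <= num_expert_group:
        raise ValueError("topk_group must be in [1, num_expert_group]")
    experts_per_group = num_experts // num_expert_group
    if not 0 < top_k <= topk_group * experts_per_group:
        raise ValueError("top_k exceeds the experts available in the kept groups")


def grouped_topk_sigmoid(logits, num_expert_group, topk_group, top_k,
                         router_scaling_factor=1.0):
    """Route one token: return (expert ids, routing weights)."""
    num_experts = len(logits)
    validate_grouped_topk(num_experts, num_expert_group, topk_group, top_k)

    # Sigmoid scoring: each expert is scored independently of the others.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Assumption: a group's score is the maximum score of its experts.
    per_group = num_experts // num_expert_group
    group_scores = [max(scores[g * per_group:(g + 1) * per_group])
                    for g in range(num_expert_group)]
    kept = sorted(range(num_expert_group),
                  key=lambda g: group_scores[g], reverse=True)[:topk_group]

    # Choose the top_k experts among the kept groups only.
    candidates = [e for g in kept
                  for e in range(g * per_group, (g + 1) * per_group)]
    chosen = sorted(candidates, key=lambda e: scores[e], reverse=True)[:top_k]

    # Normalize the selected weights, then apply the scaling factor.
    total = sum(scores[e] for e in chosen)
    weights = [router_scaling_factor * scores[e] / total for e in chosen]
    return chosen, weights
```

In this sketch the selected weights are normalized to sum to 1 before scaling, so the returned weights sum to `router_scaling_factor`. Whether the actual kernel normalizes before scaling is not stated in the PR.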

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `vllm_kunlun/v1/attention/backends/kunlun_attn.py` | Adds metadata update helper and fixes prefill tensor contiguity; introduces a new stdlib import. |
| `vllm_kunlun/ops/fused_moe/layer.py` | Plumbs routed scaling factor from the layer into Kunlun fused MoE calls. |
| `vllm_kunlun/ops/_kunlun_ops.py` | Adds `router_scaling_factor`, validates grouped-topk params for sigmoid routing, and adds env-var gating for small-MoE path. |
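One plausible shape for the metadata update helper is a copy-and-replace: take existing attention metadata, copy it, and swap in a new block table and slot mapping. The sketch below assumes this pattern. `ToyAttentionMetadata`, its fields, and the function signature are hypothetical stand-ins, not the actual `KunlunAttentionMetadataBuilder` API, and plain lists stand in for tensors.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyAttentionMetadata:
    """Hypothetical stand-in for the backend's attention metadata."""
    block_table: list
    slot_mapping: list
    num_actual_tokens: int


def update_block_table(metadata, block_table, slot_mapping):
    """Return a shallow copy of `metadata` with a new block table and slot mapping.

    The original object is left untouched so other holders can keep using it.
    """
    new_metadata = copy.copy(metadata)
    new_metadata.block_table = block_table
    new_metadata.slot_mapping = slot_mapping
    return new_metadata


if __name__ == "__main__":
    base = ToyAttentionMetadata(block_table=[[0, 1]], slot_mapping=[0, 1, 2],
                                num_actual_tokens=3)
    updated = update_block_table(base, [[4, 5]], [64, 65, 66])
    print(base.block_table, updated.block_table)
```

A shallow copy suffices here because only the two replaced fields differ. All other fields are shared by reference with the original.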


xyDong0223 requested a review from Copilot April 8, 2026 08:14

Copilot started reviewing on behalf of xyDong0223 April 8, 2026 08:14

Copilot AI reviewed Apr 8, 2026


Comment thread vllm_kunlun/v1/attention/backends/kunlun_attn.py Outdated

Comment thread vllm_kunlun/ops/_kunlun_ops.py Outdated

Joeegin force-pushed the step35 branch from 3dd6562 to 2b5322c Compare April 14, 2026 12:05

Joeegin marked this pull request as ready for review April 14, 2026 12:07

Joeegin force-pushed the step35 branch 2 times, most recently from 6e6215c to 1d08e28 Compare April 20, 2026 09:36

[Model] Support Step-3.5-Flash

d407d3e

Joeegin force-pushed the step35 branch from 1d08e28 to d407d3e Compare April 20, 2026 09:51

Joeegin closed this Apr 27, 2026

Joeegin deleted the step35 branch April 27, 2026 07:21

Joeegin restored the step35 branch April 27, 2026 07:27

Joeegin reopened this Apr 27, 2026

Joeegin wants to merge 1 commit into baidu:main from Joeegin:step35

2 participants
