[Bugfix] Route Gemma4 ClippableLinear clip buffers during weight loading by GrootLiu · Pull Request #352 · baidu/vLLM-Kunlun

GrootLiu · 2026-05-08T10:25:36Z

Gemma4ClippableLinear registers input_max/input_min/output_max/output_min as buffers rather than parameters, so AutoWeightsLoader cannot find them via named_parameters(). Intercept these weights and load them directly into the corresponding buffers before passing the remaining weights to the loader.

PR Description

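Gemma4ClippableLinear registers input_max/input_min/output_max/output_min as buffers rather than parameters, but AutoWeightsLoader looks up target tensors via named_parameters(). As a result, these clip bounds are silently dropped during loading, which affects accuracy.

In load_weights, intercept these weight names, resolve the owning module from the hierarchical path, and copy each tensor directly into the corresponding buffer. All other weights are passed to the loader unchanged.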
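The buffer-versus-parameter distinction behind this bug can be illustrated with a small PyTorch sketch. The class below, ClippableLinearSketch, is a hypothetical stand-in and not the real Gemma4ClippableLinear; the infinite default bounds are an assumption chosen only for illustration. It shows that tensors registered with register_buffer do not appear in named_parameters(), so any loader that only consults that iterator would not see them.

```python
import torch
from torch import nn


class ClippableLinearSketch(nn.Module):
    """Hypothetical stand-in, not the real Gemma4ClippableLinear."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        # Clip bounds registered as buffers, as the PR describes.
        # The +/-inf defaults are an assumption for this sketch.
        self.register_buffer("input_max", torch.tensor(float("inf")))
        self.register_buffer("input_min", torch.tensor(float("-inf")))


layer = ClippableLinearSketch(4, 2)
param_names = {name for name, _ in layer.named_parameters()}
buffer_names = {name for name, _ in layer.named_buffers()}
print(param_names)   # only the linear weight
print(buffer_names)  # the clip bounds live here instead

# Loading a bound therefore means copying into the buffer directly,
# using the same copy pattern as the diff in this PR.
checkpoint_value = torch.tensor(6.0)
buf = layer.input_max
buf.data.copy_(checkpoint_value.to(buf.device, buf.dtype))
```

If the bounds were left at their initial values, clipping would use those values instead of the checkpoint's, which is consistent with the accuracy impact the description mentions.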

Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

All code changes pass the pre-commit checks.
Commits are signed off using git commit -s.
The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

[Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
[Bugfix] – Bug fixes
[CI/Build] – CI, build system, or infrastructure improvements
[Doc] – Documentation updates or fixes
[Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.

Detailed Checklist

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

All linting and formatting checks pass (pre-commit).
The code is well-structured and sufficiently documented.
The change is designed with maintainability and readability in mind.

2. Testing

Relevant unit tests are added or updated.
Integration tests are included when applicable.
Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

All commits include a Signed-off-by: line.
Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

Request code refactoring or additional tests.
Ask for clarifications on design decisions.
Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

Signed-off-by: GrootLiu <1219671600@qq.com>

Copilot

Pull request overview

Fixes Gemma4 multimodal weight loading by ensuring Gemma4ClippableLinear clip-boundary tensors (registered as buffers rather than parameters) are not silently skipped during checkpoint load.

Changes:

Adds a pre-processing iterator in Gemma4ForConditionalGeneration.load_weights to intercept *.input_{min,max} / *.output_{min,max} weights.
Resolves the target submodule from the hierarchical weight name and copies tensors directly into the corresponding buffers.
Passes remaining weights through the existing AutoWeightsLoader path.
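The three steps above can be sketched without PyTorch, using plain Python objects as stand-ins for the module tree. The helper names resolve_module and split_clip_weights are hypothetical and do not come from the PR, and plain attribute assignment stands in for the in-place buffer copy. The sketch only illustrates the name-based routing: numeric path segments index into a sequence, other segments are attribute lookups, and unresolved paths are skipped.

```python
from types import SimpleNamespace

CLIP_SUFFIXES = (".input_max", ".input_min", ".output_max", ".output_min")


def resolve_module(root, module_path):
    """Walk a dotted path; numeric segments index into a sequence."""
    module = root
    for attr in module_path.split("."):
        module = module[int(attr)] if attr.isdigit() else getattr(module, attr)
    return module


def split_clip_weights(root, weights):
    """Yield non-clip weights; store clip values on the resolved module."""
    for name, value in weights:
        if name.endswith(CLIP_SUFFIXES):
            module_path, _, buf_name = name.rpartition(".")
            try:
                module = resolve_module(root, module_path)
            except (AttributeError, IndexError):
                continue  # unresolved path: skipped silently, as in the diff
            if hasattr(module, buf_name):
                # Stand-in for buf.data.copy_(...) on a real buffer.
                setattr(module, buf_name, value)
            continue
        yield name, value


# Stand-in module tree (plain objects, not torch modules).
layer = SimpleNamespace(input_max=None, input_min=None)
root = SimpleNamespace(audio_tower=SimpleNamespace(layers=[layer]))

weights = [
    ("audio_tower.layers.0.input_max", 6.0),
    ("audio_tower.layers.0.weight", [1.0, 2.0]),
    ("audio_tower.layers.3.input_min", -6.0),  # index out of range
]
remaining = list(split_clip_weights(root, weights))
```

Here only the ordinary weight remains in the stream handed to the downstream loader; the in-range clip value lands on the layer, and the out-of-range entry is dropped without an error.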


+                if name.endswith(clip_suffixes):
+                    # Resolve module by hierarchical name, e.g.
+                    # audio_tower.layers.0.feed_forward1.ffw_layer_1.input_max
+                    module_path, _, buf_name = name.rpartition(".")
+                    module = self
+                    try:
+                        for attr in module_path.split("."):
+                            module = (
+                                getattr(module, attr)
+                                if not attr.isdigit()
+                                else module[int(attr)]
+                            )
+                        if hasattr(module, buf_name):
+                            buf = getattr(module, buf_name)
+                            buf.data.copy_(tensor.to(buf.device, buf.dtype))
+                    except (AttributeError, IndexError):
+                        pass
+                    continue
+                yield name, tensor


xyDong0223 requested a review from Copilot May 8, 2026 10:59

xyDong0223 approved these changes May 8, 2026


xyDong0223 merged commit 09869ee into baidu:main May 8, 2026
4 checks passed

Copilot started reviewing on behalf of xyDong0223 May 8, 2026 11:00

Copilot AI reviewed May 8, 2026

xyDong0223 merged 1 commit into baidu:main from GrootLiu:gemma4_fix


3 participants
