[Bugfix] Route Gemma4 ClippableLinear clip buffers during weight loading #352
Merged
Gemma4ClippableLinear registers input_max/input_min/output_max/output_min as buffers rather than parameters, so AutoWeightsLoader cannot find them via named_parameters(). Intercept these weights and load them directly into the corresponding buffers before passing the remaining weights to the loader.
Signed-off-by: GrootLiu <1219671600@qq.com>
xyDong0223
approved these changes
May 8, 2026
Contributor
Pull request overview
Fixes Gemma4 multimodal weight loading by ensuring Gemma4ClippableLinear clip-boundary tensors (registered as buffers rather than parameters) are not silently skipped during checkpoint load.
Changes:
- Adds a pre-processing iterator in `Gemma4ForConditionalGeneration.load_weights` to intercept `*.input_{min,max}` / `*.output_{min,max}` weights.
- Resolves the target submodule from the hierarchical weight name and copies tensors directly into the corresponding buffers.
- Passes remaining weights through the existing `AutoWeightsLoader` path.
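The three steps above can be sketched in plain Python, without PyTorch. This is a minimal illustration under stated assumptions, not the code merged in this PR: `resolve_module`, `split_clip_weights`, the `skipped` list, and the `Linear`/`Tower`/`Model` stand-in classes are all hypothetical names, and plain attribute assignment stands in for the in-place buffer copy. Unlike the excerpted diff, which ignores lookup failures with `pass`, the sketch records the names it could not place; that is an illustrative choice only.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# Suffixes named in the PR description; the tuple name is an assumption.
CLIP_SUFFIXES = (".input_min", ".input_max", ".output_min", ".output_max")


def resolve_module(root: Any, module_path: str) -> Any:
    """Walk a dotted path; numeric components index into list-like containers."""
    module = root
    for attr in module_path.split("."):
        module = module[int(attr)] if attr.isdigit() else getattr(module, attr)
    return module


def split_clip_weights(
    root: Any, weights: Iterable[Tuple[str, Any]], skipped: List[str]
) -> Iterator[Tuple[str, Any]]:
    """Store clip-boundary entries on their owning module; yield all other weights."""
    for name, value in weights:
        if name.endswith(CLIP_SUFFIXES):
            module_path, _, buf_name = name.rpartition(".")
            try:
                module = resolve_module(root, module_path)
                if not hasattr(module, buf_name):
                    raise AttributeError(buf_name)
                # Stand-in for an in-place copy into a registered buffer.
                setattr(module, buf_name, value)
            except (AttributeError, IndexError):
                skipped.append(name)  # recorded rather than silently dropped
            continue
        yield name, value


# Hypothetical stand-ins for a module tree with clip boundaries.
class Linear:
    def __init__(self) -> None:
        self.input_max = None
        self.input_min = None


class Tower:
    def __init__(self) -> None:
        self.layers = [Linear(), Linear()]


class Model:
    def __init__(self) -> None:
        self.audio_tower = Tower()


model = Model()
skipped: List[str] = []
weights = [
    ("audio_tower.layers.0.input_max", 1.5),   # routed to the stand-in buffer
    ("audio_tower.layers.1.weight", 7),        # passed through to the loader
    ("audio_tower.layers.9.input_min", 0.0),   # index out of range, recorded
]
remaining = list(split_clip_weights(model, weights, skipped))
```

After this runs, `remaining` holds only the ordinary weight, which in the real code path would go on to `AutoWeightsLoader`, and `skipped` lists any clip entry whose owning module could not be resolved.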
Comment on lines +1376 to +1383

    if name.endswith(clip_suffixes):
        # Resolve module by hierarchical name, e.g.
        # audio_tower.layers.0.feed_forward1.ffw_layer_1.input_max
        module_path, _, buf_name = name.rpartition(".")
        module = self
        try:
            for attr in module_path.split("."):
                module = (
Comment on lines +1388 to +1394

            if hasattr(module, buf_name):
                buf = getattr(module, buf_name)
                buf.data.copy_(tensor.to(buf.device, buf.dtype))
        except (AttributeError, IndexError):
            pass
        continue
    yield name, tensor
Comment on lines +1383 to +1391

                module = (
                    getattr(module, attr)
                    if not attr.isdigit()
                    else module[int(attr)]
                )
            if hasattr(module, buf_name):
                buf = getattr(module, buf_name)
                buf.data.copy_(tensor.to(buf.device, buf.dtype))
        except (AttributeError, IndexError):
PR Description
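Gemma4ClippableLinear registers input_max/input_min/output_max/output_min as buffers rather than parameters, but AutoWeightsLoader looks up target tensors via named_parameters(). As a result, these clip boundaries are silently dropped during loading, which hurts accuracy.
load_weights now intercepts these weight names, resolves the owning module from the hierarchical path, and copies each tensor directly into the corresponding buffer. All other weights are passed to the loader unchanged.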