
[Model] Fix MiniCPM-V 4.6 vit_merger qkv weight loading #43213

Open
tc-mb wants to merge 1 commit into vllm-project:main from tc-mb:fix/minicpmv4_6-vit-merger-qkv-loading

@tc-mb tc-mb commented May 20, 2026

In MiniCPMV4_6 (vllm/model_executor/models/minicpmv4_6.py), the vit_merger.self_attn module uses vLLM's fused QKVParallelLinear (a single qkv_proj), while the HuggingFace checkpoint stores the projections as three separate tensors: q_proj, k_proj, and v_proj.

Although the top-level model already declares:

packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    ...
}

AutoWeightsLoader does not correctly stack the three shards into qkv_proj for the vit_merger sub-tree. As a result, vit_merger.self_attn.qkv_proj is never populated with the pretrained weights — loading either raises missing/unexpected-key errors for vit_merger.self_attn.{q,k,v}_proj.*, or silently leaves qkv_proj at its random init, producing corrupted vision features.
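To make the mismatch concrete, here is a minimal sketch that compares checkpoint keys against module parameter names with plain set arithmetic. It is not vLLM code: the `diff_keys` helper is hypothetical, and the `.weight` suffixes are an assumption layered on the `vit_merger.self_attn.{q,k,v}_proj.*` names mentioned above.

```python
def diff_keys(checkpoint_keys, module_keys):
    """Return (missing, unexpected) names, as a strict loader would report them."""
    ckpt, mod = set(checkpoint_keys), set(module_keys)
    missing = sorted(mod - ckpt)      # parameters the module has but the checkpoint lacks
    unexpected = sorted(ckpt - mod)   # checkpoint entries with no matching parameter
    return missing, unexpected


# HuggingFace-style checkpoint: three separate projections (suffixes assumed).
checkpoint_keys = [
    "vit_merger.self_attn.q_proj.weight",
    "vit_merger.self_attn.k_proj.weight",
    "vit_merger.self_attn.v_proj.weight",
    "vit_merger.self_attn.out_proj.weight",
]

# Module side: a single fused projection.
module_keys = [
    "vit_merger.self_attn.qkv_proj.weight",
    "vit_merger.self_attn.out_proj.weight",
]

missing, unexpected = diff_keys(checkpoint_keys, module_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Without a stacking step, the fused `qkv_proj` parameter shows up as missing and the three separate projections show up as unexpected.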

Fix: add a dedicated load_weights method on MiniCPMV4_6ViTWindowAttentionSelfAttn that follows vLLM's standard stacked_params_mapping pattern — feeding q_proj / k_proj / v_proj into qkv_proj.weight_loader with shard_id="q"/"k"/"v" respectively, and falling back to default_weight_loader for the remaining parameters (e.g. out_proj).
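The pattern described above can be sketched with toy stand-ins, using plain Python lists in place of tensors. This is an illustration under stated assumptions, not the code in this PR: `ToyParam`, `make_qkv_loader`, and `toy_default_loader` are hypothetical names, and the row-offset layout (q first, then k, then v) is assumed for the example.

```python
class ToyParam:
    """Stand-in for a parameter; `data` holds rows as a flat list."""

    def __init__(self, rows, weight_loader=None):
        self.data = [None] * rows
        self.weight_loader = weight_loader  # optional custom loader


def toy_default_loader(param, loaded):
    # Plain copy, playing the role of default_weight_loader.
    param.data[:] = loaded


def make_qkv_loader(q_rows, kv_rows):
    # Assumed layout of the fused parameter: [q rows | k rows | v rows].
    offsets = {
        "q": (0, q_rows),
        "k": (q_rows, kv_rows),
        "v": (q_rows + kv_rows, kv_rows),
    }

    def loader(param, loaded, shard_id):
        start, size = offsets[shard_id]
        assert len(loaded) == size, "shard size does not match its slot"
        param.data[start:start + size] = loaded

    return loader


STACKED_PARAMS_MAPPING = [
    # (fused param name, checkpoint shard name, shard id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
]


def load_weights(params, weights):
    loaded = set()
    for name, tensor in weights:
        for fused, shard, shard_id in STACKED_PARAMS_MAPPING:
            if shard not in name:
                continue
            # Redirect the shard into its slice of the fused parameter.
            target = name.replace(shard, fused)
            param = params[target]
            param.weight_loader(param, tensor, shard_id)
            break
        else:
            # Not a q/k/v shard: fall back to the default loader.
            target = name
            param = params[target]
            loader = param.weight_loader or toy_default_loader
            loader(param, tensor)
        loaded.add(target)
    return loaded


params = {
    "qkv_proj.weight": ToyParam(4, make_qkv_loader(q_rows=2, kv_rows=1)),
    "out_proj.weight": ToyParam(2),
}
checkpoint = [
    ("q_proj.weight", [1.0, 2.0]),
    ("k_proj.weight", [3.0]),
    ("v_proj.weight", [4.0]),
    ("out_proj.weight", [5.0, 6.0]),
]
loaded = load_weights(params, checkpoint)
print(params["qkv_proj.weight"].data, sorted(loaded))
```

The `for ... else` shape keeps the fallback path separate: the `else` branch runs only when no entry in the mapping matched the checkpoint name.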

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a load_weights method in the MiniCPMV4_6 model to handle weight loading, specifically for stacked QKV projections. The review feedback identifies a potential crash due to missing key validation when encountering unexpected keys in the checkpoint and points out that the method should return original weight names instead of transformed ones to prevent the model loader from incorrectly reporting unexpected keys.
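The two points raised in this review can be illustrated with a small sketch: check that the mapped name exists before indexing into the parameter dictionary, and record the original checkpoint name rather than the mapped one. Everything here is hypothetical (`to_fused`, `load_known_weights`, and the `rotary.inv_freq` key are made up for the example), and stacking is reduced to a plain assignment. Which names the surrounding loader actually expects back is not settled by this sketch.

```python
def to_fused(name):
    """Map a q/k/v checkpoint name onto the fused parameter name."""
    for shard in ("q_proj", "k_proj", "v_proj"):
        if shard in name:
            return name.replace(shard, "qkv_proj")
    return name


def load_known_weights(params, weights, rename):
    loaded_original_names = set()
    skipped = []
    for name, value in weights:
        target = rename(name)
        if target not in params:
            # Guard: an unknown key is recorded instead of raising KeyError.
            skipped.append(name)
            continue
        params[target] = value  # simplified; a real loader would call weight_loader
        loaded_original_names.add(name)  # original name, as the review suggests
    return loaded_original_names, skipped


params = {"qkv_proj.weight": None, "out_proj.weight": None}
weights = [
    ("q_proj.weight", 1),
    ("out_proj.weight", 2),
    ("rotary.inv_freq", 3),  # hypothetical key with no matching parameter
]
loaded, skipped = load_known_weights(params, weights, to_fused)
print(sorted(loaded), skipped)
```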

Comment thread vllm/model_executor/models/minicpmv4_6.py
@tc-mb tc-mb force-pushed the fix/minicpmv4_6-vit-merger-qkv-loading branch from a17a283 to 2992c66 Compare May 20, 2026 14:07