[Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping #36825
hallerite wants to merge 2 commits into vllm-project:main
Conversation
Code Review
This pull request addresses an IndexError that occurs when using LoRA with Qwen3.5 models. The fix is two-fold: first, it correctly aligns the packed_modules_mapping for in_proj_qkvz in Qwen3.5 models to have four entries, which now matches the layer's four output_sizes. Second, it generalizes the layer replacement logic in MergedColumnParallelLinearWithLoRA to be more robust by dynamically checking against the number of output sizes instead of a hardcoded value. These changes are well-reasoned, directly fix the bug, and improve the code's maintainability.
Force-pushed d179859 to e38dbbd
Hi @hallerite, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
alvinttang left a comment
This is a correct two-part fix: the packed_modules_mapping was misrepresenting in_proj_qkvz as 2 sub-modules when it actually has 4 (matching the 4 output_sizes in create_qkvz_proj), and can_replace_layer was hardcoded to len == 2 instead of dynamically checking against the layer's actual output_sizes. The dynamic check in can_replace_layer is the more important improvement since it makes the validation self-consistent and prevents future regressions if the packing changes again. One thing worth double-checking: are there any serialized/saved LoRA adapters in the wild that used the old 2-module mapping that would now silently fail to load against this updated definition? Overall this is a well-reasoned fix and both changes are necessary together.
Signed-off-by: hallerite <git@hallerite.com>
Force-pushed 27fb13a to 67a242a
This has unblocked the seq-len error, but it stopped working for me when I load LoRA with 4-bit bitsandbytes quantization.
Summary
Fixes `IndexError: list index out of range` when enabling LoRA with Qwen3.5 models (`Qwen3_5ForCausalLMBase` and `Qwen3_5ForConditionalGeneration`).

Root cause: Qwen3.5's `create_qkvz_proj` overrides the parent (Qwen3Next) to use 4 `output_sizes` (`[key_dim, key_dim, value_dim, value_dim]`) for correct per-slice TP sharding. However, `packed_modules_mapping` lists only 2 entries (`["in_proj_qkv", "in_proj_z"]`). During LoRA initialization, `MergedColumnParallelLinearWithLoRA` sets `n_slices = len(output_sizes)` (4) but creates only `len(packed_modules)` (2) adapters, so accessing `lora_a[2]`/`lora_a[3]` crashes.

Fix:
- Expand the `packed_modules_mapping` entry for `in_proj_qkvz` from 2 to 4 sub-modules, `["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]`, matching the 4 `output_sizes`.
- Generalize `MergedColumnParallelLinearWithLoRA.can_replace_layer` from `len(packed_modules_list) == 2` to `len(packed_modules_list) == len(source_layer.output_sizes)`, so it works for any N-way merged column-parallel linear, not just 2-way.

This works for any TP size because each of the 4 packed modules maps to one `output_size`, preserving correct per-slice sharding.
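The mismatch can be sketched in isolation. This is a minimal toy, not vLLM's actual classes; `ToyLayer` and `init_lora` are hypothetical stand-ins for the layer and the adapter-creation loop described above:

```python
# Toy reproduction of the mismatch (not vLLM's real classes).
# Before the fix: 4 output_sizes but only 2 packed sub-modules -> IndexError.

class ToyLayer:
    def __init__(self, output_sizes):
        self.output_sizes = output_sizes

def init_lora(layer, packed_modules):
    n_slices = len(layer.output_sizes)                 # 4 for Qwen3.5's in_proj_qkvz
    lora_a = [f"adapter_{m}" for m in packed_modules]  # one adapter per packed module
    # The init loop iterates over all slices, indexing into lora_a:
    return [lora_a[i] for i in range(n_slices)]        # lora_a[2] raises IndexError if only 2 adapters

layer = ToyLayer(output_sizes=[128, 128, 256, 256])    # [key, key, value, value]

try:
    init_lora(layer, ["in_proj_qkv", "in_proj_z"])     # old 2-entry mapping
except IndexError as e:
    print("old mapping:", e)                           # list index out of range

# Fixed 4-entry mapping: one packed module per output_size.
print(init_lora(layer, ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]))

# The generalized replacement check from the fix:
def can_replace_layer(source_layer, packed_modules_list):
    return len(packed_modules_list) == len(source_layer.output_sizes)
```

With the dynamic check, a 2-entry list is rejected for a 4-slice layer instead of crashing later during adapter indexing.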
Note: The parent class `Qwen3Next` doesn't have this issue because it uses a single summed entry, `output_sizes=[key_dim + key_dim + value_dim + value_dim]` (1 entry), with `packed_modules=["in_proj_qkvz"]` (1 entry); the lengths match.

Note: This may not be the globally optimal solution. The 4 packed module names (`in_proj_q`, `in_proj_k`, `in_proj_v`, `in_proj_z`) are synthetic; the actual HF weight names are `in_proj_qkv` (fused Q+K+V) and `in_proj_z`. This means LoRA adapter weights that target the GDN projections by their real HF names wouldn't be found during loading. In practice this isn't an issue today, because nobody applies LoRA to the GDN layers; only standard attention and MLP layers are targeted. A more complete fix would be to support M packed modules mapping to N output sizes (2 weights → 4 sharding slices) in `MergedColumnParallelLinearWithLoRA`, but that's a larger refactor.

Related: #36372, #36478
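The name mismatch described in the note can be illustrated with a small lookup sketch. The mapping dicts and `resolvable_targets` helper are simplified stand-ins, not vLLM's full `packed_modules_mapping` or its adapter loader:

```python
# Simplified stand-ins for the two mappings discussed above.
old_mapping = {"in_proj_qkvz": ["in_proj_qkv", "in_proj_z"]}
new_mapping = {"in_proj_qkvz": ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]}

def resolvable_targets(adapter_targets, mapping):
    """Return which adapter target names match a known packed sub-module."""
    known = {name for subs in mapping.values() for name in subs}
    return [t for t in adapter_targets if t in known]

# An adapter saved against the real HF weight names for the GDN projections:
adapter_targets = ["in_proj_qkv", "in_proj_z"]

print(resolvable_targets(adapter_targets, old_mapping))  # ['in_proj_qkv', 'in_proj_z']
print(resolvable_targets(adapter_targets, new_mapping))  # ['in_proj_z'] -- fused name no longer resolves
```

This is the scenario the reviewer flagged: an adapter targeting `in_proj_qkv` would silently stop matching under the 4-entry mapping, which today is harmless only because no known adapters target the GDN layers.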
Test plan