
[Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping#36825

Open
hallerite wants to merge 2 commits into vllm-project:main from hallerite:fix-qwen35-lora

Conversation

Contributor

@hallerite hallerite commented Mar 11, 2026

Summary

Fixes IndexError: list index out of range when enabling LoRA with Qwen3.5 models (Qwen3_5ForCausalLMBase and Qwen3_5ForConditionalGeneration).

Root cause: Qwen3.5's create_qkvz_proj overrides the parent (Qwen3Next) to use 4 output_sizes [key_dim, key_dim, value_dim, value_dim] for correct per-slice TP sharding. However, packed_modules_mapping only lists 2 entries ["in_proj_qkv", "in_proj_z"]. During LoRA initialization, MergedColumnParallelLinearWithLoRA sets n_slices = len(output_sizes) (4) but only creates len(packed_modules) (2) adapters, so accessing lora_a[2]/lora_a[3] crashes.
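The mismatch can be reproduced in miniature. This is an illustrative sketch with hypothetical dimension values, not vLLM's actual class code; it only mirrors the counting logic described above:

```python
# Illustrative sketch of the bug: the layer reports 4 output slices,
# but LoRA adapters are only created per packed module (2 of them).
output_sizes = [256, 256, 512, 512]        # [key_dim, key_dim, value_dim, value_dim]
packed_modules = ["in_proj_qkv", "in_proj_z"]

n_slices = len(output_sizes)               # 4, as in MergedColumnParallelLinearWithLoRA
lora_a = [object()] * len(packed_modules)  # only 2 adapters exist

try:
    for i in range(n_slices):
        _ = lora_a[i]                      # crashes once i reaches 2
except IndexError as e:
    print("IndexError:", e)
```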

Fix:

  1. Expand packed_modules_mapping for in_proj_qkvz from 2 to 4 entries: ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"] — matching the 4 output_sizes
  2. Generalize MergedColumnParallelLinearWithLoRA.can_replace_layer from len(packed_modules_list) == 2 to len(packed_modules_list) == len(source_layer.output_sizes) — so it works for any N-way merged column parallel linear, not just 2-way

This works for any TP size because each of the 4 packed modules maps to one output_size, preserving correct per-slice sharding.
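The generalized check in item 2 can be sketched as follows. The signature is simplified (the real vLLM classmethod takes additional arguments such as the LoRA config); only the comparison that changed is shown:

```python
# Hedged sketch of the generalized can_replace_layer check; simplified
# signature, stand-in layer class. Only the changed comparison is real.
class FakeMergedColumnParallelLinear:
    def __init__(self, output_sizes):
        self.output_sizes = output_sizes

def can_replace_layer(source_layer, packed_modules_list):
    # Before the fix: hardcoded `len(packed_modules_list) == 2`.
    # After the fix: compare against the layer's actual slice count,
    # so any N-way merged column-parallel linear is handled.
    return len(packed_modules_list) == len(source_layer.output_sizes)

layer = FakeMergedColumnParallelLinear([256, 256, 512, 512])
print(can_replace_layer(layer, ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]))  # True
print(can_replace_layer(layer, ["in_proj_qkv", "in_proj_z"]))                          # False
```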

Note: The parent class Qwen3Next doesn't have this issue because it uses output_sizes=[key_dim + key_dim + value_dim + value_dim] (1 fused entry) with packed_modules=["in_proj_qkvz"] (1 entry) — they match.
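To make the parent/child contrast concrete (illustrative dimension values, not the real model config):

```python
# Sketch of the parent vs. Qwen3.5 layouts; sizes are hypothetical.
key_dim, value_dim = 256, 512

# Parent (Qwen3Next): 1 fused output entry, 1 packed module -> counts match.
parent_output_sizes = [key_dim + key_dim + value_dim + value_dim]
parent_packed_modules = ["in_proj_qkvz"]
assert len(parent_output_sizes) == len(parent_packed_modules)

# Qwen3.5: 4 slices for per-slice TP sharding, but only 2 packed modules
# under the old mapping -> the mismatch this PR fixes.
qwen35_output_sizes = [key_dim, key_dim, value_dim, value_dim]
qwen35_packed_modules = ["in_proj_qkv", "in_proj_z"]
assert len(qwen35_output_sizes) != len(qwen35_packed_modules)
```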

Note: This may not be the globally optimal solution. The 4 packed module names (in_proj_q, in_proj_k, in_proj_v, in_proj_z) are synthetic — the actual HF weight names are in_proj_qkv (fused Q+K+V) and in_proj_z. This means LoRA adapter weights targeting the GDN projections by their real HF names wouldn't be found during loading. In practice this isn't an issue today because nobody LoRAs the GDN layers — only standard attention and MLP layers are targeted. A more complete fix would be to support M packed modules mapping to N output sizes (2 weights → 4 sharding slices) in MergedColumnParallelLinearWithLoRA, but that's a larger refactor.

Related: #36372, #36478

Test plan

  • Verified LoRA training (TP=1) completes successfully with Qwen3.5-9B on 2x RTX PRO 6000 Blackwell GPUs using prime-rl
  • Still to be tested: TP=2 and TP=4

@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an IndexError that occurs when using LoRA with Qwen3.5 models. The fix is two-fold: first, it correctly aligns the packed_modules_mapping for in_proj_qkvz in Qwen3.5 models to have four entries, which now matches the layer's four output_sizes. Second, it generalizes the layer replacement logic in MergedColumnParallelLinearWithLoRA to be more robust by dynamically checking against the number of output sizes instead of a hardcoded value. These changes are well-reasoned, directly fix the bug, and improve the code's maintainability.

@mergify

mergify bot commented Mar 11, 2026

Hi @hallerite, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@alvinttang alvinttang left a comment


This is a correct two-part fix: the packed_modules_mapping was misrepresenting in_proj_qkvz as 2 sub-modules when it actually has 4 (matching the 4 output_sizes in create_qkvz_proj), and can_replace_layer was hardcoded to len == 2 instead of dynamically checking against the layer's actual output_sizes. The dynamic check in can_replace_layer is the more important improvement since it makes the validation self-consistent and prevents future regressions if the packing changes again. One thing worth double-checking: are there any serialized/saved LoRA adapters in the wild that used the old 2-module mapping that would now silently fail to load against this updated definition? Overall this is a well-reasoned fix and both changes are necessary together.

Signed-off-by: hallerite <git@hallerite.com>
Signed-off-by: hallerite <git@hallerite.com>

devlup commented Mar 12, 2026

This has unblocked the seq len error, but it stopped working for me when I load a LoRA with 4-bit bitsandbytes quantization.
