Does LoRA work with Qwen3.5 when using the Megatron backend? #372

@shamanez

Description


I initially ran into errors. The suspected problem areas:

  • Hybrid architecture compatibility with mcore_adapter: Qwen3.5's hybrid architecture (full attention every 4 layers + GDN linear attention + Mamba SSM) is non-standard. The apply_megatron_lora() function in mcore_adapter was designed for standard transformer models (such as Qwen2.5), so the GDN and Mamba layers may not be properly recognized or adapted.

  • all-linear expansion: When lora_target: all-linear is used, find_all_linear_modules() auto-discovers linear layers. For Qwen3.5's hybrid layers (GDN projections such as in_proj_qkv, in_proj_z, in_proj_b, in_proj_a), it is unclear whether these get correctly identified and wrapped with LoRA adapters by the Megatron backend.

  • VLM wrapper: Qwen3.5 loads as Qwen3_5ForConditionalGeneration (VLM), so LoRA needs to target only the text model, not the vision encoder. We use freeze_module_prefix: vision_model, but we need to verify that this interacts correctly with the LoRA setup path.
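To make the second and third concerns concrete, here is a minimal, hypothetical sketch (not the actual mcore_adapter code) of how an "all-linear" discovery pass might behave on a hybrid block whose GDN projections use the non-standard names from the issue, and how a vision_model freeze prefix would need to be excluded from LoRA targeting. ToyGDNLayer, ToyModel, and this version of find_all_linear_modules are illustrative stand-ins, not the library's real classes:

```python
# Hypothetical sketch: does a naive "all-linear" scan pick up Qwen3.5-style
# GDN projection names, and does a freeze prefix exclude the vision tower?
import torch.nn as nn


class ToyGDNLayer(nn.Module):
    """Stand-in for a GDN linear-attention layer with the projection
    names mentioned in the issue (in_proj_qkv / z / b / a)."""
    def __init__(self, d: int = 8):
        super().__init__()
        self.in_proj_qkv = nn.Linear(d, 3 * d)
        self.in_proj_z = nn.Linear(d, d)
        self.in_proj_b = nn.Linear(d, d)
        self.in_proj_a = nn.Linear(d, d)


class ToyModel(nn.Module):
    """Stand-in for a VLM: a vision tower (should stay frozen, no LoRA)
    plus a text model containing one hybrid GDN layer."""
    def __init__(self, d: int = 8):
        super().__init__()
        self.vision_model = nn.Sequential(nn.Linear(d, d))
        self.language_model = nn.ModuleDict({"gdn": ToyGDNLayer(d)})


def find_all_linear_modules(model: nn.Module,
                            freeze_prefixes: tuple = ("vision_model",)):
    """Mimics an all-linear expansion: collect every nn.Linear whose
    qualified name does not fall under a frozen prefix."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
        and not name.startswith(freeze_prefixes)
    ]


targets = find_all_linear_modules(ToyModel())
print(targets)
```

A plain `isinstance(module, nn.Linear)` walk does find the GDN projections here, because at the module level they are ordinary `nn.Linear` layers; the open question in the issue is whether the Megatron-side adapter path (which operates on Megatron's parallel linear classes, not `nn.Linear`) recognizes them the same way, and whether the prefix-based exclusion is applied before or after LoRA wrapping.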
