Add Qwen3VL support (dense and moe) #1174
base: main
Conversation
/ok to test c061222
```python
def process_inputs(tokenizer, processor, image_path: Optional[str], prompt: str, is_vl_model: bool):

def pad_input_ids_to_tp_multiple(input_ids, tp_size: int, pad_token_id: int = 0):
    """Pad input_ids so sequence length is divisible by tp_size.
```
Add a note that this is required when sequence parallelism is on.
done
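For reference, a minimal pure-Python sketch of the padding logic under discussion (the real helper in the PR operates on token tensors; this list-based toy only illustrates the divisibility requirement):

```python
def pad_input_ids_to_tp_multiple(input_ids, tp_size, pad_token_id=0):
    """Pad so the sequence length is divisible by tp_size.

    Needed when sequence parallelism is on: the sequence dimension is
    split evenly across tensor-parallel ranks, so its length must be a
    multiple of tp_size. Toy list version of the tensor helper.
    """
    remainder = len(input_ids) % tp_size
    if remainder == 0:
        return input_ids
    return input_ids + [pad_token_id] * (tp_size - remainder)
```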
```diff
 Loading pretrained weights (recommended for finetune):
 1) Import HF checkpoint to Megatron format:
-   $ python examples/conversion/convert_checkpoints.py import \
+   $ torchrun --nproc_per_node=1 examples/conversion/convert_checkpoints.py import \
```
no need to change this
The process hangs if I don't use torchrun, so I updated it.
```diff
@@ -0,0 +1,213 @@
+#!/usr/bin/env python3
```
Can we merge the two finetune VL scripts? You can rename one of them and remove the other.
Done, there is just one common script now.
```diff
@@ -0,0 +1,181 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
Duplicated with the qwen3_vl bridge?
Removed the duplicate.
| ) | ||
|
|
||
| # rebuild the transformer block | ||
| self.decoder = Qwen3VLTransformerBlock( |
There shouldn't be a rebuild; it should just update the layer spec.
https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/models/gpt/gpt_model.py#L202
I think the layer spec override is for TransformerLayer, not for TransformerBlock. We might still need to override?
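To illustrate the trade-off being discussed, here is a toy, hypothetical version of the spec pattern (the class and field names below are stand-ins, not the real Megatron-LM API): when a block instantiates its layers from a spec, swapping the layer class inside the spec is enough, and the block itself never needs rebuilding.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    """Toy stand-in for a Megatron-style module spec."""
    module: type


class TransformerBlockToy:
    """Builds its layers from whatever spec it is given."""
    def __init__(self, layer_spec, num_layers):
        self.layers = [layer_spec.module() for _ in range(num_layers)]


class BaseLayer:
    pass


class Qwen3VLLayer(BaseLayer):
    pass


# Override at the spec level: the block class is untouched.
block = TransformerBlockToy(LayerSpec(module=Qwen3VLLayer), num_layers=2)
```

Whether this works here depends on whether the override hook covers the block or only the per-layer spec, which is exactly the open question in the thread above.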
| ) | ||
|
|
||
| # rebuild the transformer block | ||
| self.decoder = Qwen3VLTransformerBlock( |
We shouldn't need to rebuild blocks; it's extra overhead. Just update the layer spec to use Qwen3VLTransformerBlock.
Same comment as above.
```python
    visual_pos_masks: Optional[torch.Tensor] = None,
    deepstack_visual_embeds: Optional[list[torch.Tensor]] = None,
) -> Tensor:
    """Forward function of the GPT Model This function passes the input tensors
```
Add a comment explaining why this needs to be overridden (deepstack_visual_embeds).
added
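As context for why the forward signature changes: deepstack injects visual embeddings back into the hidden states at the visual-token positions, so the extra `visual_pos_masks` / `deepstack_visual_embeds` arguments must flow through `forward`. A list-based toy of that merge step (a hypothetical simplification, not the PR's actual code):

```python
def merge_visual_embeds(hidden, visual_pos_mask, visual_embeds):
    """Add visual embeddings onto hidden states at visual positions.

    hidden: per-token scalars standing in for hidden-state vectors.
    visual_pos_mask: True where the token is a visual token.
    visual_embeds: one value per visual token, in order.
    """
    out = list(hidden)
    vi = 0
    for i, is_visual in enumerate(visual_pos_mask):
        if is_visual:
            out[i] += visual_embeds[vi]
            vi += 1
    return out
```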
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/bridge.py (outdated; resolved)
```diff
@@ -0,0 +1,179 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
Can you move this file into model.py as well? A single file seems easier to understand.
I prefer shorter, self-contained files for easier maintenance over one long file; I can change it if you have a strong preference.
```diff
@@ -0,0 +1,154 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
Duplicated with qwen3vl_provider?
Removed the duplicate.
```python
pass


def extract_expert_number_from_param(param_name: str) -> int:
```
Import this from src/megatron/bridge/utils/common_utils.py instead.
done
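For illustration, a regex-based sketch of what such a helper typically does (the actual function in common_utils.py may differ; the parameter-name pattern assumed here is hypothetical):

```python
import re


def extract_expert_number_from_param(param_name: str) -> int:
    """Return the expert index embedded in an MoE parameter name.

    Assumes names like
    'decoder.layers.0.mlp.experts.local_experts.3.linear_fc1.weight'.
    """
    match = re.search(r"experts\.(?:local_experts\.)?(\d+)\.", param_name)
    if match is None:
        raise ValueError(f"no expert index in {param_name!r}")
    return int(match.group(1))
```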
/ok to test 3e7f918
@yaoyu-33 addressed all the comments, PTAL when you get a chance.
Signed-off-by: ykarnati <[email protected]>
/ok to test 69233a5
## Qwen3VL Verification

### Dense Model (8B)

Model: Qwen/Qwen3-VL-8B-Instruct

#### HF Logits Matching

Megatron Top 5:
HF Top 5:

#### Fine-tuning on cord-v2 Dataset

Train vs validation loss curves

WandB Link: Qwen3VL 8B Fine-tune Run

#### Inference on Sample of cord-v2 Dataset

Command:
Before Fine-tune:
After Fine-tune:

### MoE Model (30B-A3B)

Model: Qwen/Qwen3-VL-30B-A3B-Instruct

WandB Link: Qwen3VL MoE Fine-tune Run

#### HF Logits Matching

Command:
Megatron Top 5:
HF Top 5:
Cosine Similarity: 0.973547

#### Fine-tuning on cord-v2 Dataset

Train vs validation loss curves

#### Inference on Sample of cord-v2 Dataset

Command:
Before Fine-tune:
After Fine-tune: