Commit f80b955

Megatron VLM Support (Qwen2.5-VL series) (3/N) (#1210)
1 parent 03184cf commit f80b955

3 files changed: +14 -0 lines changed

examples/geo3k_vlm/README.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
 
 Training VLMs with FSDP or Megatron on single-turn reasoning task using GRPO on the [GEO3K dataset](https://huggingface.co/datasets/hiyouga/geometry3k). We used processed version [here](https://huggingface.co/datasets/chenhegu/geo3k_imgurl).
 
+Supported models:
+* Qwen2.5-VL
+* Qwen3-VL (Dense and Moe)
+
 Note: Please make sure the cudnn version in the environment is 9.16.0.29 to prevent severe performance regression in conv3d in torch 2.9 mentioned in https://github.com/pytorch/pytorch/issues/168167. Otherwise, you can reinstall cudnn with:
 ```bash
 pip install nvidia-cudnn-cu12==9.16.0.29

examples/geo3k_vlm/run_geo3k_vlm.sh

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,10 @@ DATASET_LOCAL_NAME=$(basename "$DATASET_NAME")
 
 # Validate MODEL_NAME
 VALID_MODELS="
+Qwen2.5-VL-3B-Instruct
+Qwen2.5-VL-7B-Instruct
+Qwen2.5-VL-32B-Instruct
+Qwen2.5-VL-72B-Instruct
 Qwen3-VL-2B-Instruct
 Qwen3-VL-4B-Instruct
 Qwen3-VL-8B-Instruct
@@ -80,6 +84,8 @@ fi
 # Common args
 CKPT_ARGS=(
 --hf-checkpoint /root/models/${MODEL_NAME}
+# qwen3 vl model has rotary base 5000000, set it when applicable
+--rotary-base 5000000
 )
 
 ROLLOUT_ARGS=(
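
The comment added to `CKPT_ARGS` says the rotary base should be set "when applicable", while the hunk as shown adds `--rotary-base 5000000` to the array without any condition. One way to gate the flag on the model family is sketched below. This is an illustration only, not code from this commit: the `rotary_args_for` helper is hypothetical, and the rule that only `Qwen3-VL-*` names need the flag is an assumption drawn from the comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): print the rotary-base flag only
# for model names that are ASSUMED to need it.
rotary_args_for() {
   local model_name="$1"
   case "$model_name" in
      Qwen3-VL-*) echo "--rotary-base 5000000" ;;  # value taken from the commit's comment
      *) ;;                                        # other families: print nothing
   esac
}

MODEL_NAME="${MODEL_NAME:-Qwen3-VL-8B-Instruct}"
CKPT_ARGS=(
   --hf-checkpoint "/root/models/${MODEL_NAME}"
)
# Word splitting is intentional: the helper prints "flag value" or nothing.
# shellcheck disable=SC2207
CKPT_ARGS+=( $(rotary_args_for "$MODEL_NAME") )

printf '%s\n' "${CKPT_ARGS[@]}"
```

With this shape, a model name outside the assumed family leaves `CKPT_ARGS` without the flag, so the trainer's own default applies.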

examples/geo3k_vlm/run_geo3k_vlm_sft.sh

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ DATASET_LOCAL_NAME=$(basename "$DATASET_NAME")
 
 # Validate MODEL_NAME
 VALID_MODELS="
+Qwen2.5-VL-3B-Instruct
+Qwen2.5-VL-7B-Instruct
+Qwen2.5-VL-32B-Instruct
+Qwen2.5-VL-72B-Instruct
 Qwen3-VL-2B-Instruct
 Qwen3-VL-4B-Instruct
 Qwen3-VL-8B-Instruct
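
Both scripts carry a `# Validate MODEL_NAME` step built around the whitespace-separated `VALID_MODELS` list, but the validation code itself lies outside the hunks shown. A minimal sketch of how such a membership check could be written follows. The `is_valid_model` function is hypothetical, and the list is shortened to a few entries visible in the diff.

```bash
#!/usr/bin/env bash
# Abbreviated copy of the list; the real scripts contain more entries.
VALID_MODELS="
Qwen2.5-VL-3B-Instruct
Qwen2.5-VL-7B-Instruct
Qwen3-VL-8B-Instruct
"

# Hypothetical helper: succeed only if $1 exactly matches one list entry.
is_valid_model() {
   local candidate="$1" model
   for model in $VALID_MODELS; do   # unquoted on purpose: split on whitespace
      [ "$model" = "$candidate" ] && return 0
   done
   return 1
}

if is_valid_model "Qwen2.5-VL-7B-Instruct"; then
   echo "model accepted"
fi
```

An exact string comparison is used here so that a partial name such as `Qwen2.5-VL-7B` is rejected rather than matched as a substring.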
