-
Notifications
You must be signed in to change notification settings - Fork 35
Expand file tree
/
Copy pathrun.sh
More file actions
executable file
·100 lines (96 loc) · 2.96 KB
/
run.sh
File metadata and controls
executable file
·100 lines (96 loc) · 2.96 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
#!/bin/bash
################################################################################
# Qwen2.5-VL Training with FSDP2 + Ulysses Sequence Parallel
################################################################################
#
# DESCRIPTION:
# Train Qwen2.5-VL vision-language model with support for long sequences
# using Ulysses Sequence Parallel and FSDP2 distributed training.
#
# KEY FEATURES:
# - Multi-resolution visual understanding
# - Ulysses SP for 10K+ visual tokens
# - Flash Attention 2 + unpadding (use_rmpad)
# - Sequence packing support
# - Liger Kernel fused operations
# - FSDP2 distributed training
#
# REQUIREMENTS:
# - 8x GPUs (A100/H100 recommended, 80GB VRAM)
# - flash-attn: pip install flash-attn --no-build-isolation
# - liger-kernel: pip install liger-kernel
#
# DATASET:
# Prepare your dataset in OpenAI chat format (JSONL/Arrow/Parquet):
# See: docs/user_guide/data_prep.md
#
# Example dataset entry:
# ```json
# {
# "messages": [
# {
# "role": "user",
# "content": [
# {"type": "image", "image": "path/to/image.jpg"},
# {"type": "text", "text": "Describe this image"}
# ]
# },
# {
# "role": "assistant",
# "content": [{"type": "text", "text": "This image shows..."}]
# }
# ]
# }
# ```
#
# CONFIGURATION:
# Edit example_config.yaml to customize:
# - Model size (3B/7B/72B): change load_from_pretrained_path
# - Sequence length: adjust packing_length
# - SP degree: set sp_ulysses_degree (1/2/4/8)
# - Batch size: per_device_train_batch_size
# - Max frames: video_max_frames
#
# PERFORMANCE TIPS:
# - Adjust sp_ulysses_degree based on sequence length:
# * Degree 1: < 10K tokens
# * Degree 2: 10K-20K tokens
# * Degree 4: 20K-40K tokens
# * Degree 8: 40K+ tokens
# - Enable packing for better MFU: set packing: true
# - Use gradient_checkpointing for larger models (already enabled)
# - Monitor memory with: watch -n 1 nvidia-smi
#
################################################################################
# Number of GPUs
NGPUS=8
# Training command
torchrun --nproc_per_node=${NGPUS} \
--nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--master_port=12356 \
-m lmms_engine.launch.cli \
config_yaml=examples/qwen2_5_vl/example_config.yaml
################################################################################
# MULTI-NODE TRAINING:
#
# On rank 0 node:
# torchrun --nproc_per_node=8 \
# --nnodes=2 \
# --node_rank=0 \
# --master_addr=<RANK_0_IP> \
# --master_port=12356 \
# -m lmms_engine.launch.cli \
# config_yaml=examples/qwen2_5_vl/example_config.yaml
#
# On rank 1 node:
# torchrun --nproc_per_node=8 \
# --nnodes=2 \
# --node_rank=1 \
# --master_addr=<RANK_0_IP> \
# --master_port=12356 \
# -m lmms_engine.launch.cli \
# config_yaml=examples/qwen2_5_vl/example_config.yaml
#
################################################################################