10 changes: 5 additions & 5 deletions examples/megatron/lora/moe.sh
@@ -1,14 +1,13 @@
 # 2 * 62GiB, 5.10s/it
 PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
 NPROC_PER_NODE=2 \
-CUDA_VISIBLE_DEVICES=0,1 \
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
Contributor (high):

There's an inconsistency between NPROC_PER_NODE (set to 2 on the preceding line) and CUDA_VISIBLE_DEVICES (set to use 4 GPUs). This configuration will only utilize 2 of the 4 specified GPUs. To make use of all available GPUs, NPROC_PER_NODE should be updated to 4.
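One way to prevent this class of mismatch entirely is to derive the process count from the device list rather than setting both by hand. A minimal sketch (not from the PR; it reuses the script's variable names under that assumption):

```shell
# Derive NPROC_PER_NODE from the visible-device list so the two
# settings cannot drift apart when one of them is edited.
CUDA_VISIBLE_DEVICES=0,1,2,3
NPROC_PER_NODE=$(printf '%s' "$CUDA_VISIBLE_DEVICES" | awk -F',' '{print NF}')
echo "NPROC_PER_NODE=$NPROC_PER_NODE"
```

With this, changing CUDA_VISIBLE_DEVICES to eight devices would automatically launch eight processes.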

 megatron sft \
     --model Qwen/Qwen3-30B-A3B \
     --load_safetensors true \
     --save_safetensors true \
     --merge_lora false \
-    --dataset 'swift/Qwen3-SFT-Mixin#2000' \
-        'swift/self-cognition:empty_think#600' \
+    --dataset '/root/autodl-tmp/swift_finetune_data/qwen3_finetune_self_cognition.jsonl' \
Contributor (high):

The dataset is specified with a hardcoded absolute path (/root/...). This makes the script non-portable and tied to one specific user's environment, which is not ideal for an example script in a shared repository. Consider using a relative path, an environment variable, or reverting to the previous registry dataset identifiers so the script stays reusable.
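The environment-variable option could look like the following sketch. DATASET_PATH is a hypothetical variable introduced here for illustration, not part of the script; the default falls back to one of the registry identifiers the script previously used:

```shell
# Hypothetical override: callers can point DATASET_PATH at a local file,
# while the committed default remains a portable registry identifier.
DATASET_PATH="${DATASET_PATH:-swift/Qwen3-SFT-Mixin#2000}"
echo "$DATASET_PATH"
```

A local run would then invoke `DATASET_PATH=/path/to/data.jsonl bash moe.sh` without editing the committed file.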

     --loss_scale ignore_empty_think \
     --train_type lora \
     --lora_rank 8 \
@@ -32,8 +31,8 @@ megatron sft \
     --lr_warmup_fraction 0.05 \
     --min_lr 1e-5 \
     --save megatron_output/Qwen3-30B-A3B \
-    --eval_interval 200 \
-    --save_interval 200 \
+    --eval_interval 5 \
+    --save_interval 5 \
Contributor (medium), on lines +34 to +35:

The eval_interval and save_interval have been set to a very low value of 5. While this can be useful for rapid debugging, it will cause frequent evaluations and checkpointing, introducing significant overhead and consuming disk space quickly during a normal training run. If this change is for debugging, it should not be merged; for a general-purpose example script, a more conservative value (like the previous 200) is recommended.
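If the low intervals were intentional for debugging, one compromise is to keep the conservative defaults in the committed script and let a debug run override them. A sketch under that assumption (EVAL_INTERVAL and SAVE_INTERVAL are hypothetical names, not existing script variables):

```shell
# Hypothetical: conservative defaults stay in the committed example;
# a debug session runs e.g. `EVAL_INTERVAL=5 SAVE_INTERVAL=5 bash moe.sh`.
EVAL_INTERVAL="${EVAL_INTERVAL:-200}"
SAVE_INTERVAL="${SAVE_INTERVAL:-200}"
echo "eval=$EVAL_INTERVAL save=$SAVE_INTERVAL"
```

The `--eval_interval` and `--save_interval` flags would then reference `$EVAL_INTERVAL` and `$SAVE_INTERVAL` instead of literals.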

     --max_length 2048 \
     --num_workers 8 \
     --dataset_num_proc 8 \
@@ -43,3 +42,4 @@ megatron sft \
     --attention_backend flash \
     --model_author swift \
     --model_name swift-robot
+    --report_to wandb