Currently, the tutorial calls `neuron_parallel_compile` inside the bash script. Since `neuron_parallel_compile` is the one responsible for setting `$NEURON_EXTRACT_GRAPHS_ONLY`, the variable is still unset when the script's own check runs, so `MAX_STEPS` falls through to `-1` and compilation runs for >1 hour.
```bash
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi
```
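To illustrate the failure mode, here is a minimal sketch (not the tutorial script itself): the check runs in the parent shell before `neuron_parallel_compile`, which only appears later on the `torchrun` line, has exported the variable for its child process, so the `else` branch is taken on the compilation run:

```shell
#!/bin/bash
# Sketch: $NEURON_EXTRACT_GRAPHS_ONLY has not been set yet when the
# script's own check executes, so MAX_STEPS always falls back to -1.
unset NEURON_EXTRACT_GRAPHS_ONLY
LOGGING_STEPS=1

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1    # <- always taken, since the variable is unset here
fi
echo "MAX_STEPS=$MAX_STEPS"
```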
From `optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx`, lines 215 to 262 at commit 3748a06:
```bash
#!/bin/bash
set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/"

PROCESSES_PER_NODE=8
NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
```
We need to refactor the tutorial so that `neuron_parallel_compile` wraps the training script from the outside instead of being called inside it.
An example can be found here:
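For illustration, a minimal sketch of the fixed control flow (simulating the wrapper's effect with a plain `export`, since in reality `neuron_parallel_compile` sets the variable for its child process; variable names taken from the tutorial script):

```shell
#!/bin/bash
# When neuron_parallel_compile wraps the whole script, the variable is
# already set by the time the script's check runs, so the short
# graph-extraction step count is used instead of -1.
export NEURON_EXTRACT_GRAPHS_ONLY=1   # done by neuron_parallel_compile in reality
LOGGING_STEPS=1

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))  # short compilation run
else
    MAX_STEPS=-1
fi
echo "MAX_STEPS=$MAX_STEPS"
```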