Currently, the tutorial calls `neuron_parallel_compile` inside the bash script. Since `neuron_parallel_compile` is the one responsible for setting `$NEURON_EXTRACT_GRAPHS_ONLY`, the variable is still unset when the script's own check runs, so `MAX_STEPS` falls through to `-1` and compilation runs for >1 hour.
```bash
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi
```
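To illustrate the failure mode, here is a minimal sketch (not the tutorial script itself): the check runs in the parent shell before `neuron_parallel_compile`, which only appears later on the `torchrun` line, has exported the variable for its child process, so the `else` branch is taken on the compilation run:

```shell
#!/bin/bash
# Sketch: $NEURON_EXTRACT_GRAPHS_ONLY has not been set yet when the
# script's own check executes, so MAX_STEPS always falls back to -1.
unset NEURON_EXTRACT_GRAPHS_ONLY
LOGGING_STEPS=1

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1    # <- always taken, since the variable is unset here
fi
echo "MAX_STEPS=$MAX_STEPS"
```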
From `optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx`, lines 215 to 262 at commit 3748a06:
```bash
#!/bin/bash
set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/"

PROCESSES_PER_NODE=8
NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
```
We need to refactor the tutorial so that `neuron_parallel_compile` wraps the training script from the outside instead of being called inside it.
An example can be found here:
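For illustration, a minimal sketch of the fixed control flow (simulating the wrapper's effect with a plain `export`, since in reality `neuron_parallel_compile` sets the variable for its child process; variable names taken from the tutorial script):

```shell
#!/bin/bash
# When neuron_parallel_compile wraps the whole script, the variable is
# already set by the time the script's check runs, so the short
# graph-extraction step count is used instead of -1.
export NEURON_EXTRACT_GRAPHS_ONLY=1   # done by neuron_parallel_compile in reality
LOGGING_STEPS=1

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))  # short compilation run
else
    MAX_STEPS=-1
fi
echo "MAX_STEPS=$MAX_STEPS"
```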