
Trying to use lora_finetune.py directly gives the error `/bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier` #6235

Open
@tanghl01

Description


📚 The doc issue

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai with AOT-compiled CUDA kernels
BUILD_EXT=1 pip install .

export CUDA_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_INSTALL_DIR=/usr/local/cuda-12.4/
export CUDA_HOME=/usr/local/cuda-12.4/
```

colossalai check -i

Installation Report

------------ Environment ------------
Colossal-AI version: 0.4.8
PyTorch version: 2.5.1
System CUDA version: 12.4
CUDA version required by PyTorch: 12.4

Note:

  1. The table above checks the versions of the libraries/tools in the current environment
  2. If the System CUDA version is N/A, you can set the CUDA_HOME environment variable to locate it
  3. If the CUDA version required by PyTorch is N/A, you probably did not install a CUDA-compatible PyTorch. This value is given by torch.version.cuda; you can go to https://pytorch.org/get-started/locally/ to download the correct version.
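Note 3 can be checked directly from the shell (a quick sketch; it only assumes `python` on the PATH):

```shell
# Print the CUDA version PyTorch was built against -- the value the report
# shows as "CUDA version required by PyTorch". A CPU-only build prints None.
python -c "import torch; print(torch.version.cuda)" 2>/dev/null \
  || echo "torch not importable in this environment"
```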

------------ CUDA Extensions AOT Compilation ------------
Found AOT CUDA Extension: ✓
PyTorch version used for AOT compilation: N/A
CUDA version used for AOT compilation: N/A

Note:

  1. AOT (ahead-of-time) compilation of the CUDA kernels occurs during installation when the environment variable BUILD_EXT=1 is set
  2. If AOT compilation is not enabled, stay calm as the CUDA kernels can still be built during runtime

------------ Compatibility ------------
PyTorch version match: N/A
System and PyTorch CUDA version match: ✓
System and Colossal-AI CUDA version match: N/A

Note:

  1. The table above checks the version compatibility of the libraries/tools in the current environment
    • PyTorch version mismatch: whether the PyTorch version in the current environment is compatible with the PyTorch version used for AOT compilation
    • System and PyTorch CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version required by PyTorch
    • System and Colossal-AI CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version used for AOT compilation

```shell
colossalai run --nproc_per_node 1 lora_finetune.py --pretrained "/root/autodl-tmp/DeepSeeK-R1-7B" --dataset "/root/converted_data.json" --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir "/root/autodl-tmp/DeepSeeK_lora" --grad_ckpt --dtype bf16
```

/bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset /root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/applications/ColossalChat/examples/training_scripts && export ="/usr/bin/supervisord" SHELL="/bin/bash" NV_LIBCUBLAS_VERSION="12.4.5.8-1" NVIDIA_VISIBLE_DEVICES="GPU-866ac0d7-8995-0dd3-9bc5-6de16452ad15" NV_NVML_DEV_VERSION="12.4.127-1" NV_CUDNN_PACKAGE_NAME="libcudnn9-cuda-12" NV_LIBNCCL_DEV_PACKAGE="libnccl-dev=2.21.5-1+cuda12.4" CONDA_EXE="/root/miniconda3/bin/conda" NV_LIBNCCL_DEV_PACKAGE_VERSION="2.21.5-1" HOSTNAME="autodl-container-493b4c87d3-99a9c3d7" NVIDIA_REQUIRE_CUDA="cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536" NV_LIBCUBLAS_DEV_PACKAGE="libcublas-dev-12-4=12.4.5.8-1" NV_NVTX_VERSION="12.4.127-1" NV_CUDA_CUDART_DEV_VERSION="12.4.127-1" NV_LIBCUSPARSE_VERSION="12.3.1.170-1" NV_LIBNPP_VERSION="12.2.5.30-1" NCCL_VERSION="2.21.5-1" PWD="/root/ColossalAI/applications/ColossalChat/examples/training_scripts" AutoDLContainerUUID="493b4c87d3-99a9c3d7" 
CONDA_PREFIX="/root/miniconda3/envs/sft" NV_CUDNN_PACKAGE="libcudnn9-cuda-12=9.1.0.70-1" NVIDIA_DRIVER_CAPABILITIES="compute,utility,graphics,video" JUPYTER_SERVER_URL="http://autodl-container-493b4c87d3-99a9c3d7:8888/jupyter/" NV_NVPROF_DEV_PACKAGE="cuda-nvprof-12-4=12.4.127-1" NV_LIBNPP_PACKAGE="libnpp-12-4=12.2.5.30-1" NV_LIBNCCL_DEV_PACKAGE_NAME="libnccl-dev" TZ="Asia/Shanghai" NV_LIBCUBLAS_DEV_VERSION="12.4.5.8-1" NVIDIA_PRODUCT_NAME="CUDA" NV_LIBCUBLAS_DEV_PACKAGE_NAME="libcublas-dev-12-4" LINES="45" NV_CUDA_CUDART_VERSION="12.4.127-1" AutoDLServiceURL="https://u502097-87d3-99a9c3d7.nmb1.seetacloud.com:8443" HOME="/root" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga
=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:" COLUMNS="176" AutoDLRegion="nm-B1" CUDA_VERSION="12.4.1" AgentHost="172.29.52.64" NV_LIBCUBLAS_PACKAGE="libcublas-12-4=12.4.5.8-1" NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE="cuda-nsight-compute-12-4=12.4.1-1" CONDA_PROMPT_MODIFIER="(sft) " NV_LIBNPP_DEV_PACKAGE="libnpp-dev-12-4=12.2.5.30-1" NV_LIBCUBLAS_PACKAGE_NAME="libcublas-12-4" NV_LIBNPP_DEV_VERSION="12.2.5.30-1" JUPYTER_SERVER_ROOT="/root" TERM="xterm-256color" NV_LIBCUSPARSE_DEV_VERSION="12.3.1.170-1" LIBRARY_PATH="/usr/local/cuda/lib64/stubs" NV_CUDNN_VERSION="9.1.0.70-1" AutodlAutoPanelToken="jupyter-autodl-container-493b4c87d3-99a9c3d7-1f3f70c858d6c46d3975675baf8f3e103263f16190d504cfa848ca726f9077e18" CONDA_SHLVL="2" SHLVL="2" PYXTERM_DIMENSIONS="80x25" CUDA_INSTALL_DIR="/usr/local/cuda-12.4/" NV_CUDA_LIB_VERSION="12.4.1-1" NVARCH="x86_64" NV_CUDNN_PACKAGE_DEV="libcudnn9-dev-cuda-12=9.1.0.70-1" NV_CUDA_COMPAT_PACKAGE="cuda-compat-12-4" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" NV_LIBNCCL_PACKAGE="libnccl2=2.21.5-1+cuda12.4" LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" LC_CTYPE="C.UTF-8" CONDA_DEFAULT_ENV="sft" NV_CUDA_NSIGHT_COMPUTE_VERSION="12.4.1-1" REQUESTS_CA_BUNDLE="/etc/ssl/certs/ca-certificates.crt" OMP_NUM_THREADS="16" NV_NVPROF_VERSION="12.4.127-1" CUDA_HOME="/usr/local/cuda-12.4/" PATH="/root/miniconda3/envs/sft/bin:/root/miniconda3/condabin:/root/miniconda3/bin:/usr/local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" NV_LIBNCCL_PACKAGE_NAME="libnccl2" NV_LIBNCCL_PACKAGE_VERSION="2.21.5-1" MKL_NUM_THREADS="16" CONDA_PREFIX_1="/root/miniconda3" DEBIAN_FRONTEND="noninteractive" OLDPWD="/root/ColossalAI" AutoDLDataCenter="neimengDC3" _="/root/miniconda3/envs/sft/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset 
/root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16'
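The cause is visible at the start of the `Command:` string above: the launcher serializes the whole inherited environment into a single `export` line, and that environment contains an entry with an empty name (value `/usr/bin/supervisord`, presumably injected by the container's supervisord setup; that attribution is an assumption). bash rejects an empty variable name. A minimal shell sketch that reproduces the message and scans for the offending entry:

```shell
# 1) Reproduce: bash's `export` rejects an assignment with an empty name.
bash -c 'export ="/usr/bin/supervisord"' 2>&1 || true
# fails with: export: `=/usr/bin/supervisord': not a valid identifier

# 2) Scan the inherited environment for entries whose name bash would
#    reject (valid names match [A-Za-z_][A-Za-z0-9_]*). Values containing
#    embedded newlines (such as LS_COLORS above) can produce false positives.
env | grep -E '^[^A-Za-z_]' || echo "no invalid-name entries found"
```

If the scan finds such an entry, launching from a shell that does not inherit it (for example via `env -i` with only the variables the run needs) should let the generated `export` line succeed; alternatively, running the `torchrun` command shown in the error by hand bypasses the launcher's environment serialization entirely.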

Metadata

Labels: documentation (Improvements or additions to documentation)
