Description
Environment
python 3.11.9
cuda 11.8
torch 2.4.0+cu118
PyTorch information
PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.3
Libc version: glibc-2.31
Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-192-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.19.0
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.4.0+cu118
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.4.0+cu118
[pip3] torchmetrics==1.4.0.post0
[pip3] torchvision==0.19.0+cu118
[pip3] triton==3.0.0
[conda] numpy 1.26.3 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch 2.4.0+cu118 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 2.4.0+cu118 pypi_0 pypi
[conda] torchmetrics 1.4.0.post0 pypi_0 pypi
[conda] torchvision 0.19.0+cu118 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
Composer information
Composer Version: 0.24.1
Composer Commit Hash: None
CPU Model: AMD EPYC 7542 32-Core Processor
CPU Count: 32
Number of Nodes: 1
GPU Model: NVIDIA A100 80GB PCIe
GPUs per Node: 1
GPU Count: 1
CUDA Device Count: 1
To reproduce
Steps to reproduce the behavior:
1. pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118 --force-reinstall
2. pip install -e .
3. cd scripts/train
4. composer train.py finetune_example/gpt2-arc-easy--cpu.yaml
On CPU, this fails with: omegaconf.errors.InterpolationKeyError: Interpolation key 'global_seed' not found
5. composer train.py finetune_example/mpt-7b-arc-easy--gpu.yaml
On GPU, this fails with: ValueError: Unused parameters ['global_seed'] found in cfg. Please check your yaml to ensure these parameters are necessary. Please place any variables under the variables key.
Expected behavior
Fine-tuning should start and run with both example YAMLs, on CPU and on GPU, without configuration errors.
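Both errors involve the 'global_seed' key, and the GPU error message asks for variables to be placed under the variables key. A minimal sketch of what that layout might look like in either YAML, assuming the installed llm-foundry version expects it. The 'seed' key and the value 17 are illustrative and not taken from this report, and any other ${global_seed} reference in the file would presumably need the same change:

```yaml
# Hypothetical fragment, not copied from the example YAMLs.
variables:
  global_seed: 17   # placeholder value

# OmegaConf resolves an interpolation by its path from the config root,
# so a nested key is referenced with its full dotted path.
seed: ${variables.global_seed}
```

This has not been checked against the two example files above, so it may not be the whole fix.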