Fine-tuning error in conda environment without docker image #1538

Open
@LalchandPandia

Description


Environment

python 3.11.9
cuda 11.8
torch 2.4.0+cu118

PyTorch information

PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.3
Libc version: glibc-2.31

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-192-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.19.0
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.4.0+cu118
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.4.0+cu118
[pip3] torchmetrics==1.4.0.post0
[pip3] torchvision==0.19.0+cu118
[pip3] triton==3.0.0
[conda] numpy 1.26.3 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch 2.4.0+cu118 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 2.4.0+cu118 pypi_0 pypi
[conda] torchmetrics 1.4.0.post0 pypi_0 pypi
[conda] torchvision 0.19.0+cu118 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi

Composer information

Composer Version: 0.24.1
Composer Commit Hash: None
CPU Model: AMD EPYC 7542 32-Core Processor
CPU Count: 32
Number of Nodes: 1
GPU Model: NVIDIA A100 80GB PCIe
GPUs per Node: 1
GPU Count: 1
CUDA Device Count: 1


To reproduce

Steps to reproduce the behavior:

1. pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118 --force-reinstall
2. pip install -e .
3. cd scripts/train
4. composer train.py finetune_example/gpt2-arc-easy--cpu.yaml
   On CPU, this fails with: omegaconf.errors.InterpolationKeyError: Interpolation key 'global_seed' not found
5. composer train.py finetune_example/mpt-7b-arc-easy--gpu.yaml
   On GPU, this fails with: ValueError: Unused parameters ['global_seed'] found in cfg. Please check your yaml to ensure these parameters are necessary. Please place any variables under the variables key.
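
Both messages look consistent with a mismatch between where global_seed is defined and where it is referenced, which is an assumption on my part since the YAML contents are not included here. The sketch below is a simplified stand-in for "${...}" interpolation written with the standard library only; it is not OmegaConf's code, and the config fragments, the seed value, and the resolve/lookup helpers are hypothetical. It shows how a reference to ${global_seed} fails when the value lives under a variables key, while ${variables.global_seed} resolves.

```python
import re

# Simplified stand-in for "${a.b}" style interpolation (not OmegaConf itself).
_PATTERN = re.compile(r"\$\{([^}]+)\}")


def lookup(root, dotted_key):
    """Walk a dotted key such as 'variables.global_seed' from the config root."""
    node = root
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"Interpolation key '{dotted_key}' not found")
        node = node[part]
    return node


def resolve(node, root):
    """Return a copy of node with whole-string '${...}' values replaced."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    if isinstance(node, str):
        match = _PATTERN.fullmatch(node)
        if match:
            return lookup(root, match.group(1))
    return node


# Hypothetical fragments; the real YAML files are not shown in this report.
broken = {"variables": {"global_seed": 17}, "seed": "${global_seed}"}
fixed = {"variables": {"global_seed": 17}, "seed": "${variables.global_seed}"}

try:
    resolve(broken, broken)
except KeyError as err:
    print("broken:", err.args[0])

print("fixed:", resolve(fixed, fixed)["seed"])
```

If this is what is happening, one thing to check in both YAML files is whether global_seed sits under variables and whether every reference uses the ${variables.global_seed} form.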

Expected behavior

Fine-tuning should start and run without configuration errors for both example configs.

Additional context

Metadata

Labels: bug (Something isn't working)
