RuntimeError: Unable to proceed, no GPU resources available #33

@louxingrui

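After running bash scripts/full_model/finetune_cpm2_math.sh, I get RuntimeError: Unable to proceed, no GPU resources available. My GPU is an RTX 2080 Ti with CUDA 10.2 installed, and the program runs without problems outside the Docker environment. Is this caused by a mismatch between my CUDA version and the version inside the Docker environment? Here is the main error output from the terminal: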
[2022-01-31 14:07:27,900] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "/opt/conda/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/runner.py", line 264, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
I hope to hear back from you!
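
A check along the following lines, run inside the container, might help narrow this down. It is only a sketch using the Python standard library: it assumes a Linux container in which an exposed NVIDIA driver would show up as device nodes matching /dev/nvidia* and as nvidia-smi on PATH. The function name gpu_visibility_report is hypothetical and not part of this repository.

```python
import glob
import shutil


def gpu_visibility_report():
    """Collect basic evidence of whether this environment can see an NVIDIA GPU."""
    # Assumption: on Linux, an exposed NVIDIA driver appears as device nodes
    # such as /dev/nvidia0 and /dev/nvidiactl.
    device_nodes = sorted(glob.glob("/dev/nvidia*"))
    # nvidia-smi ships with the driver; None means it is not on PATH here.
    nvidia_smi = shutil.which("nvidia-smi")
    return {"device_nodes": device_nodes, "nvidia_smi": nvidia_smi}


if __name__ == "__main__":
    report = gpu_visibility_report()
    print("NVIDIA device nodes:", report["device_nodes"] or "none found")
    print("nvidia-smi on PATH:", report["nvidia_smi"] or "not found")
```

If both come back empty inside the container but not on the host, the container probably cannot see the host driver at all (for example, it may have been started without Docker's --gpus option), which would be a different problem from a CUDA version mismatch.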
