Training with Caffe on Vitis-AI GPU fails #691

Closed
@mhanuel26

Hi,

I am getting the following error while training cf_refinedet_coco_360_480_0.96_5.08G_2.0:

(vitis-ai-caffe) Vitis-AI /workspace/models/AI-Model-Zoo/cf_refinedet_coco_360_480_0.96_5.08G_2.0/code/train > bash train.sh 
../../../caffe-xilinx/build/tools/caffe.bin does not exist, try use path in pre-build docker
F0303 10:14:08.370003   394 gpu_memory.cpp:171] Check failed: error == cudaSuccess (10 vs. 0)  invalid device ordinal
*** Check failure stack trace: ***
    @     0x7ff0e4aaf4dd  google::LogMessage::Fail()
    @     0x7ff0e4ab7071  google::LogMessage::SendToLog()
    @     0x7ff0e4aaeecd  google::LogMessage::Flush()
    @     0x7ff0e4ab076a  google::LogMessageFatal::~LogMessageFatal()
    @     0x7ff0e3760145  caffe::GPUMemory::Manager::update_dev_info()
    @     0x7ff0e37606bf  caffe::GPUMemory::Manager::init()
    @     0x55a72c9920ed  train()
    @     0x55a72c98ba59  main
    @     0x7ff0e1ceac87  __libc_start_main
    @     0x55a72c98c6a8  (unknown)
train.sh: line 37:   394 Aborted                 (core dumped) $exec_path "$@"

Here is the output of nvidia-smi:

mhanuel@mhanuel-MSI:~$ nvidia-smi
Thu Mar  3 10:15:15 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   36C    P8    24W / 170W |    386MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      7372      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      7851      G   /usr/lib/xorg/Xorg                235MiB |
|    0   N/A  N/A      7976      G   /usr/bin/gnome-shell               40MiB |
|    0   N/A  N/A      8471      G   ...520405909793494209,131072       23MiB |
|    0   N/A  N/A    180023      G   ...AAAAAAAAA= --shared-files       39MiB |
+-----------------------------------------------------------------------------+

What could I be missing?
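
For context, "invalid device ordinal" generally means the process asked CUDA for a GPU index that is not visible to it. The nvidia-smi output above lists a single GPU (index 0), so one thing worth checking is whether train.sh requests other indices, or whether the container sees any GPU at all. The sketch below is a minimal, hypothetical helper for that comparison: the function names are made up for illustration, and the example value "0,1,2,3" is an assumption, not something taken from train.sh. It only handles numeric, comma-separated id lists.

```python
import shutil
import subprocess


def parse_gpu_arg(arg):
    """Parse a comma-separated GPU id list such as "0,1,2,3" into ints."""
    return [int(part) for part in arg.split(",") if part.strip()]


def visible_gpu_indices():
    """Return the GPU indices reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [int(line) for line in result.stdout.splitlines() if line.strip()]


def invalid_ordinals(requested, available):
    """Return the requested GPU ids that are not among the available ones."""
    available_set = set(available)
    return [gpu for gpu in requested if gpu not in available_set]
```

With one visible GPU, `invalid_ordinals(parse_gpu_arg("0,1,2,3"), [0])` returns `[1, 2, 3]`. Running `visible_gpu_indices()` inside the same container that launches train.sh, rather than on the host, shows what the training process can actually see; an empty list there would suggest the container was started without GPU access.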