Check duplicate issues.
- Checked for duplicates
Description
The test TMVA-DNN-LSTM-BackpropagationCudnn crashes on Ubuntu 24.04 with CUDA 12.6.1 and cuDNN, producing the following stack trace:
0x00007fda7f0b5540 in <unknown> from /usr/lib64/libcuda.so.1
0x00007fda7ed1491e in <unknown> from /usr/lib64/libcuda.so.1
0x00007fda7f08f040 in <unknown> from /usr/lib64/libcuda.so.1
0x00007fda7ed0ef22 in <unknown> from /usr/lib64/libcuda.so.1
0x00007fda7eed2bae in <unknown> from /usr/lib64/libcuda.so.1
0x00007fdaaa248b01 in <unknown> from /usr/local/cuda-12.6/targets/x86_64-linux/lib/libcudart.so.12
0x00007fdaaa218baa in <unknown> from /usr/local/cuda-12.6/targets/x86_64-linux/lib/libcudart.so.12
0x00007fdaaa270721 in cudaMemcpy + 0x211 from /usr/local/cuda-12.6/targets/x86_64-linux/lib/libcudart.so.12
0x000055d25af29e37 in bool testLSTMBackpropagation<TMVA::DNN::TCudnn<double> >(unsigned long, unsigned long, unsigned long, unsigned long, TMVA::DNN::TCudnn<double>::Scalar_t, std::vector<bool, std::allocator<bool> >, bool) + 0x4d37 from /github/home/ROOT-CI/build/tmva/tmva/test/DNN/LSTM/testLSTMBackpropagationCudnn
Specifically, it's the assignment in this loop:
root/tmva/tmva/test/DNN/LSTM/TestLSTMBackpropagation.h
Lines 149 to 159 in 9d876cd
This assignment triggers a cudaMemcpy to the GPU, and the crash happens somewhere inside the CUDA driver library (libcuda.so.1 in the trace). Other cuDNN tests pass, so the problem is not necessarily a broken installation.
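For readers wondering how a plain element assignment can end up in cudaMemcpy, here is a minimal host-only sketch of the pattern. It is not the TMVA code: FakeDevice, DeviceRef, DeviceMatrix and FillAndCountCopies are hypothetical names, and the "device" is just a std::vector. It assumes the matrix element accessor returns a proxy object whose assignment operator copies one value to the device, which would explain one transfer per element assigned in the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for device memory (hypothetical). A real GPU backend
// would hold a pointer obtained from cudaMalloc instead of a std::vector.
struct FakeDevice {
   std::vector<double> memory;
   std::size_t copies = 0; // number of simulated host-to-device transfers

   explicit FakeDevice(std::size_t n) : memory(n, 0.0) {}

   // Stand-in for cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice).
   void CopyToDevice(std::size_t offset, const double *src, std::size_t count)
   {
      assert(offset + count <= memory.size());
      std::memcpy(memory.data() + offset, src, count * sizeof(double));
      ++copies;
   }
};

// Proxy returned by element access: assigning to it performs one transfer.
class DeviceRef {
   FakeDevice &fDevice;
   std::size_t fOffset;

public:
   DeviceRef(FakeDevice &device, std::size_t offset) : fDevice(device), fOffset(offset) {}

   DeviceRef &operator=(double value)
   {
      fDevice.CopyToDevice(fOffset, &value, 1);
      return *this;
   }
};

// Column-major matrix whose storage lives on the (fake) device.
class DeviceMatrix {
   FakeDevice fDevice;
   std::size_t fRows;

public:
   DeviceMatrix(std::size_t rows, std::size_t cols) : fDevice(rows * cols), fRows(rows) {}

   // Element access hands out a proxy instead of a double&.
   DeviceRef operator()(std::size_t i, std::size_t j) { return DeviceRef(fDevice, j * fRows + i); }

   std::size_t Copies() const { return fDevice.copies; }
};

// Fills a rows x cols matrix element by element and returns how many
// simulated transfers that caused.
std::size_t FillAndCountCopies(std::size_t rows, std::size_t cols)
{
   DeviceMatrix m(rows, cols);
   for (std::size_t i = 0; i < rows; ++i)
      for (std::size_t j = 0; j < cols; ++j)
         m(i, j) = static_cast<double>(i + j);
   return m.Copies();
}
```

Calling FillAndCountCopies(3, 4) from a small main() reports one simulated transfer per assigned element. In a real backend each of those would be a separate small cudaMemcpy call, which is consistent with the trace ending inside cudaMemcpy.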
Reproducer
cmake -Dtmva-gpu=On -Dtesting=On <src>
ctest -R TMVA-DNN-LSTM-BackpropagationCudnn
ROOT version
Master
Installation method
Source
Operating system
Ubuntu 24.04 Docker container with CUDA 12.6.1
Additional context
No response