Open
Description
I have an EC2 instance of type g5g.xlarge. I have installed the following:
CUDA-Toolit: Cuda compilation tools, release 12.4, V12.4.131
CUDNN Version: 9.6.0
Python: 3.12
Pytorch: Compiled from source as for aarch64 v2.5 is not available.
Onnxruntime: Compiled from source as the distrubution package is not available for the architecture
Architecture: aarch64
OS: Amazon Linux 2023
On the following code:
def to_numpy(tensor):
return tensor.detach().gpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()
# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(input_batch)}
ort_outs = ort_session.run(None, ort_inputs)
I am getting the following Error:
EP Error: [ONNXRuntimeError] : 11 : EP_FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle);
with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}} using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
2025-01-08 12:06:10.797719929 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
2025-01-08 12:06:10.797924540 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle);
with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}}
However, prints from the below code confirms that the installation is done perfectly:
print("Pytorch CUDA:", torch.cuda.is_available())
print("Available Providers:", onnxruntime.get_available_providers())
print("Active Providers for this session:", ort_session.get_providers())
Output:
Pytorch CUDA: True
Available Providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
Active Providers for this session: ['CUDAExecutionProvider', 'CPUExecutionProvider']
In order to resolve this, I have installed the nvidia_cudnn_frontend v1.9.0 from the source. Still it is not resolved.
nvidia-smi is working. Its version is: NVIDIA-SMI 550.127.08
nvcc is also working fine.
nvidia-cudnn-frontend==1.9.0
nvtx==0.2.10
onnx==1.17.0
onnxruntime-gpu==1.20.1
optree==0.13.1
torch==2.5.0a0+gita8d6afb
torchaudio==2.5.1
torchvision==0.20.1
Versions
Collecting environment information...
PyTorch version: 2.5.0a0+gita8d6afb
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Amazon Linux 2023.6.20241212 (aarch64)
GCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Clang version: Could not collect
CMake version: version 3.31.2
Libc version: glibc-2.34
Python version: 3.12.0 (main, Jan 5 2025, 18:22:01) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (64-bit runtime)
Python platform: Linux-6.1.119-129.201.amzn2023.aarch64-aarch64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T4G
Nvidia driver version: 550.127.08
cuDNN version: Probably one of the following:
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_adv.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_cnn.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_precompiled.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_runtime_compiled.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_graph.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_heuristic.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_ops.so.9
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r3p1
BogoMIPS: 243.75
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 4 MiB (4 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] nvidia-cudnn-frontend==1.9.0
[pip3] nvtx==0.2.10
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.20.1
[pip3] optree==0.13.1
[pip3] torch==2.5.0a0+gita8d6afb
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[conda] Could not collect