[BUG] MAR cannot find libcuda.so #576

@bluna301

Describe the bug
Testing a MAP locally with the MAR fails with an error related to libcuda.so:

~/$ monai-deploy run -i $HOLOSCAN_INPUT_PATH -o $HOLOSCAN_OUTPUT_PATH ghcr.io/cchmc-dll/ped-abd-ct-seg-map/cchmc-ct-liver-spleen-seg-x64-workstation-dgpu-linux-amd64:0.0.6

[2026-01-23 16:20:47,358] [INFO] (runner) - Checking dependencies...
[2026-01-23 16:20:47,358] [INFO] (runner) - --> Verifying if "docker" is installed...

[2026-01-23 16:20:47,358] [INFO] (runner) - --> Verifying if "docker-buildx" is installed...

[2026-01-23 16:20:47,358] [INFO] (runner) - --> Verifying if "ghcr.io/cchmc-dll/ped-abd-ct-seg-map/cchmc-ct-liver-spleen-seg-x64-workstation-dgpu-linux-amd64:0.0.6" is available...

[2026-01-23 16:20:47,400] [INFO] (runner) - Reading HAP/MAP manifest...
Successfully copied 2.56kB to /tmp/tmpm5qxz_ju/app.json
Successfully copied 2.05kB to /tmp/tmpm5qxz_ju/pkg.json
9d189f78e1fc5f7d0875bc9e580aa072fc7563ee366842203209e1d846b5fb96
[2026-01-23 16:20:47,523] [INFO] (runner) - --> Verifying if "nvidia-ctk" is installed...

[2026-01-23 16:20:47,523] [INFO] (runner) - --> Verifying "nvidia-ctk" version...

[2026-01-23 16:20:47,861] [INFO] (common) - Launching container (0f93dc60eb7d) using image 'ghcr.io/cchmc-dll/ped-abd-ct-seg-map/cchmc-ct-liver-spleen-seg-x64-workstation-dgpu-linux-amd64:0.0.6'...
    container name:      hopeful_ritchie
    host name:           docker-desktop
    network:             host
    user:                1000:1000
    ulimits:             memlock=-1:-1, stack=67108864:67108864
    cap_add:             CAP_SYS_PTRACE
    ipc mode:            host
    shared memory size:  67108864
    devices:             
    group_add:           44
Files in /var/holoscan/input

2026-01-23 21:20:48 [INFO] Launching application python3 /opt/holoscan/app ...

/var/holoscan/input:

total 105280

drwxr-xr-x 2 holoscan holoscan  12288 Sep 22 22:26 .

drwxr-xr-x 1 holoscan root       4096 Nov  3 20:46 ..

-rw-r--r-- 1 holoscan holoscan 527892 Jul 17  2024 1-001.dcm

-rw-r--r-- 1 holoscan holoscan 527886 Jul 17  2024 1-204.dcm

/home/holoscan/.local/lib/python3.10/site-packages/monai/utils/module.py:396: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

  pkg = __import__(module)  # top level module

[info] [fragment.cpp:705] Loading extensions from configs...

[info] [gxf_executor.cpp:265] Creating context

[2026-01-23 21:20:50,773] [INFO] (root) - Parsed args: Namespace(log_level=None, input=None, output=None, model=None, workdir=None, triton_server_netloc=None, argv=['/opt/holoscan/app'])

[2026-01-23 21:20:50,775] [INFO] (root) - AppContext object: AppContext(input_path=/var/holoscan/input, output_path=/var/holoscan/output, model_path=/opt/holoscan/models, workdir=/var/holoscan), triton_server_netloc=

[2026-01-23 21:20:50,777] [INFO] (root) - End compose

[info] [gxf_executor.cpp:2395] Activating Graph...

[info] [gxf_executor.cpp:2425] Running Graph...

[info] [gxf_executor.cpp:2427] Waiting for completion...

[info] [greedy_scheduler.cpp:191] Scheduling 7 entities

[2026-01-23 21:20:50,868] [INFO] (monai.deploy.operators.dicom_data_loader_operator.DICOMDataLoaderOperator) - No or invalid input path from the optional input port: None

[2026-01-23 21:20:51,133] [INFO] (root) - Finding series for Selection named: Standard Axial CT Series

[2026-01-23 21:20:51,134] [INFO] (root) - Searching study, : 1.3.6.1.4.1.14519.5.2.1.7085.2626.822645453932810382886582736291

  # of series: 1

[2026-01-23 21:20:51,134] [INFO] (root) - Working on series, instance UID: 1.3.6.1.4.1.14519.5.2.1.7085.2626.119403521930927333027265674239

[2026-01-23 21:20:51,134] [INFO] (root) -     On attribute: 'StudyDescription' to match value: '(.*?)'

[2026-01-23 21:20:51,134] [INFO] (root) -         Series attribute StudyDescription value: CT ABDOMEN W IV CONTRAST

[2026-01-23 21:20:51,134] [INFO] (root) -     On attribute: 'Modality' to match value: '(?i)CT'

[2026-01-23 21:20:51,134] [INFO] (root) -         Series attribute Modality value: CT

[2026-01-23 21:20:51,134] [INFO] (root) -     On attribute: 'ImageOrientationPatient' to match value: 'Axial'

[2026-01-23 21:20:51,134] [INFO] (root) -         Series attribute ImageOrientationPatient value: None

[2026-01-23 21:20:51,134] [INFO] (root) -         Instance level attribute ImageOrientationPatient value: ['[1, 0, 0, 0, 1, 0]']

[2026-01-23 21:20:51,134] [INFO] (root) -         Computed orientation from ImageOrientationPatient value: Axial

[2026-01-23 21:20:51,134] [INFO] (root) -     On attribute: 'ImageType' to match value: ['PRIMARY']

[2026-01-23 21:20:51,134] [INFO] (root) -         Series attribute ImageType value: None

[2026-01-23 21:20:51,135] [INFO] (root) -         Instance level attribute ImageType value: ["['ORIGINAL', 'PRIMARY', 'AXIAL', 'CT_SOM5 SPI']"]

[2026-01-23 21:20:51,135] [INFO] (root) -     On attribute: 'SliceThickness' to match value: [2, 5]

[2026-01-23 21:20:51,135] [INFO] (root) -         Series attribute SliceThickness value: None

[2026-01-23 21:20:51,135] [INFO] (root) -         Instance level attribute SliceThickness value: 3

[2026-01-23 21:20:51,135] [INFO] (root) -     On attribute: 'SeriesDescription' to match value: '(?i)^(?!.*(cor|sag)).*$'

[2026-01-23 21:20:51,135] [INFO] (root) -         Series attribute SeriesDescription value: ABD/PANC 3.0 B31f

[2026-01-23 21:20:51,135] [INFO] (root) - Selected Series, UID: 1.3.6.1.4.1.14519.5.2.1.7085.2626.119403521930927333027265674239

[2026-01-23 21:20:51,135] [INFO] (root) - Series Selection finalized

[2026-01-23 21:20:51,135] [INFO] (root) - Series Description of selected DICOM Series for inference: ABD/PANC 3.0 B31f

[2026-01-23 21:20:51,135] [INFO] (root) - Series Instance UID of selected DICOM Series for inference: 1.3.6.1.4.1.14519.5.2.1.7085.2626.119403521930927333027265674239

monai.transforms.croppad.dictionary CropForegroundd.__init__:allow_smaller: Current default value of argument `allow_smaller=True` has been deprecated since version 1.2. It will be changed to `allow_smaller=False` in version 1.5.

[2026-01-23 21:20:51,389] [INFO] (abdomen_seg_operator.AbdomenSegOperator) - TorchScript model detected

[2026-01-23 21:20:51,389] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - Converted Image object metadata:

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.7085.2626.119403521930927333027265674239, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - SeriesDate: 20090831, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - SeriesTime: 101721.452, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - Modality: CT, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - SeriesDescription: ABD/PANC 3.0 B31f, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - PatientPosition: HFS, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - SeriesNumber: 8, type <class 'int'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - row_pixel_spacing: 0.7890625, type <class 'float'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - col_pixel_spacing: 0.7890625, type <class 'float'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - depth_pixel_spacing: 1.5, type <class 'float'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - row_direction_cosine: [1.0, 0.0, 0.0], type <class 'list'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - col_direction_cosine: [0.0, 1.0, 0.0], type <class 'list'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - depth_direction_cosine: [0.0, 0.0, 1.0], type <class 'list'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - dicom_affine_transform: [[   0.7890625    0.           0.        -197.60547  ]

 [   0.           0.7890625    0.        -398.60547  ]

 [   0.           0.           1.5       -383.       ]

 [   0.           0.           0.           1.       ]], type <class 'numpy.ndarray'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - nifti_affine_transform: [[  -0.7890625   -0.          -0.         197.60547  ]

 [  -0.          -0.7890625   -0.         398.60547  ]

 [   0.           0.           1.5       -383.       ]

 [   0.           0.           0.           1.       ]], type <class 'numpy.ndarray'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - StudyInstanceUID: 1.3.6.1.4.1.14519.5.2.1.7085.2626.822645453932810382886582736291, type <class 'str'>

[2026-01-23 21:20:51,390] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - StudyID: , type <class 'str'>

[2026-01-23 21:20:51,391] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - StudyDate: 20090831, type <class 'str'>

[2026-01-23 21:20:51,391] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - StudyTime: 095948.599, type <class 'str'>

[2026-01-23 21:20:51,391] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - StudyDescription: CT ABDOMEN W IV CONTRAST, type <class 'str'>

[2026-01-23 21:20:51,391] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - AccessionNumber: 5471978513296937, type <class 'str'>

[2026-01-23 21:20:51,391] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - selection_name: Standard Axial CT Series, type <class 'str'>

[2026-01-23 21:20:53,040] [INFO] (monai.deploy.operators.monai_seg_inference_operator.MonaiSegInferenceOperator) - Input of <class 'monai.data.meta_tensor.MetaTensor'> shape: torch.Size([1, 1, 268, 224, 102])

Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory

/var/holoscan/tools: line 383:    54 Aborted                 python3 /opt/holoscan/app

2026-01-23 21:20:53 [INFO] Application exited with 134.

[2026-01-23 16:20:53,947] [INFO] (common) - Container 'hopeful_ritchie'(0f93dc60eb7d) exited with code 0.
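
The exit status 134 in the log is consistent with the "Aborted" message: shells report a process killed by signal N as 128 + N, and 134 - 128 = 6, which is SIGABRT on Linux. Note also that the MAR's final log line reports container exit code 0 even though the application exited with 134. A minimal sketch of that decoding, where the helper name `describe_exit_status` is hypothetical and not part of the MAR or Holoscan tooling:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        try:
            # Look up the signal whose number is the status minus 128.
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            return f"exit status {status} (no matching signal)"
    return f"exited with status {status}"


print(describe_exit_status(134))
```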

The error occurs with both a newly created MAP and a previously created one (OCT25). The old MAP ran without issues under the MAR when tested back in OCT25. The only change in my environment since then is Docker Desktop updates.

I have tested the MAP successfully without the MAR using Ming's workaround as well as with this similar script:

# execute MAP locally with docker run

# check if the correct number of arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Please provide all arguments. Usage: $0 <tag_prefix> <image_version>"
    exit 1
fi

# assign command-line arguments to variables
tag_prefix=$1
image_version=$2

# load in environment variables
source .env

# remove the output directory
rm -rf "$HOLOSCAN_OUTPUT_PATH"

# pre-make directories to smooth permission errors
mkdir -p "$HOLOSCAN_OUTPUT_PATH/temp"
chmod -R u+rwX "$HOLOSCAN_OUTPUT_PATH"

# execute MAP locally via docker run
docker run --rm --gpus all \
  -v "$HOLOSCAN_INPUT_PATH":/var/holoscan/input:ro \
  -v "$HOLOSCAN_OUTPUT_PATH":/var/holoscan/output \
  -v /usr/lib/wsl/lib:/usr/lib/wsl/lib:ro \
  -e LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  "${tag_prefix}-x64-workstation-dgpu-linux-amd64:${image_version}"
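
One detail of the `-e LD_LIBRARY_PATH=...` line in the script above: `$LD_LIBRARY_PATH` is expanded by the host shell, and if it is empty the value ends with a trailing colon, which the dynamic loader treats as the current working directory. A small sketch of building the value without an empty entry, where the helper `prepend_library_path` is hypothetical and for illustration only:

```python
import os


def prepend_library_path(new_dir: str, current: str = "") -> str:
    """Put new_dir first in a colon-separated search path.

    Empty entries and an existing copy of new_dir are dropped, so the
    result never has a leading, trailing, or doubled colon.
    """
    entries = [new_dir] + [p for p in current.split(":") if p and p != new_dir]
    return ":".join(entries)


# Value that could be passed to `docker run -e LD_LIBRARY_PATH=...`
print(prepend_library_path("/usr/lib/wsl/lib", os.environ.get("LD_LIBRARY_PATH", "")))
```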

When run with the MAR, the MAP apparently cannot resolve libcuda.so (see the test below). A similar issue has been encountered in WSL2; the fix for it resolves the error when running the app directly with Python, but not with the MAR. nvidia-smi gives the expected output.

$ docker run --rm -it --gpus all \
  --entrypoint bash \
  cchmc-ct-liver-spleen-seg-x64-workstation-dgpu-linux-amd64:1.7.0 \
  -lc '
set -e
echo "== devices =="; ls -l /dev/dxg 2>/dev/null || true; ls -l /dev/nvidia* 2>/dev/null || true
echo "== ldconfig libcuda =="; ldconfig -p | grep -i "libcuda.so" || true
echo "== find libcuda =="; (find / -name "libcuda.so*" -o -name "libnvidia-ml.so*" 2>/dev/null | head -n 50) || true
echo "== dlopen test ==";
python3 - <<PY
import ctypes
for name in ["libcuda.so", "libcuda.so.1", "libnvidia-ml.so.1"]:
    try:
        ctypes.CDLL(name)
        print("OK:", name)
    except OSError as e:
        print("FAIL:", name, "->", e)
PY
'
== devices ==
crw-rw-rw- 1 holoscan holoscan 10, 127 Jan 23 21:48 /dev/dxg
== ldconfig libcuda ==
        libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/drivers/nvddsi.inf_amd64_f29ce989a4be25b9/libcuda.so.1
== find libcuda ==
/usr/lib/wsl/drivers/nvddsi.inf_amd64_f29ce989a4be25b9/libnvidia-ml.so.1
/usr/lib/wsl/drivers/nvddsi.inf_amd64_f29ce989a4be25b9/libcuda.so.1.1
/usr/lib/wsl/drivers/nvddsi.inf_amd64_f29ce989a4be25b9/libcuda.so.1
/usr/local/cuda-12.6/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-12.6/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-12.6/compat/lib.real/libcuda.so.560.35.03
/usr/local/cuda-12.6/compat/lib.real/libcuda.so
/usr/local/cuda-12.6/compat/lib.real/libcuda.so.1
== dlopen test ==
FAIL: libcuda.so -> libcuda.so: cannot open shared object file: No such file or directory
OK: libcuda.so.1
OK: libnvidia-ml.so.1
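
The output above suggests that the loader cache registers only the versioned soname `libcuda.so.1`; the unversioned `libcuda.so` appears only under the CUDA `stubs` and `compat` directories, which do not show up in the cache. A sketch of checking this from `ldconfig -p` output, where the function name `cached_sonames` is hypothetical and the sample line is copied from the output above:

```python
def cached_sonames(ldconfig_output: str) -> dict[str, str]:
    """Map each soname listed by `ldconfig -p` to its resolved path."""
    found: dict[str, str] = {}
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue  # skip the header line and anything malformed
        left, path = line.split("=>", 1)
        # The soname is the text before the "(libc6,x86-64)" annotation.
        soname = left.split("(", 1)[0].strip()
        found.setdefault(soname, path.strip())
    return found


sample = (
    "        libcuda.so.1 (libc6,x86-64) => "
    "/usr/lib/wsl/drivers/nvddsi.inf_amd64_f29ce989a4be25b9/libcuda.so.1"
)
cache = cached_sonames(sample)
for name in ("libcuda.so", "libcuda.so.1"):
    print(name, "->", cache.get(name, "not in loader cache"))
```

Inside the container, the text would typically come from running `ldconfig -p` (for example via `subprocess.run`) rather than from a hard-coded sample.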

Steps/Code to reproduce bug
Execute a MAP with the MAR in a WSL2 (Ubuntu) environment with Docker Desktop.

Expected behavior
The MAR executes the MAP without any fatal errors.

Environment details (please complete the following information)

  • OS/Platform: Windows 11: WSL2 - Ubuntu 22.04 - Docker Desktop v4.57.0
  • Python Version: 3.10.18
  • Method of MONAI Deploy App SDK install: conda
  • SDK Version: 3.1.0 (tested with 3.5.0 and saw the same behavior)
  • Holoscan Version: 3.20

Additional context

$ nvidia-smi

NVIDIA-SMI 575.64.04              Driver Version: 577.00         CUDA Version: 12.9 
