fix: add /usr/lib to library search path in init container #693

Closed
nmn3m wants to merge 2 commits into kubernetes-sigs:main from nmn3m:fix/init-container-lib-search-path
@nmn3m nmn3m commented Oct 18, 2025

Description

This PR fixes an issue where the kubelet-plugin init container fails to detect NVIDIA libraries installed in /usr/lib, causing the pod to remain stuck in Init:0/1 status indefinitely.

Fixes #692
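
To illustrate the kind of lookup involved, here is a minimal sketch of a library search that includes /usr/lib. It is not the init container's actual script: the function name `find_nvml_dir`, the candidate list, and its ordering are assumptions made for illustration. Only /usr/lib (the directory this PR adds) and /usr/lib/x86_64-linux-gnu (mentioned later in this thread) come from the conversation itself.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual kubelet-plugin init container logic.
# Prints the first candidate directory under a given root that contains
# libnvidia-ml.so.1, and returns non-zero if none does.
find_nvml_dir() {
    root="$1"   # root to search under, e.g. "/" or a mounted host filesystem

    # Candidate directories (assumed list). /usr/lib is the entry this PR
    # proposes to add; /usr/lib64 is included purely as an illustration.
    for dir in /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib; do
        # ${root%/} strips one trailing slash so "/" and "/host/" both work.
        if [ -e "${root%/}${dir}/libnvidia-ml.so.1" ]; then
            printf '%s\n' "${root%/}${dir}"
            return 0
        fi
    done
    return 1
}

# Example usage (commented out so that sourcing this file has no side effects):
# find_nvml_dir / || echo "libnvidia-ml.so.1 not found in any candidate directory"
```

With a search list that omits /usr/lib, a host that installs the library only there would never satisfy the check, which matches the stuck Init:0/1 symptom described above.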

After the fix:

$ kubectl get pods -n nvidia-dra-driver-gpu
NAME                                               READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-controller-b65d7c4d9-rbr7t   1/1     Running   0          21m
nvidia-dra-driver-gpu-kubelet-plugin-vbjgc         2/2     Running   0          17s

The init container now completes successfully.

copy-pr-bot Bot commented Oct 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nmn3m added 2 commits October 20, 2025 22:06
Signed-off-by: Nour <nurmn3m@gmail.com>
Signed-off-by: Nour <nurmn3m@gmail.com>
nmn3m force-pushed the fix/init-container-lib-search-path branch from bea88b6 to da46b95 on October 20, 2025 19:06
Author

nmn3m commented Oct 23, 2025

@klueska @jgehrcke, can you please take a look at this change?

klueska added the kind/bug label (Categorizes issue or PR as related to a bug.) on Nov 24, 2025
klueska added this to the unscheduled milestone on Nov 24, 2025
Contributor

jgehrcke commented Jan 29, 2026

Maybe the patch is good, but we still need to figure out a strong reason for why we need this patch. Which guarantee is not fulfilled, and why? #692, as it stands, isn't quite revealing. For now, let's close this, and get back to this search path topic with more clarity later.

jgehrcke closed this on Jan 29, 2026
Author

nmn3m commented Feb 25, 2026

@jgehrcke @klueska, I had to write a custom bash script to manually copy the missing library to the path expected by the DRA driver:

echo "Copying NVIDIA libraries to worker node..."
docker exec kind-dra-gpu-worker mkdir -p /usr/lib/x86_64-linux-gnu
for lib in /usr/lib/libnvidia-ml.so*; do
  [ -e "$lib" ] && docker cp "$lib" kind-dra-gpu-worker:/usr/lib/x86_64-linux-gnu/
done

Is there a better way to handle this, or should we address it properly?

Development

Successfully merging this pull request may close these issues.

NVIDIA DRA Driver Kubelet Plugin Pod Stuck in Init:0/1 Status
