Replies: 7 comments 9 replies
@klueska is there a publicly shared version of your document describing the overall design of this feature?
@klueska are there instructions somewhere on how to configure the runtime?
Warning FailedCreatePodSandBox 0s (x13 over 2m33s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured
I see that the Pod also requests the nvidia.com/gpu extended resource. Is the device plugin still needed in this setup? If so, why?
how does
Prerequisites:
Set up the GPU Operator and install the DRA driver
Add the NVIDIA helm repository:
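The command for this step did not survive in the thread. A minimal sketch, assuming the standard public NVIDIA chart repository URL and `nvidia` as the local alias (the function name `add_nvidia_repo` is purely illustrative):

```shell
# Register the NVIDIA Helm repository under the alias "nvidia" and
# refresh the local chart index so the newest chart versions are visible.
add_nvidia_repo() {
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  helm repo update
}
```

Call `add_nvidia_repo` once per workstation; rerunning it is harmless because Helm reports that the repository already exists.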
Install the GPU Operator and the NVIDIA DRA Driver for GPUs:
With HOST managed drivers
Prerequisites:
Install the GPU Operator:
Install the DRA Driver for GPUs:
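The two install commands for the host-managed case are missing above, so here is a rough sketch of what they could look like. Everything in it is an assumption rather than the original author's exact invocation: the release names, the namespaces `gpu-operator` and `nvidia-dra-driver-gpu`, the `nvidia` repo alias, and the chart values. `driver.enabled=false` tells the GPU Operator not to deploy its own driver container, `nvidiaDriverRoot=/` points the DRA driver at the driver installed on the host, and `resources.gpus.enabled=false` is assumed here so that GPU allocation stays with the device plugin.

```shell
# Hypothetical helper: install both charts for nodes whose NVIDIA driver
# is installed directly on the host.
install_host_managed() {
  # GPU Operator without the operator-managed driver container.
  helm upgrade --install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace --wait \
    --set driver.enabled=false
  # DRA driver, reading the driver from the host root filesystem (assumed value).
  helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --namespace nvidia-dra-driver-gpu --create-namespace \
    --set nvidiaDriverRoot=/ \
    --set resources.gpus.enabled=false
}
```

Check the chart's `values.yaml` for the version you install before relying on these value names, and pin a chart version with `--version` for reproducible setups.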
With OPERATOR managed drivers
Install the GPU Operator:
Install the DRA Driver for GPUs:
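For the operator-managed case the original commands are also missing. A sketch under the same assumptions as any typical Helm-based install (release names, namespaces, and the `nvidia` repo alias are illustrative): the GPU Operator keeps its default of deploying the driver container, and the DRA driver is pointed at `/run/nvidia/driver`, which is assumed here to be where the operator-managed driver is mounted on the node.

```shell
# Hypothetical helper: install both charts when the GPU Operator
# manages the NVIDIA driver itself.
install_operator_managed() {
  # GPU Operator with its default, operator-managed driver container.
  helm upgrade --install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace --wait
  # DRA driver, reading the driver from the operator's mount point (assumed value).
  helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --namespace nvidia-dra-driver-gpu --create-namespace \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set resources.gpus.enabled=false
}
```

The only intended difference from a host-managed install is the driver root and the absence of `driver.enabled=false`; verify both value names against the chart version you deploy.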
Validate that the GPU Operator and DRA driver are running
Validate that all GPU Operator components are running and in a Ready state:
Validate that the DRA driver components are running and in a Ready state:
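One way to run both checks is to list the pods in each namespace and flag anything that is not fully Ready. The helper below is a sketch: `unready_pods` is a made-up name, and the namespaces in the usage note are assumptions that depend on how the charts were installed.

```shell
# Print the names of pods in namespace $1 that are neither fully Ready
# (READY column x/y with x == y) nor in the Completed state.
unready_pods() {
  kubectl get pods -n "$1" --no-headers | awk '{
    split($2, ready, "/")
    if ($3 != "Completed" && ready[1] != ready[2]) print $1
  }'
}
```

Running `unready_pods gpu-operator` and `unready_pods nvidia-dra-driver-gpu` should print nothing once all components have settled; any name that is printed is worth inspecting with `kubectl describe pod`.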
Confirm that all GPU nodes are labeled with clique ids:
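A possible way to check the labels, assuming the clique id is published as the node label `nvidia.com/gpu.clique` and that GPU nodes carry `nvidia.com/gpu.present=true` (both label names are assumptions about the GPU Operator's labeling, and `show_clique_labels` is an illustrative name):

```shell
# List GPU nodes with an extra column showing each node's clique label.
show_clique_labels() {
  kubectl get nodes -l nvidia.com/gpu.present=true -L nvidia.com/gpu.clique
}
```

An empty value in the added column means the node has not been labeled with a clique id.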
Simple IMEX channel injection with IMEX daemon running
Run a simple test to validate IMEX daemons are started and IMEX channels are injected:
Multi-Node MPI test
Install the latest version of the MPI Operator:
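The install command is missing here as well. A sketch, assuming the manifest layout used by the kubeflow/mpi-operator repository (`deploy/v2beta1/mpi-operator.yaml` under a release tag); the release tag is deliberately left as a required argument because the thread does not say which version was current, and `install_mpi_operator` is an illustrative name:

```shell
# Apply the MPI Operator manifest for the release tag given as $1.
install_mpi_operator() {
  version="${1:?usage: install_mpi_operator <release-tag>}"
  kubectl apply --server-side -f \
    "https://raw.githubusercontent.com/kubeflow/mpi-operator/${version}/deploy/v2beta1/mpi-operator.yaml"
}
```

Look up the newest tag on the project's releases page and pass it in, for example `install_mpi_operator <release-tag>`.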
Run a multi-node nvbandwidth test requiring IMEX channels with MPI: