Works with v1.0+
This recipe deploys Nvidia NIM infrastructure, on Kubernetes, with GPUs. Specifically, we will:
- Deploy the NVIDIA GPU Operator onto Kubernetes so that pods can request GPUs.
- Select and deploy an LLM available on Nvidia NIM.
- Connect
spiceto the OpenAI compatible NIM LLM.
- A Kubernetes cluster, with at least 1 GPU node.
- Ensure that the GPU has a compute capability of 8.0 or higher.
- Local tools
-
Add the Nvidia Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \ && helm repo update -
Install the GPU Operator
```bash helm install --wait --generate-name \ -n gpu-operator --create-namespace \ nvidia/gpu-operator ``` - For additional `helm` overrides, see [additional values](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options). - Once the command completes (because of the
--wait), Kubernetes pods will be able to ask for GPU requests/limits.
For additional details & troubleshooting, see the official documentation.
-
Get a NGC API key from Nvidia's NGC website.
export NGC_API_KEY=""
-
Login to Nvidia's Docker registry
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
-
Login to Nvidia's Helm registry
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.1.2.tgz --username=\$oauthtoken --password=$NGC_API_KEY
-
Create a secret to use for pulling images from docker registries.
kubectl create secret \ docker-registry ngc-secret \ --docker-server=nvcr.io \ --docker-username='$oauthtoken' \ --docker-password=$NGC_API_KEY
-
Similar to above, create a secret to pull model weights.
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_API_KEY -
Install the Helm chart.
helm install my-nim nim-llm-1.1.2.tgz -f values.yaml
For available models, use NGC CLI and run
ngc registry image list "nvcr.io/nim/*"
-
Add the helm repository
helm repo add spiceai https://helm.spiceai.org helm repo update
-
Deploy Spice
helm install spiceai spiceai/spiceai -f spiceai.yaml
-
Connect to Spice
kubectl port-forward deployment/spiceai 8090
-
Chat with
meta/llama3-8b-instructvia NIM.spice chat
Using model: nim chat> Tell me a joke about the moon.