Functional testing so far indicates that OCP 4.16 or later is required. The most recent version verified is 4.19.
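As a quick pre-flight check for the version requirement above, the following is a minimal sketch and not part of this repo. The helper name `ocp_version_supported` is hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given version is >= the minimum (default 4.16).
ocp_version_supported() {
  local version="$1" minimum="${2:-4.16}"
  # sort -V orders version strings; if the minimum sorts first (or the two are
  # equal), the cluster version is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n1)" = "$minimum" ]
}

# Example usage against a live cluster (requires a logged-in oc session):
#   version="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
#   ocp_version_supported "$version" || echo "OCP $version is older than 4.16" >&2
```

The function only compares version strings, so it can be tried without a cluster, for example `ocp_version_supported 4.15.9 || echo "too old"`.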
From a clone of this repo, run from the same directory as this README:
oc apply -k ./kustomize/
oc apply -k ./kustomize-rhoai/

If you provisioned a cluster with GPUs, run this kustomize before running either the ODH or RHOAI kustomizes:

oc apply -k ./kustomize-gpu

The NodeFeature subscription is OCP version specific. The nodefeature-subscription.yaml currently references a
4.19 version. You can edit that file for your OCP version, and run the contents of the kustomize-gpu/job.yaml Job
manually:
bash ./subscriptions-gpu.sh
bash ./cpu-setup.sh
bash ./nfd-setup.sh
# this will clean up status files created during the subscriptions setup
rm *.txt

With some minor modifications, the Running LlamaStack Operator with ODH instructions were able to produce:
- a running llama 3.2 3B instruct model as a vLLM Nvidia GPU KServe InferenceService instance in the llamastack namespace
- a running llama-stack instance that uses the llama 3.2 3B instruct model
Tweaks to the instructions there include:
- The UI / Dashboard flow directions do not line up exactly with the 2.33 ODH console on OCP. Instead, go into the llamastack project created by the kustomize-gpu Job, and select single server model serving for the llamastack project
- Then click the Connections tab near the top
- Click Create a new Connection
- The values for Connection name, type, and URI for Create a Connection are still correct.
- For deploying the model, while still in the llamastack project, click the Models tab near the top
- Click the deploy a model button
- The specific field settings in the instructions are still correct for those fields mentioned
- But also select the option to expose the model by a route, and disable authentication.
- When creating the LlamaStackDistribution CR in step three, using the service URL for VLLM_URL did not work. Changing it to the URL of the Route created for the model does work (where you add the /v1 suffix to the Route URL)
- Also, set the mountPath in the last line to the default
- You'll have to create your own LlamaStackDistribution yaml for your cluster, but a reference example exists in the file llamastackdistribution-gabe-pers-cluster.yaml.
- Lastly, the various python snippets in Query the Model from Jupyter Notebook have been put in the file jupyter-nb-test.py for convenience. As with the prior steps, the UI / Dashboard navigation directions don't quite line up with what you will see with ODH 2.33 running on OCP. Assuming you are still in the llamastack project, click the Workbenches tab near the top, create a new notebook, and go from there. So far, any workbench type that uses Python 3.12 seems to work.
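
For the VLLM_URL tweak described in the list above, here is a minimal sketch of building that value from the model's Route. The helper name `route_host_to_vllm_url` and the `<route-name>` placeholder are hypothetical, and the sketch assumes the Route is served over HTTPS; check the Route in your own cluster before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Route host into the value used for VLLM_URL.
# Assumes the Route is served over HTTPS; adjust the scheme if yours is not.
route_host_to_vllm_url() {
  local host="$1"
  # Strip a trailing slash so the /v1 suffix is appended exactly once.
  host="${host%/}"
  printf 'https://%s/v1\n' "$host"
}

# Example usage (replace <route-name> with the Route created for the model):
#   host="$(oc get route <route-name> -n llamastack -o jsonpath='{.spec.host}')"
#   VLLM_URL="$(route_host_to_vllm_url "$host")"
#   curl -sk "$VLLM_URL/models"   # -k skips TLS verification; drop it if the cert is trusted
```

The `curl` call in the usage comment is a quick way to confirm the URL is reachable before putting it in the LlamaStackDistribution yaml, since vLLM's OpenAI-compatible server lists its models under `/v1/models`.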