This repository contains practical examples and demos to help you quickly build AI apps with Llama Stack on Kubernetes or OpenShift. Whether you're a cluster admin looking to deploy the right GenAI infrastructure or a developer eager to innovate with AI agents, the content in this repo will help you get up and running.
To run these demos, ensure your environment meets the following requirements:
- An OpenShift cluster, version 4.17+
- 2 GPUs with a minimum of 40 GB VRAM each
Next, follow these simple steps to deploy the core components:
- Create a dedicated OpenShift project:

```bash
oc new-project llama-serve
```
- Apply the Kubernetes manifests. This deploys the foundational Llama Stack services, the vLLM model servers, and the MCP tool servers:

```bash
oc apply -k kubernetes/kustomize/overlay/all-models
```
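It can take a few minutes for the model servers to pull images and load weights. As a quick sanity check, you can probe the stack's health endpoint once it is reachable. The sketch below assumes you have exposed the Llama Stack service through an OpenShift Route; the hostname shown is a placeholder (use whatever `oc get routes -n llama-serve` reports):

```python
# Quick readiness probe for the deployed Llama Stack server.
# NOTE: the Route hostname below is a placeholder -- substitute the one
# reported by `oc get routes -n llama-serve`.
import json
import urllib.request

LLAMA_STACK_URL = "http://llama-stack-llama-serve.apps.example.com"

with urllib.request.urlopen(f"{LLAMA_STACK_URL}/v1/health") as resp:
    print(json.load(resp))  # a healthy stack reports an OK status
```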
We use `uv` for managing Python dependencies, ensuring a consistent and efficient development experience. Here's how to get your environment ready:

- Install `uv`:

```bash
pip install uv
```
- Synchronize your environment:

```bash
uv sync
```
- Activate the virtual environment:

```bash
source .venv/bin/activate
```
Now you're all set to run any Python scripts or Jupyter notebooks within the `demos/rag_agentic` directory!
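If you'd like to smoke-test the connection before diving into the notebooks, a minimal example with the `llama-stack-client` SDK looks something like this. The base URL and model ID are placeholders, not values from this repo; point them at your own Route and at a model actually registered with your stack:

```python
# Minimal smoke test using the llama-stack-client SDK.
# The base_url and model_id below are placeholders -- substitute your own
# Route hostname and one of the models your stack has registered.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://llama-stack-llama-serve.apps.example.com")

# List the registered models (handy for finding valid model IDs).
for model in client.models.list():
    print(model.identifier)

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical; pick one from the list above
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```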
The diagram below shows an example architecture for a secure Llama Stack-based application deployed on OpenShift (OCP), using both MCP tools and a Milvus vector database for its agentic and RAG workflows. This is the same architecture implemented in the RAG/agentic demos.
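To make the moving parts concrete, here is a rough sketch of how an MCP tool server and a Milvus vector database can be wired into the stack through the client SDK. Every identifier, endpoint, and embedding model below is illustrative rather than the demos' real configuration; see the `demos/rag_agentic` notebooks for the actual values:

```python
# Illustrative wiring of the architecture above via the llama-stack-client SDK.
# All IDs, URIs, and the embedding model are placeholders, not the demos' real values.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://llama-stack-llama-serve.apps.example.com")

# Register an MCP tool server so agents can discover and call its tools.
client.toolgroups.register(
    toolgroup_id="mcp::my-tools",
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://mcp-server.llama-serve.svc.cluster.local:8000/sse"},
)

# Register a vector database backed by the Milvus provider for RAG.
client.vector_dbs.register(
    vector_db_id="my-rag-db",
    provider_id="milvus",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
```

From there, documents can be ingested into the registered vector DB for RAG and agents can invoke the MCP tools; the demo notebooks walk through the end-to-end flow.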
We're excited to see what you build with Llama Stack! If you have any questions or feedback, please don't hesitate to open an issue. Happy building! 🎉