This repository contains Llama Stack deployment configurations for OpenShift/Kubernetes, including Model Context Protocol (MCP) servers and model serving infrastructure using Helm charts.
⚠️ Important: The Llama Stack Operator Instance chart requires Red Hat OpenShift AI (RHOAI) 2.25 or later.
All Helm charts are located in the `charts/` directory:

- `llama32-3b` - Llama 3.2 3B model serving with vLLM runtime
- `llama-guard` - Llama Guard 3 1B safety model serving
- `openshift-mcp` - OpenShift/Kubernetes MCP server
- `slack-mcp` - Slack integration MCP server
- `mcp-weather` - Weather MCP server
- `llama-stack-operator-instance` - Llama Stack Operator instances
- `llama-stack-playground` - Llama Stack Playground UI
- `guardrails-orchestrator` - Guardrails Orchestrator for AI safety
- `milvus` - Milvus vector database (wrapper chart with dependencies)
Prerequisites:

- Red Hat OpenShift AI (RHOAI) 2.25+ (required for the Llama Stack Operator)
- Kubernetes 1.19+ or OpenShift 4.x
- Helm 3.0+
- NVIDIA GPU nodes (for the model serving charts)
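Before installing the model serving charts, you can sanity-check that GPU capacity is actually schedulable. This assumes the NVIDIA GPU Operator (or device plugin) has already registered the `nvidia.com/gpu` resource on your nodes:

```bash
# List nodes and how many NVIDIA GPUs each advertises (<none> means no GPU resource)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```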
- Create the target namespace:

  ```bash
  kubectl create namespace lls-demo
  ```

  Note: The milvus chart uses Helm dependencies. Run `helm dependency build ./charts/milvus` before installation.
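  To confirm the dependencies were fetched before installing, Helm can list their status:

  ```bash
  # Each dependency should show STATUS "ok" after the build
  helm dependency list ./charts/milvus
  ```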
- Deploy Model Serving:

  Deploy Llama 3.2 3B:

  ```bash
  # Standard deployment
  helm install llama32-3b ./charts/llama32-3b --namespace lls-demo

  # BU cluster with Tesla-T4-PRIVATE GPUs
  helm install llama32-3b ./charts/llama32-3b \
    --namespace lls-demo \
    --set inferenceService.buCluster=true
  ```
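  Model pods can take several minutes to pull images and load weights. One way to watch progress is via the KServe InferenceService the chart creates; the resource name here is assumed to match the release name:

  ```bash
  # Watch until READY reports True (resource name assumed to match the release)
  kubectl get inferenceservice llama32-3b --namespace lls-demo -w
  ```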
  Deploy Llama Guard:

  ```bash
  # Standard deployment
  helm install llama-guard ./charts/llama-guard --namespace lls-demo

  # BU cluster with Tesla-T4-PRIVATE GPUs
  helm install llama-guard ./charts/llama-guard \
    --namespace lls-demo \
    --set inferenceService.buCluster=true
  ```
- Deploy MCP Servers:

  Deploy OpenShift MCP:

  ```bash
  helm install openshift-mcp ./charts/openshift-mcp --namespace lls-demo
  ```
  Deploy Slack MCP (requires Slack credentials):

  ```bash
  # First create a secret with your Slack credentials
  kubectl create secret generic slack-credentials \
    --from-literal=slack-bot-token="xoxb-your-token" \
    --from-literal=slack-team-id="T1234567890" \
    --namespace lls-demo

  # Then install the chart
  helm install slack-mcp ./charts/slack-mcp \
    --namespace lls-demo \
    --set slack.existingSecret=slack-credentials
  ```
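  To confirm the server picked up the credentials, you can check the secret and tail the pod logs; the label selector below assumes the chart follows standard Helm labeling conventions:

  ```bash
  # Confirm the secret exists (values stay base64-encoded; nothing is printed in plain text)
  kubectl get secret slack-credentials --namespace lls-demo

  # Tail the MCP server logs (label selector is an assumption based on Helm conventions)
  kubectl logs -l app.kubernetes.io/instance=slack-mcp --namespace lls-demo --tail=50
  ```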
  Deploy Weather MCP:

  ```bash
  helm install mcp-weather ./charts/mcp-weather --namespace lls-demo
  ```
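  MCP servers are reached over HTTP, so a quick local smoke test is to port-forward the service. The service name and port below are assumptions based on the release name and common defaults; adjust them to the chart's actual values:

  ```bash
  # Forward a local port to the weather MCP service (name and port are assumptions)
  kubectl port-forward svc/mcp-weather 8080:8080 --namespace lls-demo
  ```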
- Deploy Infrastructure Components:

  Deploy Milvus vector database:

  ```bash
  # First build dependencies
  helm dependency build ./charts/milvus

  # Then install
  helm install milvus ./charts/milvus --namespace lls-demo
  ```
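  Milvus and its bundled dependencies can take a few minutes to become ready. You can watch the pods come up; the label selector assumes standard Helm instance labels:

  ```bash
  kubectl get pods -l app.kubernetes.io/instance=milvus --namespace lls-demo -w
  ```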
  Deploy Guardrails Orchestrator:

  ```bash
  helm install guardrails-orchestrator ./charts/guardrails-orchestrator --namespace lls-demo
  ```
- Deploy Llama Stack Components:

  Note: Requires RHOAI 2.25+ with the Llama Stack Operator installed.
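  If you're unsure which RHOAI version is installed, one way to check is the operator's ClusterServiceVersion; the CSV name pattern below is an assumption and may vary between installs:

  ```bash
  # Look up the RHOAI operator version (CSV name pattern is an assumption)
  oc get csv --all-namespaces | grep -i rhods
  ```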
  Deploy Llama Stack Operator Instance (default - local vLLM):

  ```bash
  helm install llama-stack-instance ./charts/llama-stack-operator-instance --namespace lls-demo
  ```
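  To verify the instance came up, you can query the custom resource the operator manages; the resource kind below comes from the upstream Llama Stack operator and is an assumption for this chart:

  ```bash
  # Check the operator-managed instance and its pods (resource kind is an assumption)
  kubectl get llamastackdistribution --namespace lls-demo
  kubectl get pods --namespace lls-demo
  ```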
  Optional: Enable MaaS (Model as a Service):

  If you want to use external LLM providers via MaaS, first create a secret with your credentials:

  ```bash
  # Set your MaaS configuration
  export INFERENCE_MODEL="llama-3-2-3b"
  export VLLM_URL="XXX"
  export VLLM_TLS_VERIFY="false"
  export VLLM_API_TOKEN="your-maas-api-token-here"

  # Create the secret
  kubectl create secret generic llama-stack-inference-model-secret \
    --from-literal=INFERENCE_MODEL="$INFERENCE_MODEL" \
    --from-literal=VLLM_URL="$VLLM_URL" \
    --from-literal=VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
    --from-literal=VLLM_API_TOKEN="$VLLM_API_TOKEN" \
    --namespace lls-demo

  # Deploy with MaaS enabled
  helm install llama-stack-instance ./charts/llama-stack-operator-instance \
    --namespace lls-demo \
    --set maas.enabled=true
  ```

  Note: When MaaS is enabled, both the local vLLM provider and the MaaS provider are available, giving you access to multiple models simultaneously.
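  Before wiring MaaS into the stack, it can help to check the endpoint and token directly. This sketch assumes the MaaS endpoint is OpenAI-compatible (as vLLM's is); if your VLLM_URL already ends in /v1, drop the suffix:

  ```bash
  # Sanity-check the MaaS endpoint and token (-k mirrors VLLM_TLS_VERIFY="false")
  curl -k -H "Authorization: Bearer $VLLM_API_TOKEN" "$VLLM_URL/v1/models"
  ```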
  Deploy Llama Stack Playground:

  ```bash
  helm install llama-stack-playground ./charts/llama-stack-playground --namespace lls-demo
  ```
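  On OpenShift, the playground UI is typically exposed through a Route; the route name below is an assumption based on the release name:

  ```bash
  # Print the playground URL host (route name assumed to match the release)
  oc get route llama-stack-playground --namespace lls-demo -o jsonpath='{.spec.host}{"\n"}'
  ```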
Each chart can be customized using Helm values. See the individual chart directories for detailed configuration options:

- Model Serving: `charts/llama32-3b`, `charts/llama-guard`
- MCP Servers: `charts/openshift-mcp`, `charts/slack-mcp`, `charts/mcp-weather`
- Llama Stack: `charts/llama-stack-operator-instance`, `charts/llama-stack-playground`
- Infrastructure: `charts/milvus`, `charts/guardrails-orchestrator`
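For more than a couple of overrides, a values file is easier to maintain than repeated --set flags. A minimal sketch; `inferenceService.buCluster` is taken from the examples above, and any other keys must match the chart's own values.yaml:

```bash
# Write example overrides to a file (keys must match the chart's values.yaml)
cat > my-values.yaml <<'EOF'
inferenceService:
  buCluster: true
EOF

# Apply them; upgrade --install is idempotent across re-runs
helm upgrade --install llama32-3b ./charts/llama32-3b \
  --namespace lls-demo \
  -f my-values.yaml
```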
```bash
# Uninstall individual components

# Model Serving
helm uninstall llama32-3b --namespace lls-demo
helm uninstall llama-guard --namespace lls-demo

# MCP Servers
helm uninstall openshift-mcp --namespace lls-demo
helm uninstall slack-mcp --namespace lls-demo
helm uninstall mcp-weather --namespace lls-demo

# Infrastructure
helm uninstall milvus --namespace lls-demo
helm uninstall guardrails-orchestrator --namespace lls-demo

# Llama Stack
helm uninstall llama-stack-instance --namespace lls-demo
helm uninstall llama-stack-playground --namespace lls-demo

# Or delete the entire namespace
kubectl delete namespace lls-demo
```