This directory contains LLMInferenceServices for deploying sample models. Please refer to the deployment guide for more details on how to test the MaaS Platform with these models.
- `simulator` - Simple simulator for testing
- `simulator-premium` - Premium simulator for testing access policies (configured via `MaaSAuthPolicy`)
- `facebook-opt-125m-cpu` - Facebook OPT 125M model (CPU-based)
- `qwen3` - Qwen3 model (GPU-based with autoscaling)
- `ibm-granite-2b-gpu` - IBM Granite 2B Instruct model (GPU-based, supports instructions)
Create the `llm` namespace where models are deployed (if it doesn't already exist):

```bash
kubectl create namespace llm
```

Deploy any model using:

```bash
MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -
```

To deploy both simulator models:
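The per-model command can also be looped over all five sample directories. A minimal sketch that prints each deploy command so you can review them first (pipe the output to `sh` to actually run them):

```shell
# Print the deploy command for each sample model directory listed above.
# Pipe the output to `sh` to apply them all in one pass.
for MODEL_NAME in simulator simulator-premium facebook-opt-125m-cpu qwen3 ibm-granite-2b-gpu; do
  echo "kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -"
done
```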
- Deploy the standard simulator:

  ```bash
  kustomize build docs/samples/models/simulator | kubectl apply -f -
  ```

- Deploy the premium simulator:

  ```bash
  kustomize build docs/samples/models/simulator-premium | kubectl apply -f -
  ```
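Before sending traffic, it can help to wait until both services report ready. A sketch, assuming the LLMInferenceService resources surface a `Ready` condition (the resource names come from the kustomization prefixes of the two simulator overlays):

```shell
# Wait for both simulator services to become Ready. Assumes the CRD
# reports a Ready condition; names match the kustomization prefixes.
kubectl wait --for=condition=Ready \
  llminferenceservice/facebook-opt-125m-simulated \
  llminferenceservice/premium-simulated-simulated-premium \
  -n llm --timeout=300s
```

If either wait times out, `kubectl describe llminferenceservice <name> -n llm` is the usual next step for diagnosing why the service is not ready.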
The two simulator models can be distinguished by:

- Model Name:
  - Standard: `facebook-opt-125m-simulated` (from kustomization namePrefix)
  - Premium: `premium-simulated-simulated-premium` (from kustomization namePrefix + model name)
- LLMInferenceService Name:
  - Standard: `facebook-opt-125m-simulated`
  - Premium: `premium-simulated-simulated-premium`
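The model name is what goes in the `model` field of a request. A hypothetical smoke test against the standard simulator, assuming an OpenAI-compatible `/v1/completions` endpoint reachable at `$GATEWAY_URL` (both the variable and the path are assumptions about your gateway, not something these samples configure):

```shell
# Hypothetical smoke test; GATEWAY_URL and the /v1/completions path are
# assumptions about your gateway setup, not defined by these samples.
curl -sS "$GATEWAY_URL/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook-opt-125m-simulated", "prompt": "Hello", "max_tokens": 10}'
```

Swapping in `premium-simulated-simulated-premium` as the model name should exercise the premium path, subject to the access policies described below.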
Subscription-based access is configured via `MaaSAuthPolicy` and `MaaSSubscription` (see `docs/samples/maas-system/`), not via LLMInferenceService annotations.
After deploying both models:

```bash
# List all LLMInferenceServices
kubectl get llminferenceservices -n llm
```
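For a readiness summary at a glance, a jsonpath query can be used. A sketch, assuming each resource reports a `Ready` entry under the standard `status.conditions` layout:

```shell
# Print each service name with the status of its Ready condition
# (assumes the standard status.conditions layout on the CRD).
kubectl get llminferenceservices -n llm \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
```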