Sample LLMInferenceService Models

This directory contains LLMInferenceService manifests for deploying sample models. Refer to the deployment guide for details on testing the MaaS Platform with these models.

Available Models

  • simulator - Simple simulator for testing
  • simulator-premium - Premium simulator for testing access policies (configured via MaaSAuthPolicy)
  • facebook-opt-125m-cpu - Facebook OPT 125M model (CPU-based)
  • qwen3 - Qwen3 model (GPU-based with autoscaling)
  • ibm-granite-2b-gpu - IBM Granite 2B Instruct model (GPU-based, instruction-tuned)

Deployment

Basic Deployment

Create the llm namespace where models are deployed (if it doesn't already exist):

kubectl create namespace llm
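If the namespace might already exist, an idempotent variant avoids the AlreadyExists error by rendering the manifest client-side and applying it:

```shell
# Idempotent: render the namespace manifest with a client-side dry run,
# then apply it, so re-running never fails if the namespace exists
kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
```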

Deploy any model using:

MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -
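To deploy every sample model in one pass, a simple loop over the model directory names works (a sketch; it assumes kustomize and kubectl are on PATH and that it runs from the repository root):

```shell
# Deploy all sample models listed above, one kustomize build per directory
for MODEL_NAME in simulator simulator-premium facebook-opt-125m-cpu qwen3 ibm-granite-2b-gpu; do
  kustomize build "docs/samples/models/$MODEL_NAME" | kubectl apply -f -
done
```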

Deploying Multiple Models

To deploy both simulator models:

  1. Deploy the standard simulator:

    kustomize build docs/samples/models/simulator | kubectl apply -f -
  2. Deploy the premium simulator:

    kustomize build docs/samples/models/simulator-premium | kubectl apply -f -

Distinguishing Between Models

The two simulator models can be distinguished by:

  • Model Name:

    • Standard: facebook-opt-125m-simulated (from kustomization namePrefix)
    • Premium: premium-simulated-simulated-premium (from kustomization namePrefix + model name)
  • LLMInferenceService Name:

    • Standard: facebook-opt-125m-simulated
    • Premium: premium-simulated-simulated-premium

Subscription-based access is configured via MaaSAuthPolicy and MaaSSubscription (see docs/samples/maas-system/), not via LLMInferenceService annotations.
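Once traffic is routed through the MaaS gateway, requests select a model by the names above. A hypothetical OpenAI-style completion request is sketched below; $MAAS_URL and $TOKEN are placeholders for values from your MaaS Platform deployment (see the deployment guide), not values defined in this repo:

```shell
# Hypothetical request against the standard simulator; the endpoint URL
# and bearer token come from your own MaaS Platform deployment
curl -sS "$MAAS_URL/v1/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook-opt-125m-simulated", "prompt": "Hello", "max_tokens": 16}'
```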

Verifying Deployment

After deploying both models:

# List all LLMInferenceServices
kubectl get llminferenceservices -n llm
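To block until a service reports ready rather than polling the list, kubectl wait can be used (a sketch; the Ready condition name is an assumption and may differ by KServe version):

```shell
# Wait up to 5 minutes for the standard simulator to report Ready
kubectl wait --for=condition=Ready \
  llminferenceservice/facebook-opt-125m-simulated -n llm --timeout=300s
```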