Sample LLMInferenceService Models

This directory contains LLMInferenceService manifests for deploying sample models. Refer to the deployment guide for details on testing the MaaS Platform with these models.

Available Models

  • simulator - Simple simulator for testing
  • simulator-premium - Premium simulator for testing access policies (configured via MaaSAuthPolicy)
  • facebook-opt-125m-cpu - Facebook OPT 125M model (CPU-based)
  • qwen3 - Qwen3 model (GPU-based with autoscaling)
  • ibm-granite-2b-gpu - IBM Granite 2B Instruct model (GPU-based, instruction-tuned)

Deployment

Basic Deployment

Create the llm namespace where models are deployed (if it doesn't already exist):

kubectl create namespace llm
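If the namespace might already exist, an idempotent variant avoids the AlreadyExists error by rendering the manifest client-side and applying it:

```shell
# Idempotent: render the namespace manifest with a client-side dry run,
# then apply it, so re-running never fails if the namespace exists
kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
```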

Deploy any model using:

MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -
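To deploy every sample model in one pass, a simple loop over the model directory names works (a sketch; it assumes kustomize and kubectl are on PATH and that it runs from the repository root):

```shell
# Deploy all sample models listed above, one kustomize build per directory
for MODEL_NAME in simulator simulator-premium facebook-opt-125m-cpu qwen3 ibm-granite-2b-gpu; do
  kustomize build "docs/samples/models/$MODEL_NAME" | kubectl apply -f -
done
```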

Deploying Multiple Models

To deploy both simulator models:

  1. Deploy the standard simulator:

    kustomize build docs/samples/models/simulator | kubectl apply -f -
  2. Deploy the premium simulator:

    kustomize build docs/samples/models/simulator-premium | kubectl apply -f -

Distinguishing Between Models

The two simulator models can be distinguished by:

  • Model Name:

    • Standard: facebook-opt-125m-simulated (from kustomization namePrefix)
    • Premium: premium-simulated-simulated-premium (from kustomization namePrefix + model name)
  • LLMInferenceService Name:

    • Standard: facebook-opt-125m-simulated
    • Premium: premium-simulated-simulated-premium

Subscription-based access is configured via MaaSAuthPolicy and MaaSSubscription (see docs/samples/maas-system/), not via LLMInferenceService annotations.
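Once traffic is routed through the MaaS gateway, requests select a model by the names above. A hypothetical OpenAI-style completion request is sketched below; $MAAS_URL and $TOKEN are placeholders for values from your MaaS Platform deployment (see the deployment guide), not values defined in this repo:

```shell
# Hypothetical request against the standard simulator; the endpoint URL
# and bearer token come from your own MaaS Platform deployment
curl -sS "$MAAS_URL/v1/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook-opt-125m-simulated", "prompt": "Hello", "max_tokens": 16}'
```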

Verifying Deployment

After deploying both models:

# List all LLMInferenceServices
kubectl get llminferenceservices -n llm
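To block until a service reports ready rather than polling the list, kubectl wait can be used (a sketch; the Ready condition name is an assumption and may differ by KServe version):

```shell
# Wait up to 5 minutes for the standard simulator to report Ready
kubectl wait --for=condition=Ready \
  llminferenceservice/facebook-opt-125m-simulated -n llm --timeout=300s
```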