A CLI application that runs validation checks against Kubernetes clusters intended for Red Hat AI Inference Server (KServe LLMInferenceService) on managed Kubernetes platforms (AKS, CoreWeave, etc.). The tool connects to a running Kubernetes cluster, detects the cloud provider, and runs a series of validation tests to confirm that the cluster is properly configured and ready for use.
- Cloud Provider Detection: Automatically detects cloud provider (Azure, AWS) or allows manual specification
- Configurable Logging: Adjustable log levels for debugging and monitoring
- Flexible Configuration: Supports command-line arguments, config files, and environment variables
- Test Framework: Extensible test execution framework for preflight validations
- Test Reporting: Detailed test results with suggested actions for failures
| Cloud provider | Managed K8s Service |
|---|---|
| Azure | AKS |
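The README does not say which signals the automatic cloud provider detection relies on. The following is a minimal sketch, assuming detection is based on each node's `spec.providerID` (AKS nodes report IDs that start with `azure://`, EKS nodes with `aws://`). `detect_cloud_provider` and `detect_from_cluster` are hypothetical names, not the tool's API.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumption: provider is inferred from the providerID prefix set by the cloud controller manager.
PROVIDER_PREFIXES = {
    "azure://": "azure",
    "aws://": "aws",
}


def detect_cloud_provider(provider_ids: Iterable[str]) -> Optional[str]:
    """Return the provider shared by most nodes, or None if no prefix is recognized."""
    counts: Counter = Counter()
    for pid in provider_ids:
        for prefix, name in PROVIDER_PREFIXES.items():
            if pid.startswith(prefix):
                counts[name] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def detect_from_cluster(kubeconfig: Optional[str] = None) -> Optional[str]:
    """Query a live cluster; needs the `kubernetes` package and valid credentials."""
    from kubernetes import client, config  # imported lazily so the helper above runs standalone

    config.load_kube_config(config_file=kubeconfig)
    nodes = client.CoreV1Api().list_node().items
    return detect_cloud_provider(n.spec.provider_id or "" for n in nodes)
```

Returning `None` rather than guessing leaves room for a manual override, which is what `--cloud-provider` provides.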
This tool can be packaged and run as a container image. A Containerfile is provided, along with scripts that simplify the build process.
In order to build a container locally:
```shell
make image
```
The container is built on top of UBI9 (Universal Base Image 9.5).
The repository (name) and tag of the resulting container image can be customized with the `CONTAINER_REPO` and `CONTAINER_TAG` environment variables:
```shell
CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make image
```
After building the container image as described above, use the helper script to run the validations against a Kubernetes cluster:
```shell
# run all tests
make run
# run a specific test suite (cluster or operators)
SUITE=cluster make run
SUITE=operators make run
# if the image name and tag have been customized
CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make run
```
If the kubeconfig holding the cluster credentials is not at the standard `~/.kube/config` path, set the `HOST_KUBECONFIG` environment variable to its location:
```shell
HOST_KUBECONFIG=/path/to/kube/config make run
```
Suite: cluster -- Cluster readiness tests
| Test name | Meaning |
|---|---|
| cloud_provider | The validation script tries to determine the cloud provider the cluster is running on. Can be overridden with `--cloud-provider`. |
| instance_type | At least one supported instance type must be present as a cluster node. See below for details. |
| gpu_availability | At least one supported GPU must be available on a cluster node. Availability is determined by driver presence and node labels. |
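The `gpu_availability` row names driver presence and node labels as the signals without listing them. The following is a sketch under stated assumptions: the NVIDIA device plugin's `nvidia.com/gpu` allocatable resource stands in for driver presence, and the `nvidia.com/gpu.present` and `nvidia.com/gpu.product` labels (as applied by the NVIDIA GPU Operator stack) identify the GPU. The label names and the substring match on model names are assumptions, not the tool's documented logic.

```python
from typing import Iterable, Mapping, Tuple

# Assumed names; adjust to whatever the cluster's GPU stack actually sets.
GPU_RESOURCE = "nvidia.com/gpu"                  # allocatable resource from the NVIDIA device plugin
GPU_PRESENT_LABEL = "nvidia.com/gpu.present"     # node label, e.g. applied by the NVIDIA GPU Operator
GPU_PRODUCT_LABEL = "nvidia.com/gpu.product"     # e.g. "NVIDIA-A100-SXM4-80GB"
SUPPORTED_GPU_MODELS = ("A100", "H100", "H200")  # models from the supported Azure instance list


def node_has_supported_gpu(labels: Mapping[str, str], allocatable: Mapping[str, str]) -> bool:
    """True if the node is labeled with a supported GPU model and advertises at least one GPU."""
    if labels.get(GPU_PRESENT_LABEL) != "true":
        return False
    product = labels.get(GPU_PRODUCT_LABEL, "")
    if not any(model in product for model in SUPPORTED_GPU_MODELS):
        return False
    try:
        count = int(allocatable.get(GPU_RESOURCE, "0"))
    except ValueError:
        count = 0  # unexpected quantity format: treat as no GPUs
    return count > 0


def gpu_availability(nodes: Iterable[Tuple[Mapping[str, str], Mapping[str, str]]]) -> bool:
    """Pass if at least one (labels, allocatable) pair describes a usable supported GPU."""
    return any(node_has_supported_gpu(labels, alloc) for labels, alloc in nodes)
```

With the `kubernetes` client, each pair would come from `node.metadata.labels` and `node.status.allocatable` of the items returned by `CoreV1Api().list_node()`.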
Suite: operators -- Operator readiness tests
| Test name | Meaning |
|---|---|
| crd_certmanager | Checks whether the cert-manager CRDs are present on the cluster. |
| operator_certmanager | Checks whether the cert-manager deployments are ready. |
| crd_sailoperator | Checks whether the sail-operator CRDs are present on the cluster. |
| operator_sail | Checks whether the sail-operator deployments are ready. |
| crd_lwsoperator | Checks whether the lws-operator CRDs are present on the cluster. |
| operator_lws | Checks whether the lws-operator deployments are ready. |
| crd_kserve | Checks whether the KServe CRDs are present on the cluster. |
| operator_kserve | Checks whether the `kserve-controller-manager` deployment is ready. |
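The operator suite pairs a CRD check with a deployment-readiness check for each component. The following sketch makes two assumptions. First, a CRD check compares required CRD names against those installed on the cluster. Second, a deployment counts as ready once `status.readyReplicas` reaches `spec.replicas`. The CRD names below are examples of CRDs these projects install, not the tool's actual lists, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

# Example CRD names per check (assumed; the tool's real lists may differ or be longer).
REQUIRED_CRDS: Dict[str, List[str]] = {
    "crd_certmanager": [
        "certificates.cert-manager.io",
        "issuers.cert-manager.io",
        "clusterissuers.cert-manager.io",
    ],
    "crd_sailoperator": ["istios.sailoperator.io"],
    "crd_lwsoperator": ["leaderworkersets.leaderworkerset.x-k8s.io"],
    "crd_kserve": ["llminferenceservices.serving.kserve.io"],
}


def missing_crds(check: str, present: Iterable[str]) -> List[str]:
    """Return the required CRDs for `check` that are not installed, in declared order."""
    present_set: Set[str] = set(present)
    return [name for name in REQUIRED_CRDS[check] if name not in present_set]


def deployment_ready(desired: Optional[int], ready: Optional[int]) -> bool:
    """spec.replicas defaults to 1 when unset; status.readyReplicas is omitted (None) when 0."""
    want = 1 if desired is None else desired
    return (ready or 0) >= want


def list_crd_names(kubeconfig: Optional[str] = None) -> Set[str]:
    """Fetch installed CRD names from a live cluster (needs the `kubernetes` package)."""
    from kubernetes import client, config  # lazy import keeps the pure helpers standalone

    config.load_kube_config(config_file=kubeconfig)
    crds = client.ApiextensionsV1Api().list_custom_resource_definition().items
    return {crd.metadata.name for crd in crds}
```

For a live deployment, `desired` and `ready` would map to `spec.replicas` and `status.ready_replicas` on the object returned by `AppsV1Api().read_namespaced_deployment(name, namespace)`.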
At the end, a brief report is printed showing a PASSED or FAILED status for each of the tests above, together with the suggested action the user should take.
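The report layout itself is not documented. The following hypothetical sketch shows one way to render one PASSED/FAILED line per test and append the suggested action to failures. `CheckResult` and `format_report` are illustrative names, not part of the tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    name: str
    passed: bool
    suggested_action: str = ""  # only shown for failed checks


def format_report(results: List[CheckResult]) -> str:
    """Render aligned 'name  STATUS' lines, adding the suggested action on failures."""
    width = max((len(r.name) for r in results), default=0)
    lines = []
    for r in results:
        status = "PASSED" if r.passed else "FAILED"
        line = f"{r.name:<{width}}  {status}"
        if not r.passed and r.suggested_action:
            line += f"  -> {r.suggested_action}"
        lines.append(line)
    return "\n".join(lines)
```

Padding names to the longest one keeps the status column aligned no matter which suite is run.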
Azure Supported Instance Types:
- Standard_NC24ads_A100_v4 (NVIDIA A100)
- Standard_ND96asr_v4 (NVIDIA A100)
- Standard_ND96amsr_A100_v4 (NVIDIA A100)
- Standard_ND96isr_H100_v5 (NVIDIA H100)
- Standard_ND96isr_H200_v5 (NVIDIA H200)
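The following sketch shows the `instance_type` check. It assumes the check reads the well-known `node.kubernetes.io/instance-type` node label, which AKS populates with the VM size, and compares that label against the list above. The case-insensitive match and the function name are illustrative choices, not the tool's documented behavior.

```python
from typing import Iterable, List, Mapping

INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"  # well-known Kubernetes node label

# Supported Azure VM sizes (from the list above) and their GPU model.
SUPPORTED_AZURE_INSTANCE_TYPES = {
    "Standard_NC24ads_A100_v4": "NVIDIA A100",
    "Standard_ND96asr_v4": "NVIDIA A100",
    "Standard_ND96amsr_A100_v4": "NVIDIA A100",
    "Standard_ND96isr_H100_v5": "NVIDIA H100",
    "Standard_ND96isr_H200_v5": "NVIDIA H200",
}
# Case-insensitive lookup table, in case a platform normalizes the label's case.
_CANONICAL = {name.casefold(): name for name in SUPPORTED_AZURE_INSTANCE_TYPES}


def supported_instance_types(node_labels: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the supported instance type of each matching node; the check passes if non-empty."""
    found = []
    for labels in node_labels:
        value = labels.get(INSTANCE_TYPE_LABEL, "")
        canonical = _CANONICAL.get(value.casefold())
        if canonical is not None:
            found.append(canonical)
    return found
```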
Required dependencies:
- configargparse>=1.7.1
- kubernetes>=34.1
Command-line options:
- `-l, --log-level`: Set the log level (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL; default: INFO)
- `-k, --kube-config`: Path to the kubeconfig file (overrides the `KUBECONFIG` environment variable)
- `-u, --cloud-provider`: Cloud provider to perform checks on (choices: auto, azure; default: auto)
- `-c, --config`: Path to a custom config file
- `-s, --suite`: Test suite to run (choices: all, cluster, operators; default: all)
- `-h, --help`: Show the help message
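The tool itself is built on `configargparse`. The stdlib `argparse` sketch below only mirrors the documented flags to show how they map to parsed attribute names (for example, `--kube-config` becomes `kube_config`). It also applies the documented rule that `--kube-config` overrides `KUBECONFIG`. The program name and the `~/.kube/config` fallback are assumptions; the fallback matches the Kubernetes client's default location.

```python
import argparse
import os
from typing import Mapping


def build_parser() -> argparse.ArgumentParser:
    # Program name is assumed; -h/--help is added by argparse automatically.
    p = argparse.ArgumentParser(prog="llmd-xks-preflight")
    p.add_argument("-l", "--log-level", default="INFO",
                   choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    p.add_argument("-k", "--kube-config", help="overrides the KUBECONFIG environment variable")
    p.add_argument("-u", "--cloud-provider", default="auto", choices=["auto", "azure"])
    p.add_argument("-c", "--config", help="path to a custom config file")
    p.add_argument("-s", "--suite", default="all", choices=["all", "cluster", "operators"])
    return p


def resolve_kubeconfig(args: argparse.Namespace, environ: Mapping[str, str]) -> str:
    # --kube-config wins over KUBECONFIG; otherwise fall back to the conventional default.
    path = args.kube_config or environ.get("KUBECONFIG") or "~/.kube/config"
    return os.path.expanduser(path)
```

For example, `build_parser().parse_args(["-s", "cluster"])` yields `suite="cluster"` while the other options keep their documented defaults.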
The application automatically looks for config files in the following locations (in order):
- `~/.llmd-xks-preflight.conf` (user home directory)
- `./llmd-xks-preflight.conf` (current directory)
- `/etc/llmd-xks-preflight.conf` (system-wide)
You can also specify a custom config file:
```shell
CONFIG=/path/to/config.conf make run
```
Example config file:
```ini
log_level = INFO
kube_config = /path/to/kubeconfig
cloud_provider = azure
```
Environment variables:
- `LLMD_XKS_LOG_LEVEL`: Log level (same choices as `--log-level`)
- `LLMD_XKS_CLOUD_PROVIDER`: Cloud provider (choices: auto, azure)
- `LLMD_XKS_SUITE`: Test suite to run (choices: all, cluster, operators; default: all)
- `KUBECONFIG`: Path to the kubeconfig file (standard Kubernetes environment variable)
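`configargparse` documents its precedence as command line > environment variables > config file values > defaults. The stdlib sketch below imitates that order for the settings shown above. Its parser handles only simple `key = value` lines with `#` comments. The table names are hypothetical, and the code is an illustration of the layering, not the tool's implementation.

```python
from typing import Dict, Mapping, Optional

# Documented defaults and environment variable names (kube_config has no LLMD_XKS_ variable).
DEFAULTS: Dict[str, Optional[str]] = {
    "log_level": "INFO", "cloud_provider": "auto", "suite": "all", "kube_config": None,
}
ENV_VARS = {
    "log_level": "LLMD_XKS_LOG_LEVEL",
    "cloud_provider": "LLMD_XKS_CLOUD_PROVIDER",
    "suite": "LLMD_XKS_SUITE",
}


def parse_config_text(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def resolve(cli: Mapping[str, Optional[str]], env: Mapping[str, str],
            file_values: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Layer defaults < config file < environment < command line."""
    settings = dict(DEFAULTS)
    settings.update({k: v for k, v in file_values.items() if k in DEFAULTS})
    for key, var in ENV_VARS.items():
        if var in env:
            settings[key] = env[var]
    settings.update({k: v for k, v in cli.items() if v is not None})
    return settings
```

With the example config file above, `LLMD_XKS_SUITE=cluster`, and `--log-level DEBUG` on the command line, the result is `log_level="DEBUG"`, `suite="cluster"`, and `cloud_provider="azure"`.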