# Deploying a Red Hat Validated Model in a Disconnected OpenShift AI Environment

Red Hat AI provides access to a rich repository of **Third Party Models** validated to run efficiently across the platform. These models are available through [Red Hat's Hugging Face repository](https://huggingface.co/collections/RedHatAI/red-hat-ai-validated-models-v10-682613dc19c4a596dbac9437).

The repository offers comprehensive information about each model's architecture, optimizations, deployment options, and evaluation metrics. This information helps you make informed decisions about model selection, deployment configurations, and hardware accelerator choices tailored to your domain-specific use cases.

In this blog post, we will walk you through **how to deploy one of these validated models into your disconnected Red Hat OpenShift AI platform**.
## Step-by-Step Guide to Deploy a Model in a Disconnected Environment

### 1. Select the Model

For this blog post, as an example, we will deploy one of the popular models: [Llama 4 Scout 17B FP8](https://huggingface.co/RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic), an optimized large language model using FP8 quantization.
### 2. Gather Required Image Information

You have different options for deploying models to your OpenShift AI cluster. We recommend using [**Modelcar**](https://kserve.github.io/website/master/modelserving/storage/oci/#using-modelcars), as it eliminates the need to manually download models from Hugging Face, upload them to S3, and manage access. With Modelcar, you can package your models as OCI images and either pull them at runtime or precache them. This simplifies versioning, improves traceability, and integrates naturally with CI/CD workflows.

With the Modelcar approach, deploying this model requires two images:

- **A runtime image:** The container runtime that serves the model.
- **A ModelCar image:** The packaged model artifact for deployment.

You can find this information in the model's [Deployment section](https://huggingface.co/RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic#deployment) on the Hugging Face repository.
| 26 | + |
| 27 | + |
| 28 | +- **Runtime image** (from the *Deploy on Red Hat OpenShift AI* β *ServingRuntime* section): |
| 29 | + |
| 30 | +``` |
| 31 | +image: quay.io/modh/vllm:rhoai-2.20-cuda |
| 32 | +``` |

- **ModelCar image** (from the *InferenceService* section):

```
storageUri: oci://registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```

In short, these are the two images you need in your registry:

```
- quay.io/modh/vllm:rhoai-2.20-cuda
- registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```
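Before mirroring, it can save time to confirm that both images are actually reachable from your connected host. Below is a minimal sketch, assuming `skopeo` is available there (a full `podman pull` works too, but downloads the large model layers):

```shell
# Hypothetical pre-flight check from the connected host: confirm both
# source images are reachable before mirroring. Assumes skopeo is
# installed and you have logged in to both registries; the guard makes
# the sketch a no-op where skopeo is absent.
IMAGES="quay.io/modh/vllm:rhoai-2.20-cuda
registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5"

if command -v skopeo >/dev/null 2>&1; then
  for img in ${IMAGES}; do
    # inspect reads only the manifest, avoiding a multi-GB model pull
    skopeo inspect --raw "docker://${img}" >/dev/null && echo "OK: ${img}"
  done
fi
```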

### 3. Mirror Images to Your Disconnected OpenShift Cluster

Mirroring container images to a disconnected OpenShift cluster means copying them from a connected environment (your local machine or a connected bastion host) to the disconnected cluster's internal image registry, or to a private image registry accessible from the disconnected cluster.

**Prerequisites:**

- Access to a connected environment that can pull images from image registries.
- Access to your disconnected OpenShift cluster and its internal registry or your private image registry.
- `oc` CLI installed and configured to access both environments.

We will use the `oc image mirror` utility for this process. While this blog shows mirroring to an internal OpenShift registry, the same applies to external private registries (e.g., self-hosted Quay or Artifactory).

***Note:** If you're using a **mirror registry** configured with **`oc-mirror`**, you can also include specific images by listing them under the `additionalImages` section in your `ImageSetConfiguration`. Refer to the official [documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/disconnected_environments/mirroring-in-disconnected-environments) for details.*
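For the `oc-mirror` route, such an image set might look like the minimal sketch below; the exact `apiVersion` and surrounding fields depend on your `oc-mirror` release, so verify them against the documentation linked above:

```yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v2alpha1   # sketch; oc-mirror v1 releases use v1alpha2
mirror:
  additionalImages:
    - name: quay.io/modh/vllm:rhoai-2.20-cuda
    - name: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```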

Before you start mirroring, verify that you can log in to the relevant registries and your OpenShift cluster:

```shell
oc login <your-cluster-api-url>
oc registry login # for internal registry
podman login registry.redhat.io
podman login <your-private-registry>
```

### 4. Mirror the Images

Mirroring time depends on image size, network speed, and registry performance. Small images may take a few minutes, while large model images can take longer.

The general command format is:

```shell
oc image mirror <source-image> <destination-image>
```
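To keep the pattern reusable, you can assemble the destination reference from its parts. Everything below (`my-models`, the registry route) is a placeholder to substitute with your own values:

```shell
# Sketch: derive the destination reference so the original name:tag is
# preserved under your registry route and project. All values here are
# placeholders, not real endpoints.
SRC="quay.io/modh/vllm:rhoai-2.20-cuda"
REGISTRY="default-route-openshift-image-registry.apps.example-domain.com"
PROJECT="my-models"                        # hypothetical target project
DEST="${REGISTRY}/${PROJECT}/${SRC##*/}"   # keeps "vllm:rhoai-2.20-cuda"
echo "oc image mirror ${SRC} ${DEST}"
```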

#### Mirror the vLLM Runtime Image

```shell
oc image mirror quay.io/modh/vllm:rhoai-2.20-cuda default-route-openshift-image-registry.apps.example-domain.com/<project-name>/vllm:rhoai-2.20-cuda
```

***Note:*** *If you're running OpenShift AI 2.20 or later and have already mirrored the required images, the vLLM image needed to serve this model may already be available in your environment.*

#### Mirror the Modelcar Image

```shell
oc image mirror registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5 default-route-openshift-image-registry.apps.example-domain.com/<project-name>/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```
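After both commands finish, you can confirm the images landed in the internal registry. A quick sketch, assuming `oc` is logged in to the disconnected cluster and `my-models` stands in for your project:

```shell
# Each mirrored repository appears as an image stream in the target
# project; listing the tags confirms both pushes succeeded. Guarded so
# the sketch does nothing where `oc` is unavailable.
PROJECT="my-models"   # placeholder project name
if command -v oc >/dev/null 2>&1; then
  oc get imagestreams -n "${PROJECT}"
  oc get imagestreamtags -n "${PROJECT}"
fi
```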

### 5. Deploy the Model in Your Disconnected Cluster

After mirroring the images, navigate back to the model's **Deployment** page on Hugging Face and expand the **Red Hat OpenShift AI** option.

To deploy the model, you need to create the required `ServingRuntime` and `InferenceService` objects in your namespace.

You can copy the provided YAML files and apply them to your disconnected cluster.

‼️🚨**Important:** Before applying the YAMLs, make sure to update all **image references** in both the **ServingRuntime** and **InferenceService** to point to your mirrored images. This ensures OpenShift AI can pull the images inside your disconnected environment.

```shell
# Apply the ServingRuntime
oc apply -f vllm-servingruntime.yaml -n <project-name>

# Apply the InferenceService
oc apply -f llama4-inferenceservice.yaml -n <project-name>
```
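Once applied, the model is ready when the InferenceService reports `READY=True`. A hedged status check, with `my-models` again standing in for your project name:

```shell
# Watch the rollout: the predictor pod must pull both mirrored images
# and pass its startup checks before READY flips to True. Guarded so
# the sketch is inert without `oc`.
PROJECT="my-models"   # placeholder project name
if command -v oc >/dev/null 2>&1; then
  oc get inferenceservice -n "${PROJECT}"
  # The pod-level view helps when an image pull fails in a
  # disconnected environment (look for ImagePullBackOff).
  oc get pods -n "${PROJECT}"
fi
```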

Alternatively, you can apply the YAMLs in the OpenShift console by clicking the `+` sign > `Import YAML`:


If you encounter an error like the following, check that your object names comply with **DNS naming conventions**; uppercase letters are not allowed.

```shell
Error "Invalid value: "LLama-4-Scout-17B-16E-Instruct-FP8-Dynamic": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" for field "metadata.name".
```

> Make sure names use only lowercase letters, numbers, and hyphens, and follow Kubernetes' DNS label requirements.
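You can check a candidate name locally with the same regular expression the API server uses; a small sketch using only `grep`:

```shell
# Local approximation of the RFC 1123 subdomain check that Kubernetes
# applies to metadata.name (not a substitute for server-side validation).
is_dns1123() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

is_dns1123 "LLama-4-Scout-17B-16E-Instruct-FP8-Dynamic" || echo "invalid: contains uppercase"
is_dns1123 "llama-4-scout-17b-16e-instruct-fp8-dynamic" && echo "valid"
```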

---

## Summary

Deploying validated models from Red Hat AI's Hugging Face Validated Models repository in disconnected OpenShift AI environments involves:

- Selecting the desired model.
- Identifying the required runtime and model images.
- Mirroring these images to your cluster's internal or private registry.
- Updating the deployment YAMLs to reference the mirrored images.

This process ensures your AI workloads run seamlessly even in restricted or disconnected environments, enabling you to leverage validated, optimized AI models securely.