Commit fd636d1 (Merge pull request #132 from ckavili/main)

🪄 ADD - deploy a validated model to a disconnected env 🪄
# Deploying a Red Hat Validated Model in a Disconnected OpenShift AI Environment

Red Hat AI provides access to a rich repository of **Third Party Models** validated to run efficiently across the platform. These models are available through [Red Hat’s Hugging Face repository](https://huggingface.co/collections/RedHatAI/red-hat-ai-validated-models-v10-682613dc19c4a596dbac9437).

The repository offers comprehensive information about each model’s architecture, optimizations, deployment options, and evaluation metrics. This information helps you make informed decisions about model selection, deployment configurations, and hardware accelerator choices tailored to your domain-specific use cases.

In this blog post, we will walk you through **how to deploy one of these validated models into your disconnected Red Hat OpenShift AI platform**.

## Step-by-Step Guide to Deploy a Model in a Disconnected Environment

### 1. Select the Model

For this blog post, as an example, we will deploy one of the popular models: [Llama 4 Scout 17B FP8](https://huggingface.co/RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic), an optimized large language model using FP8 quantization.
### 2. Gather Required Image Information

You have different options for deploying models to your OpenShift AI cluster. We recommend using [**Modelcar**](https://kserve.github.io/website/master/modelserving/storage/oci/#using-modelcars), as it eliminates the need to manually download models from Hugging Face, upload them to S3, and manage access. With Modelcar, you can package your models as OCI images and either pull them at runtime or precache them. This simplifies versioning, improves traceability, and integrates naturally with CI/CD workflows.
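For context, a Modelcar image is simply an OCI image whose filesystem carries the model files, which KServe conventionally expects under `/models`. A minimal, hypothetical Containerfile for building one yourself (the base image and local `./model` directory are illustrative assumptions) might look like:

```
# Minimal Modelcar image sketch: the serving runtime mounts this image
# and reads the model files from /models (KServe's modelcar convention).
FROM registry.access.redhat.com/ubi9/ubi-micro:latest
COPY ./model /models
```

For the validated models in this post, you do not need to build this yourself; Red Hat publishes ready-made Modelcar images.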
With the Modelcar approach, you need two images to deploy this model:

- **A runtime image:** The container image for the serving runtime that runs the model.
- **A ModelCar image:** The packaged model artifact for deployment.

You can find this information in the model’s [Deployment section](https://huggingface.co/RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic#deployment) on the Hugging Face repository.

![redhatai-hf.png](./disconnected/redhatai-hf.png)
- **Runtime image** (from the *Deploy on Red Hat OpenShift AI* → *ServingRuntime* section):

```
image: quay.io/modh/vllm:rhoai-2.20-cuda
```

- **ModelCar image** (from the *InferenceService* section):

```
storageUri: oci://registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```

In other words, these are the two images you need to have in your registry:

```
- quay.io/modh/vllm:rhoai-2.20-cuda
- registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```
### 3. Mirror Images to Your Disconnected OpenShift Cluster

Mirroring container images to a disconnected OpenShift cluster means copying them from a connected environment (your local machine or a connected bastion host) to the disconnected cluster's internal image registry, or to a private image registry that the disconnected cluster can reach.

**Prerequisites:**

* Access to a connected environment that can pull images from the source registries.
* Access to your disconnected OpenShift cluster and its internal registry, or to your private registry.
* The `oc` CLI installed and configured to access both environments.

We will use the `oc image mirror` utility for this process. While this blog shows mirroring to an internal OpenShift registry, the same approach applies to external private registries (e.g., self-hosted Quay or Artifactory).

***Note:** If you’re using a **mirror registry** configured with **`oc-mirror`**, you can also include specific images by listing them under the `additionalImages` section in your `ImageSetConfig`. Refer to the official [documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/disconnected_environments/mirroring-in-disconnected-environments) for details.*

Before you start mirroring, verify that you can log in to the relevant registries and to your OpenShift cluster:

```shell
oc login <your-cluster-api-url>
oc registry login # for the internal registry
podman login registry.redhat.io
podman login <your-private-registry>
```
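As a sketch of the `oc-mirror` path mentioned in the note above, the two images from step 2 would be listed under `additionalImages` roughly like this (the `apiVersion` shown assumes oc-mirror v2; adjust it and the rest of the configuration to your setup):

```yaml
apiVersion: mirror.openshift.io/v2alpha1
kind: ImageSetConfiguration
mirror:
  additionalImages:
    - name: quay.io/modh/vllm:rhoai-2.20-cuda
    - name: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```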
### 4. Mirroring the Images

Mirroring time depends on image size, network speed, and registry performance. Small images may take a few minutes, while large model images can take longer.

The general command format is:

```shell
oc image mirror <source-image> <destination-image>
```
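If you are scripting the mirroring of several images, a tiny hypothetical helper can derive the destination reference from a source reference by keeping the image name and tag and swapping in your mirror registry (the registry hostname and `<project-name>` placeholder below follow the examples in this post):

```shell
# Illustrative helper, not a supported tool: rewrite a source image
# reference to the mirror registry, keeping only the name:tag portion.
MIRROR="default-route-openshift-image-registry.apps.example-domain.com/<project-name>"

mirror_ref() {
  local src="$1"
  local name_tag="${src##*/}"   # strip registry and namespace, keep name:tag
  echo "${MIRROR}/${name_tag}"
}

mirror_ref quay.io/modh/vllm:rhoai-2.20-cuda
# -> default-route-openshift-image-registry.apps.example-domain.com/<project-name>/vllm:rhoai-2.20-cuda
```

You could then feed each source and derived destination pair to `oc image mirror`.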
#### Mirror the vLLM Runtime Image

```shell
oc image mirror quay.io/modh/vllm:rhoai-2.20-cuda default-route-openshift-image-registry.apps.example-domain.com/<project-name>/vllm:rhoai-2.20-cuda
```

***Note:*** *If you're running OpenShift AI 2.20 or later and have already mirrored the required images, the vLLM image needed to serve this model may already be available in your environment.*

#### Mirror the Modelcar Image

```shell
oc image mirror registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5 default-route-openshift-image-registry.apps.example-domain.com/<project-name>/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```
### 5. Deploy the Model in Your Disconnected Cluster

After mirroring the images, navigate back to the model’s **Deployment** page on Hugging Face and expand the **Red Hat OpenShift AI** option.

To deploy the model, you need to create the required `ServingRuntime` and `InferenceService` objects in your namespace.

You can copy the provided YAML files and apply them to your disconnected cluster.

‼️🚨 **Important:** Before applying the YAMLs, make sure to update all **image references** in both the **ServingRuntime** and the **InferenceService** to point to your mirrored images. This ensures OpenShift AI can pull the images inside your disconnected environment.

```shell
# Apply the ServingRuntime
oc apply -f vllm-servingruntime.yaml -n <project-name>

# Apply the InferenceService
oc apply -f llama4-inferenceservice.yaml -n <project-name>
```
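For illustration, an `InferenceService` repointed at the mirrored Modelcar image might look roughly like the sketch below. The object name and runtime name are hypothetical placeholders, not values from the upstream YAML; only the `storageUri` follows the mirrored image reference from step 4:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-4-scout-17b-fp8   # must be a lowercase RFC 1123 name
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-runtime     # must match your ServingRuntime's metadata.name
      # Mirrored Modelcar image instead of registry.redhat.io:
      storageUri: oci://default-route-openshift-image-registry.apps.example-domain.com/<project-name>/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
```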
Alternatively, you can apply the YAMLs in the OpenShift console by clicking the `+` sign > `Import YAML`:

![import-yaml.png](./disconnected/import-yaml.png)

If you encounter an error like the following, check that your object names comply with **DNS naming conventions**; uppercase letters are not allowed.

```shell
Error "Invalid value: "LLama-4-Scout-17B-16E-Instruct-FP8-Dynamic": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" for field "metadata.name".
```

> Make sure names use only lowercase letters, numbers, and hyphens, and follow Kubernetes' DNS label requirements.
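If you want to sanity-check a candidate `metadata.name` before applying anything, here is a small sketch that tests a name against the same RFC 1123 pattern quoted in the error message:

```python
import re

# RFC 1123 subdomain pattern, as quoted in the validation error above.
RFC1123 = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)

def is_valid_name(name: str) -> bool:
    """Return True if `name` is a valid lowercase RFC 1123 subdomain."""
    return len(name) <= 253 and bool(RFC1123.match(name))

print(is_valid_name("LLama-4-Scout-17B-16E-Instruct-FP8-Dynamic"))  # False
print(is_valid_name("llama-4-scout-17b-16e-instruct-fp8-dynamic"))  # True
```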
---
## Summary

Deploying validated models from Red Hat AI’s Hugging Face Validated Models repository in a disconnected OpenShift AI environment involves:

* Selecting the desired model.
* Identifying the required runtime and ModelCar images.
* Mirroring these images to your cluster’s internal or private registry.
* Updating the deployment YAMLs so the image references point to your mirrored images, then applying them.

This process ensures your AI workloads run seamlessly even in restricted or disconnected environments, enabling you to leverage validated, optimized AI models securely.
`docs/whats-new/whats-new.md` (2 additions, 0 deletions). A new entry is added at the top of the "What's new?" list:

**2025-07-15**: Add [Deploying a Red Hat Validated Model in a Disconnected OpenShift AI Environment](../odh-rhoai/deploy-validated-models-on-disconnected.md)

**2025-04-03**: Add [AI for Everyone: What We Learned](../generative-ai/ai-for-everyone.md)

**2025-04-01**: Add [Building an Image Generation App: What We Learned](../generative-ai/building-an-image-generation-app.md)

`mkdocs.yml` (1 addition, 0 deletions). A new `nav` entry is added after the AnythingLLM line:

- RHOAI Metrics Dashboard for Model Serving: odh-rhoai/kserve-uwm-dashboard-metrics.md
- Connect to RHOAI Workbench Kernel from VS Code: odh-rhoai/connect-vscode-to-rhoai-wb.md
- AnythingLLM as Custom Workbench: odh-rhoai/custom-workbench-anythingllm.md
- Deploying a Red Hat Validated Model in a Disconnected OpenShift AI Environment: deploy-validated-models-on-disconnected.md
- Tools:
    - GPU pruner: odh-rhoai/gpu-pruner.md
    - ODH Tools and Extensions Companion: odh-rhoai/odh-tools-and-extensions-companion.md
