Commit ede1179

Merge branch 'main' into kserve-image-pull-secret
2 parents ef76c90 + ecf7dab commit ede1179

13 files changed, +256 -21 lines changed

.github/workflows/docs.yml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+name: Build docs
+on:
+  push:
+    branches:
+      - main
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+jobs:
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/configure-pages@v5
+      - uses: actions/checkout@v5
+      - uses: actions/setup-python@v5
+        with:
+          python-version: 3.x
+      - run: pip install zensical
+      - name: Copy README to docs with fixed relative links
+        run: |
+          python3 -c "
+          import re, os
+          repo_url = os.environ['REPO_URL']
+          with open('README.md') as f:
+              content = f.read()
+          # Strip ./docs/ prefix so links resolve within the docs directory
+          content = content.replace('./docs/', './')
+          # Convert remaining relative links with directory paths to absolute GitHub URLs
+          # (these point to repo files outside docs/, e.g. ./validation/README.md)
+          content = re.sub(
+              r'(\[[^\]]*\])\(\./([^)]*\/[^)]*)\)',
+              rf'\1({repo_url}/blob/main/\2)',
+              content,
+          )
+          with open('docs/index.md', 'w') as f:
+              f.write(content)
+          "
+        env:
+          REPO_URL: ${{ github.server_url }}/${{ github.repository }}
+      - run: zensical build --clean
+      - uses: actions/upload-pages-artifact@v4
+        with:
+          path: site
+      - uses: actions/deploy-pages@v4
+        id: deployment
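
Review note: the link rewriting done by the "Copy README to docs" step can be exercised outside the workflow. The following is a minimal sketch, not part of the commit. The function name `rewrite_readme_links` and the repository URL `https://github.com/example-org/example-repo` are hypothetical; the two rewriting steps are copied from the inline script.

```python
import re


def rewrite_readme_links(content: str, repo_url: str) -> str:
    """Apply the same two rewriting steps as the workflow's inline script."""
    # Step 1: drop the ./docs/ prefix so links resolve inside the docs directory.
    content = content.replace("./docs/", "./")
    # Step 2: rewrite remaining relative links whose target contains a "/"
    # into absolute URLs on the main branch of the repository.
    return re.sub(
        r"(\[[^\]]*\])\(\./([^)]*\/[^)]*)\)",
        rf"\1({repo_url}/blob/main/\2)",
        content,
    )


if __name__ == "__main__":
    # Hypothetical repository URL, used only for illustration.
    repo = "https://github.com/example-org/example-repo"
    for link in (
        "[Guide](./docs/guide.md)",
        "[Validation](./validation/README.md)",
        "[License](./LICENSE)",
        "![logo](./docs/images/odh.png)",
    ):
        print(link, "->", rewrite_readme_links(link, repo))
```

Under this logic a link into a subdirectory of `docs/` (the hypothetical `./docs/images/odh.png` above) first loses its `docs/` prefix and is then turned into an absolute URL without `docs/` in the path. If README.md contains such links, it may be worth checking that they still resolve.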

.github/workflows/link-check.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Check links
+
+on:
+  pull_request:
+    paths:
+      - '**/*.md'
+      - '.lychee.toml'
+      - '.github/workflows/link-check.yaml'
+
+permissions:
+  contents: read
+
+jobs:
+  link-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - uses: lycheeverse/lychee-action@v2
+        with:
+          args: >
+            --no-progress
+            --exclude '^https?://localhost'
+            '**/*.md'
+          fail: true
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1 +1,5 @@
 /.idea
+.claude/
+site/
+*.swp
+*.swo

.lychee.toml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# https://lychee.cli.rs/usage/config/
+include_mail = false
+max_concurrency = 16
+max_retries = 3
+user_agent = "Mozilla/5.0 (compatible; lychee/0.18; +https://github.com/lycheeverse/lychee)"
+timeout = 30
+accept = [200, 203, 429]
+include_fragments = false
+include_verbatim = true
+exclude = [
+  "^https?://[0-9]+\\.xx\\.",
+  "^hf://",
+  "^registry\\.redhat\\.io",
+]
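
Review note: the `exclude` patterns above can be sanity-checked against sample URLs. This is a rough sketch that uses Python's `re` module as a stand-in; lychee has its own regex engine, so this only approximates its behaviour for these simple patterns. The helper name `is_excluded` and the sample URLs are hypothetical.

```python
import re

# The patterns as they read after TOML unescaping ("\\." in the file is \.).
EXCLUDE_PATTERNS = [
    r"^https?://[0-9]+\.xx\.",
    r"^hf://",
    r"^registry\.redhat\.io",
]


def is_excluded(url: str) -> bool:
    """Return True if any exclude pattern matches the given URL string."""
    return any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    # Hypothetical sample URLs, chosen to exercise each pattern.
    for url in (
        "http://10.xx.0.1/v1/models",
        "hf://Qwen/Qwen2-7B-Instruct",
        "registry.redhat.io/rhoai/odh-kserve-router-rhel9",
        "https://registry.redhat.io/rhoai/odh-kserve-router-rhel9",
    ):
        print(url, "->", "excluded" if is_excluded(url) else "checked")
```

The third pattern is anchored at the start of the string without a scheme, so in this sketch it matches a bare `registry.redhat.io/...` reference but not an `https://registry.redhat.io/...` URL. Whether that distinction matters depends on how lychee normalizes the links it extracts, which this sketch does not model.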

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# rhaii-on-xks
+# Red Hat AI Inference on managed Kubernetes
 
 Infrastructure Helm charts for deploying Red Hat AI Inference Server (KServe LLMInferenceService) on managed Kubernetes platforms (AKS, CoreWeave).
 

charts/cert-manager-operator/README.md

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ The update-bundle.sh script:
 Personal Red Hat pull secrets and tokens typically expire (yearly). Registry
 Service Accounts created via the [Red Hat terms-based registry](https://access.redhat.com/terms-based-registry/)
 do not expire and are recommended for production (see Section 1.3 of the
-[deployment guide](../docs/deploying-llm-d-on-managed-kubernetes.md)).
+[deployment guide](../../docs/deploying-llm-d-on-managed-kubernetes.md)).
 
 To update expiring credentials:
 

charts/kserve/files/resources.yaml

Lines changed: 2 additions & 2 deletions
@@ -648,7 +648,7 @@ data:
     }
   oauthProxy: |-
     {
-        "image" : "registry.redhat.io/rhoai/odh-kube-auth-proxy-rhel9@sha256:169d9fe4dc6032344b295221ccbfa20f28e54f6ef490452b21459488bf472f8d",
+        "image" : "registry.redhat.io/rhoai/odh-kube-auth-proxy-rhel9@sha256:2c4be58b9cbbfbf0cce82771f9823f5df664a21c139feee9e4f8beb9cf3ad76a",
         "memoryRequest": "64Mi",
         "memoryLimit": "128Mi",
         "cpuRequest": "100m",
@@ -711,7 +711,7 @@ data:
   kserve-llm-d-routing-sidecar: registry.redhat.io/rhoai/odh-llm-d-routing-sidecar-rhel9@sha256:7f93742da18df2ce220cd8d6a0310c18af6fe04905c83f23d022e065716ebd88
   kserve-router: registry.redhat.io/rhoai/odh-kserve-router-rhel9@sha256:26dc51b1f099964196c35bbc0801a5523da75c16095733c9870b7c46b1677871
   kserve-storage-initializer: registry.redhat.io/rhoai/odh-kserve-storage-initializer-rhel9@sha256:37d31edc075adf26a529197281797a24b76bd4924d7903c2754992174959ee91
-  kube-rbac-proxy: registry.redhat.io/rhoai/odh-kube-auth-proxy-rhel9@sha256:169d9fe4dc6032344b295221ccbfa20f28e54f6ef490452b21459488bf472f8d
+  kube-rbac-proxy: registry.redhat.io/rhoai/odh-kube-auth-proxy-rhel9@sha256:2c4be58b9cbbfbf0cce82771f9823f5df664a21c139feee9e4f8beb9cf3ad76a
 kind: ConfigMap
 metadata:
   name: kserve-parameters

docs/azure-lb-health-probe-workaround.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+# Workaround: Azure Load Balancer Health Probe for Istio Gateway
+
+## Problem
+
+When deploying KServe with Istio Gateway API on AKS, external traffic to the inference gateway on port 80 times out, even though the gateway pod is running and works fine from inside the cluster.
+
+Port 15021 (Istio health port) works externally, but port 80 does not.
+
+## Root Cause
+
+AKS automatically creates an **HTTP health probe** for LoadBalancer service ports that have `appProtocol: http` set. The Istio Gateway controller sets `appProtocol: http` on port 80 by default.
+
+The HTTP health probe sends `GET /` to the nodePort backing port 80. Since no HTTPRoute matches `/`, Istio returns **404**. The Azure Load Balancer treats this as unhealthy and **stops forwarding all traffic** to port 80.
+
+Port 15021 works because its health probe uses **TCP** (just checks if the port is open).
+
+```text
+Azure LB health probe → HTTP GET / → nodePort → Istio port 80 → 404 → backend marked unhealthy → all traffic dropped
+```
+
+## Why deploying an HTTPRoute doesn't fix it
+
+Deploying an HTTPRoute for your model (e.g., `/llm-inference/qwen2-7b-instruct/...`) does not fix this because the health probe hits `/`, not your model path. Unless you have a route that explicitly matches `/` and returns 200, the probe will continue to fail.
+
+## Fix
+
+Annotate the `inference-gateway-istio` service to use a **TCP health probe** for the affected port instead of HTTP.
+
+> **Note:** The port number in the annotation (`port_80`) must match the Gateway listener port. Port 80 is used here because that is what `setup-gateway.sh` configures in the Gateway's `listeners` spec. If your Gateway uses a different port, update the annotation key accordingly (e.g., `port_8080_health-probe_protocol`).
+
+```bash
+kubectl annotate svc inference-gateway-istio -n opendatahub \
+  "service.beta.kubernetes.io/port_80_health-probe_protocol=tcp" \
+  --overwrite
+```
+
+This annotation is applied automatically on AKS when using `setup-gateway.sh`. The manual command above is only needed if you recreate the Gateway without re-running the setup script.
+
+### Verify the probe changed
+
+```bash
+# Find the MC resource group
+NODE_RG=$(az aks show --resource-group <rg> --name <cluster> --query nodeResourceGroup -o tsv)
+
+# Check probes
+az network lb probe list --resource-group "$NODE_RG" --lb-name kubernetes -o table
+```
+
+The port 80 probe should now show `Protocol: Tcp` instead of `Http`.
+
+## How to diagnose this issue
+
+1. Verify the gateway works from inside the cluster (bypasses Azure LB):
+   ```bash
+   kubectl run curl-test --rm -i --restart=Never --image=curlimages/curl \
+     -- curl -s -o /dev/null -w "HTTP %{http_code}" \
+     http://inference-gateway-istio.opendatahub.svc.cluster.local:80/
+   ```
+   If this returns 404 but external access times out, the LB health probe is the issue.
+
+2. Check the Azure LB health probe configuration:
+   ```bash
+   NODE_RG=$(az aks show --resource-group <rg> --name <cluster> --query nodeResourceGroup -o tsv)
+   az network lb probe list --resource-group "$NODE_RG" --lb-name kubernetes -o table
+   ```
+   If the port 80 probe shows `Protocol: Http` and `RequestPath: /`, that confirms the problem.
+
+   > **Note:** If the `inference-gateway-istio` service is annotated with `service.beta.kubernetes.io/azure-load-balancer-internal: "true"`, use `--lb-name kubernetes-internal` instead.
+
+## Notes
+
+- This is an AKS-specific issue. AWS and GCP load balancers default to TCP health checks.
+- On AKS clusters v1.24+, `spec.ports.appProtocol` is used as the health probe protocol with `/` as the default request path. Since the Istio Gateway controller sets `appProtocol: http` on port 80, AKS creates an HTTP probe by default.
+- The annotation `service.beta.kubernetes.io/port_80_health-probe_protocol` is a per-port override. The generic `service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol` applies to all ports but may not take effect if the gateway controller reconciles the service.
+- The Istio gateway service is managed by the Gateway controller and has no annotations by default.
+
+## References
+
+- [Configure a Public Standard Load Balancer in AKS](https://learn.microsoft.com/en-us/azure/aks/configure-load-balancer-standard) — Microsoft documentation on per-port health probe annotation overrides and default probe behavior.
+- [Troubleshoot AKS Health Probe Mode](https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-service-health-probe-mode-issues) — Troubleshooting guide for health probe issues.
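
Review note: the workaround document says the annotation key must carry the Gateway listener port (`port_80_...`, or `port_8080_...` for a listener on 8080). The following is a minimal sketch of deriving the annotation and the matching `kubectl` arguments from a port number. The function names `probe_protocol_annotation` and `annotate_command` are hypothetical and do not appear in the commit; the service name, namespace, and annotation format are taken from the document above.

```python
def probe_protocol_annotation(port: int, protocol: str = "tcp") -> str:
    """Build the per-port health probe annotation in key=value form."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"service.beta.kubernetes.io/port_{port}_health-probe_protocol={protocol}"


def annotate_command(
    port: int,
    service: str = "inference-gateway-istio",
    namespace: str = "opendatahub",
) -> list[str]:
    """Return the kubectl argument list that applies the annotation."""
    return [
        "kubectl", "annotate", "svc", service,
        "-n", namespace,
        probe_protocol_annotation(port),
        "--overwrite",
    ]


if __name__ == "__main__":
    # Port 80 matches the listener that setup-gateway.sh is described as configuring.
    print(" ".join(annotate_command(80)))
```

The command is built as an argument list rather than a single shell string so that it could be passed to `subprocess.run` without shell quoting; the sketch only prints it.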

docs/deploying-llm-d-on-managed-kubernetes.md

Lines changed: 29 additions & 16 deletions
@@ -165,18 +165,16 @@ Red Hat AI Inference Server on managed Kubernetes consists of the following comp
 
 ### Component Interaction
 
-```text
-                                    ┌─────────────────────────────────────┐
-                                    │         Kubernetes Cluster          │
-┌──────────┐    ┌──────────────┐    │  ┌─────────┐    ┌────────────────┐  │
-│  Client  │───▶│   Gateway    │───▶│  │   EPP   │───▶│   vLLM Pods    │  │
-│          │    │   (Istio)    │    │  │Scheduler│    │    (Model)     │  │
-└──────────┘    └──────────────┘    │  └─────────┘    └────────────────┘  │
-                                    │       ▲               ▲             │
-                                    │       │     mTLS      │             │
-                                    │       └───────────────┘             │
-                                    │          cert-manager               │
-                                    └─────────────────────────────────────┘
+```mermaid
+graph LR
+    Client --> Gateway["Gateway<br/>(Istio)"]
+
+    subgraph Kubernetes Cluster
+        Gateway --> EPP["EPP<br/>Scheduler"]
+        EPP --> vLLM["vLLM Pods<br/>(Model)"]
+        cm["cert-manager"] -. mTLS .-> EPP
+        cm["cert-manager"] -. mTLS .-> vLLM
+    end
 ```
 
 ---
@@ -271,6 +269,20 @@ Verify the Gateway pod is running:
 kubectl get pods -n opendatahub -l gateway.networking.k8s.io/gateway-name=inference-gateway
 ```
 
+### 4.3 AKS: Fix Load Balancer Health Probe
+
+On AKS, external traffic to the inference gateway on port 80 may time out due to the Azure Load Balancer using an HTTP health probe that fails against the Istio gateway. This is handled automatically by `setup-gateway.sh` on AKS.
+
+If you need to apply it manually (e.g., after recreating the Gateway):
+
+```bash
+kubectl annotate svc inference-gateway-istio -n opendatahub \
+  "service.beta.kubernetes.io/port_80_health-probe_protocol=tcp" \
+  --overwrite
+```
+
+> **Note:** The port number in the annotation must match the Gateway listener port (`80` here, as configured in `setup-gateway.sh`). If the Gateway is deleted and recreated without re-running `setup-gateway.sh`, the annotation will be lost and must be reapplied. See [Azure LB Health Probe Workaround](./azure-lb-health-probe-workaround.md) for full details.
+
 ---
 
 ## 5. Deploying an LLM Inference Service
@@ -588,7 +600,8 @@ make deploy-kserve
 For assistance with Red Hat AI Inference Server deployments, contact Red Hat Support or consult the product documentation.
 
 **Additional Resources:**
-- [KServe Chart README](https://github.com/opendatahub-io/rhaii-on-xks/blob/main/charts/kserve/README.md) - KServe Helm chart details, PKI prerequisites, and OCI registry install
-- [Preflight Validation](https://github.com/opendatahub-io/rhaii-on-xks/blob/main/validation/README.md) - Cluster readiness and post-deployment validation checks
-- [Monitoring Setup Guide](../monitoring-stack/) - Optional Prometheus/Grafana configuration for dashboards and autoscaling
-- [KServe LLMInferenceService Samples](https://github.com/red-hat-data-services/kserve/tree/rhoai-3.4/docs/samples/llmisvc)
+
+* [KServe Chart README](https://github.com/opendatahub-io/rhaii-on-xks/blob/main/charts/kserve/README.md) - KServe Helm chart details, PKI prerequisites, and OCI registry install
+* [Preflight Validation](https://github.com/opendatahub-io/rhaii-on-xks/blob/main/validation/README.md) - Cluster readiness and post-deployment validation checks
+* [Monitoring Setup Guide](../monitoring-stack/) - Optional Prometheus/Grafana configuration for dashboards and autoscaling
+* [KServe LLMInferenceService Samples](https://github.com/red-hat-data-services/kserve/tree/rhoai-3.4/docs/samples/llmisvc)

docs/images/odh.png

9.81 KB
