
Consolidate deployment flow and documentation for KServe Helm chart integration. #7

Merged
anishasthana merged 1 commit into opendatahub-io:main from aneeshkp:update-docs-kserve-chart-validation on Feb 13, 2026

@aneeshkp aneeshkp commented Feb 12, 2026

Consolidate deployment flow and documentation for KServe Helm chart integration.

Deployment changes:

  • Add KServe OCI chart (ghcr.io/opendatahub-io/kserve-rhaii-xks) to helmfile with configurable version in values.yaml
  • make deploy-all now deploys all components (cert-manager + Istio + LWS + KServe) in one step
  • make deploy updated to include LWS
  • Use local pki-prereq.yaml for cert-manager PKI instead of fetching from GitHub at deploy time
  • Add undeploy-kserve target with full cleanup (Helm release, CRDs, RBAC, ClusterIssuers, CA certificate, namespace)
  • Add KServe pod status and LLMInferenceServiceConfig to make status

Documentation changes:

  • Trim README to minimal Quick Start, remove content duplicated in deploying doc
  • Merge deploying doc Sections 3+4 into single "Deploying All Components" section, renumber remaining sections
  • Add preflight validation sections (1.5 and 6.4) with pre/post deployment timing table
  • Add TLS certificate note for customers (self-signed default, corporate CA guidance)
  • Update KServe refs from opendatahub-io/kserve to red-hat-data-services/kserve branch rhoai-3.4

Cleanup fixes:

  • Add LWS (leaderworkerset.x-k8s.io) and istioCSR (istiocsrs.operator.openshift.io) CRD cleanup to cleanup.sh

Includes changes from #5 (KServe Helm chart via helmfile + OCI registry).

How Has This Been Tested?

  • Ran make deploy-all on AKS cluster — cert-manager, Istio, LWS deploy successfully
  • KServe OCI chart pull requires the chart to be published to ghcr.io/opendatahub-io/kserve-rhaii-xks (pending)
  • Ran make undeploy-kserve and make undeploy — verified CRDs, RBAC, namespaces cleaned up
  • Verified make status shows KServe controller pod and LLMInferenceServiceConfig

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • New Features

    • KServe added (v3.4.0-ea.1) with Helm-based install, PKI prerequisites, and new multi-step deploy/undeploy targets including LWS.
  • Documentation

    • Deployment docs consolidated and streamlined; added preflight validation, simplified deploy commands, and updated KServe version notes and references.
  • Chores

    • Undeploy/cleanup adjusted for Helm-managed resources; status now reports KServe state; added KServe chart version config and enhanced namespace/cleanup handling.
  • Tests

    • Verification output links updated to reflect new KServe docs/version.

coderabbitai bot commented Feb 12, 2026

Walkthrough

Restructures KServe deployment to a Helm-driven install with cert-manager PKI prerequisites, adds Makefile targets for prerequisites and PKI, updates deploy/undeploy/status flows to include KServe and lws-operator, introduces a helmfile KServe release with presync CRD application, and updates docs, cleanup scripts, tests, and values for KServe 3.4.x.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Makefile / Orchestration: Makefile | Added deploy-opendatahub-prerequisites, deploy-cert-manager-pki; deploy now includes lws-operator; deploy-all invokes per-component targets; deploy-kserve depends on PKI prereqs and uses Helm; undeploy/undeploy-kserve updated for Helm removal; status extended for KServe. |
| Helm / Chart Config: helmfile.yaml.gotmpl, values.yaml | Added kserve-rhaii-xks release (OCI chart) with presync hook to pull & apply CRDs; introduced kserveChartVersion config key. |
| PKI Prerequisites Manifest: charts/kserve/pki-prereq.yaml | New cert-manager resources: opendatahub-selfsigned-issuer (ClusterIssuer), opendatahub-ca-issuer (ClusterIssuer referencing CA secret), and opendatahub-ca Certificate (namespace cert-manager, isCA: true, 10-year RSA4096). |
| Cleanup Script: scripts/cleanup.sh | Refined CRD deletion: separated LWS CRD filtering (leaderworkerset.x-k8s.io), explicit operator CRD removals, added KServe CRD filtering (serving.kserve.io), retained cert-manager/gateway patterns, and added deletion of opendatahub namespace. |
| Docs & Deployment Guide: README.md, docs/deploying-llm-d-on-managed-kubernetes.md | Bumped KServe references to 3.4.x, added preflight validation, consolidated deployment steps to reference full guides, updated PKI/TLS guidance and sample/URL references to rhoai-3.4. |
| Tests / Verification: test/conformance/verify-llm-d-deployment.sh | Updated sample doc URLs to rhoai-3.4 chart/doc paths; no logic changes. |

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant Make as "Makefile"
    participant CertMgr as "cert-manager (PKI)"
    participant Helmfile as "helmfile"
    participant Kubectl as "kubectl (presync)"
    participant Helm as "helm"
    participant KServe as "KServe"

    User->>Make: run `make deploy` / `make deploy-kserve`
    Make->>CertMgr: apply PKI prereqs (ClusterIssuers, CA Certificate)
    CertMgr-->>Make: PKI resources created

    Make->>Helmfile: trigger KServe release install
    Helmfile->>Kubectl: presync hook -> pull chart & apply CRDs
    Kubectl-->>Helmfile: CRDs applied

    Helmfile->>Helm: helm install/upgrade KServe
    Helm->>KServe: deploy controller, webhooks, components
    KServe-->>Helm: resources reconcile
    Helm-->>Make: install complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Add Kserve chart #4 — Modifies/adds the charts/kserve Helm chart and related chart artifacts (CRDs, webhooks, chart metadata), closely related to the KServe chart and CRD handling here.

Suggested reviewers

  • anishasthana

Poem

🐰 I hop through charts and certs with glee,

Helm sings softly, CRDs dance free,
A CA planted, PKI takes root,
KServe wakes up — engines toot,
I nibble a carrot and sip my tea. 🥕

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: consolidating deployment flow and integrating a KServe Helm chart, which aligns with the core modifications across Makefile, helmfile, and documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/deploying-llm-d-on-managed-kubernetes.md`:
- Line 547: Update the stale section reference in the sentence "3. Add matching
tolerations to the LLMInferenceService spec (see Section 6.3)" to point to the
correct section; replace "Section 6.3" with "Section 5.3" (the "Deploy the
LLMInferenceService" tolerations example) so the link/reference accurately
directs readers to the tolerations example.
- Around line 236-238: Remove the blank line separating the two blockquotes so
they become a single blockquote: merge the paragraph starting with "**TLS
Certificates:**" and the following "**Note:**" into the same blockquote (keep
the "**TLS Certificates:**" paragraph then add the "**Note:**" paragraph
directly beneath it with the same ">" prefix), or alternatively replace the
blank line with a non-blockquote separator; this will eliminate the MD028
markdownlint warning.

In `@test/conformance/verify-llm-d-deployment.sh`:
- Line 706: Update the stale KServe docs URL in the script's help output so both
occurrences match the new red-hat-data-services URL: locate the echo/help text
lines that print "KServe docs:" (the line that currently prints the old
opendatahub-io/kserve/tree/release-v0.15 URL) and replace that URL with
https://github.com/red-hat-data-services/kserve/tree/rhoai-3.4 so it matches the
other echo at the top.
🧹 Nitpick comments (3)
Makefile (1)

45-46: deploy-all triggers clear-cache three times redundantly.

Each of deploy-cert-manager, deploy-istio, and deploy-lws depends on clear-cache, so it runs three times during deploy-all. Consider adding clear-cache as the first dependency of deploy-all and removing it from the sub-targets, or accepting the minor overhead.

helmfile.yaml.gotmpl (2)

44-64: Presync CRD installation pattern is sound.

Using a presync hook to apply CRDs via kubectl apply --server-side correctly avoids Helm's 1MB secret limit. The disableValidation: true is necessary since CRDs are managed out-of-band.

One minor robustness note: if kubectl apply fails (line 63), set -e causes the script to exit before rm -rf "$CHART_DIR" on line 64, leaving a temp directory in /tmp. Consider using a trap for cleanup.

More robust temp dir cleanup
          - |
            set -e
            CHART_DIR=$(mktemp -d)
+           trap 'rm -rf "$CHART_DIR"' EXIT
            helm pull oci://ghcr.io/opendatahub-io/kserve-rhaii-xks --version {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }} --untar --untardir "$CHART_DIR"
            kubectl apply -f "$CHART_DIR"/kserve-rhaii-xks/crds/ --server-side
-           rm -rf "$CHART_DIR"
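
To illustrate the trap-based cleanup suggested above, here is a minimal shell sketch. It is not the helmfile hook itself: the explicit `exit 1` stands in for a failing `helm pull` or `kubectl apply` under `set -e`, and the `marker` file is a hypothetical helper used only to observe the result.

```shell
#!/usr/bin/env bash
# Sketch only: shows that an EXIT trap removes the temp dir even when the
# hook body aborts early. `marker` is a hypothetical helper for observation.
marker=$(mktemp)
status=0

(
  CHART_DIR=$(mktemp -d)
  trap 'rm -rf "$CHART_DIR"' EXIT   # runs on any exit from this subshell
  echo "$CHART_DIR" > "$marker"     # record the path for the check below
  exit 1                            # stand-in for a failed helm pull / kubectl apply
  echo "not reached"
) || status=$?

chart_dir=$(cat "$marker")
rm -f "$marker"

if [ -d "$chart_dir" ]; then
  cleanup_result="leaked"
else
  cleanup_result="cleaned"
fi
echo "exit status: $status, temp dir: $cleanup_result"
```

With the trap in place, the trailing `rm -rf "$CHART_DIR"` becomes unnecessary, which is why the suggestion drops it.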

8-8: Minor: comment alignment is off.

Line 8 has an extra space before # Deploy only Kserve compared to the other comment lines, and "Kserve" should be "KServe" for consistency.

@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch from 0a0c30f to ca82fd6 Compare February 12, 2026 21:28
@aneeshkp commented:

@coderabbitai review all

coderabbitai bot commented Feb 12, 2026

@aneeshkp: I'll review all the changes in this PR. This includes the deployment flow consolidation, KServe Helm chart integration, documentation updates, and cleanup improvements across all 8 modified files.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/cleanup.sh (1)

83-113: ⚠️ Potential issue | 🟠 Major

Missing KServe cleanup in cleanup.sh: make undeploy will leave orphaned resources.

Now that KServe is deployed via helmfile (and helmfile destroy is called on line 59), the CRDs installed by the presync hook (serving.kserve.io) won't be removed by helmfile destroy (Helm doesn't remove CRDs on uninstall). Additionally, the opendatahub namespace, KServe RBAC (ClusterRoles/ClusterRoleBindings), and PKI resources (ClusterIssuers, CA certificate) are not cleaned up.

The undeploy-kserve Makefile target handles all of this, but make undeploy only calls cleanup.sh -y — it never invokes undeploy-kserve. This means a full make undeploy will leave KServe CRDs, RBAC, ClusterIssuers, and the opendatahub namespace behind.

Consider adding KServe cleanup to cleanup.sh (mirroring what undeploy-kserve does) or having make undeploy call undeploy-kserve before cleanup.sh.

Suggested additions to cleanup.sh
 # Gateway API CRDs and Inference Extension CRDs (InferencePool, InferenceModel)
 # Matches both inference.networking.k8s.io (v1) and inference.networking.x-k8s.io (v1alpha2)
 echo "$CRDS" | grep -E "gateway\.networking\.k8s\.io|inference\.networking\.k8s\.io|inference\.networking\.x-k8s\.io" | while read -r crd; do
     kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
 done
+# KServe CRDs
+echo "$CRDS" | grep -E "serving\.kserve\.io" | while read -r crd; do
+    kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
+done
 # Infrastructure stub CRD
 kubectl delete crd infrastructures.config.openshift.io --ignore-not-found 2>/dev/null || true
 
 # Clean up presync-created namespaces
 log "Cleaning up namespaces..."
 kubectl delete namespace cert-manager --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace cert-manager-operator --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace istio-system --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace openshift-lws-operator --ignore-not-found --wait=false 2>/dev/null || true
+kubectl delete namespace opendatahub --ignore-not-found --wait=false 2>/dev/null || true

Based on learnings: Inference Extension CRDs (InferencePool, InferenceModel) are installed by KServe and should be removed when undeploying KServe to avoid orphaned CRDs — the same principle applies to all KServe-owned CRDs during full cleanup.
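
The group-based filtering that cleanup.sh relies on can be tried without a cluster. The sketch below is an illustration under assumptions: `filter_crds` is a hypothetical helper, and the `CRDS` value is hard-coded sample data standing in for the output of `kubectl get crd -o name`.

```shell
#!/usr/bin/env bash
# Sketch only: select CRD resource names by API group, in the style of the
# grep -E filters in cleanup.sh. `filter_crds` is a hypothetical helper.
filter_crds() {
  # $1: extended regex of API groups; stdin: one CRD resource name per line
  grep -E "$1" || true   # no match is not an error during cleanup
}

# Hard-coded sample standing in for: CRDS=$(kubectl get crd -o name)
CRDS="customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kserve.io
customresourcedefinition.apiextensions.k8s.io/leaderworkersets.leaderworkerset.x-k8s.io
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io"

kserve_crds=$(echo "$CRDS" | filter_crds "serving\.kserve\.io")
lws_crds=$(echo "$CRDS" | filter_crds "leaderworkerset\.x-k8s\.io")

echo "would delete (KServe): $kserve_crds"
echo "would delete (LWS):    $lws_crds"
```

In the real script each selected name would be passed to `kubectl delete "$crd" --ignore-not-found`, as in the diff above.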

🤖 Fix all issues with AI agents
In `@docs/deploying-llm-d-on-managed-kubernetes.md`:
- Around line 124-150: The referenced section number is incorrect: locate the
sentence "See Section 7.4 for full post-deployment validation." in the "1.5
Preflight Validation (Recommended)" block and update it to "See Section 6.4 for
full post-deployment validation." so the cross-reference points to the actual
"Run Preflight Validation" post-deployment section.

In `@Makefile`:
- Around line 77-89: The Makefile's undeploy target never runs the thorough
KServe cleanup defined in the undeploy-kserve target, so update the Makefile so
undeploy invokes undeploy-kserve before running cleanup.sh; specifically, modify
the undeploy target to either depend on undeploy-kserve (make undeploy:
undeploy-kserve ...) or explicitly call the undeploy-kserve target prior to
executing cleanup.sh, ensuring the symbols undeploy, undeploy-kserve, and
cleanup.sh are used so Helm/CRD/RBAC/PKI/namespace cleanup runs as part of make
undeploy.
🧹 Nitpick comments (2)
helmfile.yaml.gotmpl (2)

44-64: Chart URL and version default are duplicated between the release definition and presync hook.

The OCI chart URL (oci://ghcr.io/opendatahub-io/kserve-rhaii-xks) and version template expression appear in both the release (lines 46, 48) and the presync hook script (line 62). If the registry or chart name changes, both locations must be updated in sync.

Consider extracting the chart ref into a value or a Go template variable to keep it DRY:

Suggested approach
 releases:
   - name: kserve-rhaii-xks
-    chart: oci://ghcr.io/opendatahub-io/kserve-rhaii-xks
+    chart: oci://ghcr.io/opendatahub-io/kserve-rhaii-xks  {{- /* also used in presync hook below */}}
     # Use dev variant until official builds are released to registry.redhat.io
     version: {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }}

Alternatively, if helmfile Go templates support variable assignment at the top level, you could define the chart and version once and reference them in both places.


59-64: Temp directory not cleaned up on failure.

With set -e, if helm pull or kubectl apply fails, the script exits immediately and rm -rf "$CHART_DIR" is never reached, leaving a stale temp directory.

Suggested fix — add a trap
          - |
            set -e
            CHART_DIR=$(mktemp -d)
+           trap 'rm -rf "$CHART_DIR"' EXIT
            helm pull oci://ghcr.io/opendatahub-io/kserve-rhaii-xks --version {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }} --untar --untardir "$CHART_DIR"
            kubectl apply -f "$CHART_DIR"/kserve-rhaii-xks/crds/ --server-side
-           rm -rf "$CHART_DIR"

@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch from ca82fd6 to 82fe9fc Compare February 12, 2026 21:52
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@Makefile`:
- Around line 68-71: The deploy-kserve Makefile target currently depends on
check-kubeconfig and deploy-cert-manager-pki but not on cert-manager itself,
causing kubectl apply -f ./charts/kserve/pki-prereq.yaml to fail when
cert-manager CRDs are missing; update the deploy-kserve target to either add
deploy-cert-manager as a prerequisite (so the target line includes
deploy-cert-manager before deploy-cert-manager-pki) or add a guard/validation
step in the target that checks for the cert-manager CRD (e.g., detect
cert-manager.io/v1 CRD existence) and prints a clear error/exit if missing,
referencing the Makefile target name deploy-kserve and the existing
deploy-cert-manager-pki step to locate where to change.
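
The guard option described above could look roughly like the following shell sketch. `require_cert_manager` and the `KUBECTL` override are hypothetical names introduced here for illustration; the check assumes that the presence of the `certificates.cert-manager.io` CRD is a sufficient signal that cert-manager is installed.

```shell
#!/usr/bin/env bash
# Sketch only: fail fast with a clear message when cert-manager CRDs are absent.
# KUBECTL is an assumed override hook so the check can be exercised without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

require_cert_manager() {
  if ! "$KUBECTL" get crd certificates.cert-manager.io >/dev/null 2>&1; then
    echo "ERROR: cert-manager CRDs not found; run 'make deploy-cert-manager' first" >&2
    return 1
  fi
}

# Intended use before applying the PKI manifest:
#   require_cert_manager && kubectl apply -f ./charts/kserve/pki-prereq.yaml
```

Adding deploy-cert-manager as a Makefile prerequisite is the simpler alternative; a guard like this only helps when the standalone target should stay lightweight.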
🧹 Nitpick comments (1)
Makefile (1)

39-46: deploy and deploy-all use different patterns — deploy-all clears cache 3 times.

deploy (line 39) calls clear-cache once then runs sequential helmfile apply commands. deploy-all (line 45) invokes individual targets that each independently call clear-cache. This is a minor inconsistency — not harmful but wastes time on repeated cache clears.

Consider having deploy-all depend on clear-cache once, and having the individual targets skip it when called as sub-targets. Alternatively, this can be left as-is since it's a development convenience concern.

@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch 2 times, most recently from cd02f55 to 271a32a Compare February 12, 2026 21:58
@aneeshkp commented:

@coderabbitai review all

coderabbitai bot commented Feb 12, 2026

@aneeshkp: I'll review all the changes in this PR now, including the deployment flow consolidation, KServe Helm chart integration via OCI registry, documentation updates, and cleanup improvements across all 8 modified files.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch from 271a32a to fb062e9 Compare February 13, 2026 18:00
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/deploying-llm-d-on-managed-kubernetes.md (1)

209-234: ⚠️ Potential issue | 🟡 Minor

Expected make status output is missing the new KServe sections.

The Makefile status target (lines 109–114) now prints kserve: and kserve config: sections, but the expected output example here stops after lws-operator and === API Versions ===. Users running make deploy-all will see additional KServe output not shown in this example.

Consider adding the KServe sections to the expected output:

📝 Suggested addition after the lws-operator block
 lws-operator:
 NAME                                       READY   STATUS    RESTARTS   AGE
 lws-controller-manager-xxxxxxxxx-xxxxx     1/1     Running   0          5m
 
+kserve:
+NAME                                       READY   STATUS    RESTARTS   AGE
+kserve-controller-manager-xxxxxxxxx-xxxxx  1/1     Running   0          5m
+
+kserve config:
+NAME                      AGE
+inferencing               5m
+
 === API Versions ===
🤖 Fix all issues with AI agents
In `@helmfile.yaml.gotmpl`:
- Around line 44-65: The chart URL and version are duplicated between the
kserve-rhaii-xks release definition and its presync hook causing potential skew;
refactor by introducing template variables (e.g., define $kserveChart =
"oci://ghcr.io/opendatahub-io/kserve-rhaii-xks" and $kserveVersion =
.Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66") at the top of the
template and replace the literal chart and version occurrences in the release
block (chart: ...) and in the presync helm pull command so both the chart
reference and the helm pull use the same named variables; update references in
the release named kserve-rhaii-xks and inside the presync hook block to use
those variables.
🧹 Nitpick comments (1)
Makefile (1)

57-68: Minor: deploy-opendatahub-prerequisites silently swallows pull-secret copy failures.

Lines 60–62 copy the pull secret from istio-system to the KServe namespace but suppress all errors. If istio-system hasn't been deployed yet (e.g., when running make deploy-kserve standalone before Istio), this silently succeeds with no secret copied, and later steps that need registry auth may fail with a confusing ErrImagePull.

Consider emitting a warning when the secret copy is skipped:

♻️ Suggested improvement
 	-kubectl get secret redhat-pull-secret -n istio-system -o yaml 2>/dev/null | \
 		sed 's/namespace: istio-system/namespace: $(KSERVE_NAMESPACE)/' | \
-		kubectl apply -f - 2>/dev/null || true
+		kubectl apply -f - 2>/dev/null || echo "WARNING: redhat-pull-secret not found in istio-system; registry auth may be missing in $(KSERVE_NAMESPACE)"
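
As a standalone illustration of the same idea, the copy step can be written as a shell function that checks the `kubectl get` result explicitly before piping it on. This is a sketch under assumptions: `copy_pull_secret` and the `KUBECTL` override are hypothetical names, not part of the Makefile.

```shell
#!/usr/bin/env bash
# Sketch only: copy redhat-pull-secret between namespaces and warn when the
# source secret is missing. KUBECTL is an assumed override hook for testing.
KUBECTL="${KUBECTL:-kubectl}"

copy_pull_secret() {
  # $1: source namespace, $2: destination namespace
  local src_ns=$1 dst_ns=$2 manifest
  if ! manifest=$("$KUBECTL" get secret redhat-pull-secret -n "$src_ns" -o yaml 2>/dev/null); then
    echo "WARNING: redhat-pull-secret not found in $src_ns; registry auth may be missing in $dst_ns" >&2
    return 0   # keep the deploy going, as the original target does
  fi
  # Rewrite the namespace field and apply the manifest in the destination
  printf '%s\n' "$manifest" \
    | sed "s/namespace: $src_ns/namespace: $dst_ns/" \
    | "$KUBECTL" apply -f -
}

# Intended use: copy_pull_secret istio-system "$KSERVE_NAMESPACE"
```

Checking the `get` step on its own keeps the warning tied to the actual cause rather than to whatever the last command in the pipeline reports.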

@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch from fb062e9 to 628faaa Compare February 13, 2026 18:11
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com>
@aneeshkp aneeshkp force-pushed the update-docs-kserve-chart-validation branch from 628faaa to f40182f Compare February 13, 2026 18:19
@anishasthana anishasthana left a comment

/lgtm

@anishasthana anishasthana merged commit 3a219b2 into opendatahub-io:main Feb 13, 2026
2 checks passed
@pierDipi pierDipi mentioned this pull request Feb 13, 2026
3 tasks
mpaulgreen pushed a commit to mpaulgreen/rhaii-on-xks that referenced this pull request Feb 19, 2026
…chart-validation

 Consolidate deployment flow and documentation for KServe Helm chart integration.