Consolidate deployment flow and documentation for KServe Helm chart integration. #7
📝 Walkthrough

Restructures KServe deployment to a Helm-driven install with cert-manager PKI prerequisites, adds Makefile targets for prerequisites and PKI, updates deploy/undeploy/status flows to include KServe and lws-operator, introduces a helmfile KServe release with presync CRD application, and updates docs, cleanup scripts, tests, and values for KServe 3.4.x.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as "User"
    participant Make as "Makefile"
    participant CertMgr as "cert-manager (PKI)"
    participant Helmfile as "helmfile"
    participant Kubectl as "kubectl (presync)"
    participant Helm as "helm"
    participant KServe as "KServe"
    User->>Make: run `make deploy` / `make deploy-kserve`
    Make->>CertMgr: apply PKI prereqs (ClusterIssuers, CA Certificate)
    CertMgr-->>Make: PKI resources created
    Make->>Helmfile: trigger KServe release install
    Helmfile->>Kubectl: presync hook -> pull chart & apply CRDs
    Kubectl-->>Helmfile: CRDs applied
    Helmfile->>Helm: helm install/upgrade KServe
    Helm->>KServe: deploy controller, webhooks, components
    KServe-->>Helm: resources reconcile
    Helm-->>Make: install complete
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/deploying-llm-d-on-managed-kubernetes.md`:
- Line 547: Update the stale section reference in the sentence "3. Add matching
tolerations to the LLMInferenceService spec (see Section 6.3)" to point to the
correct section; replace "Section 6.3" with "Section 5.3" (the "Deploy the
LLMInferenceService" tolerations example) so the link/reference accurately
directs readers to the tolerations example.
- Around line 236-238: Remove the blank line separating the two blockquotes so
they become a single blockquote: merge the paragraph starting with "**TLS
Certificates:**" and the following "**Note:**" into the same blockquote (keep
the "**TLS Certificates:**" paragraph then add the "**Note:**" paragraph
directly beneath it with the same ">" prefix), or alternatively replace the
blank line with a non-blockquote separator; this will eliminate the MD028
markdownlint warning.
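A minimal illustration of the merged blockquote the comment describes — the sentences here are placeholders, not the doc's actual wording:

```markdown
> **TLS Certificates:** cert-manager provisions the certificates used by the KServe webhooks.
>
> **Note:** The PKI prerequisites must exist before the chart installs.
```

Keeping both paragraphs under one `>` prefix (with a `>` on the blank line between them) is what silences MD028, which flags adjacent blockquotes separated by a bare blank line.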
In `@test/conformance/verify-llm-d-deployment.sh`:
- Line 706: Update the stale KServe docs URL in the script's help output so both
occurrences match the new red-hat-data-services URL: locate the echo/help text
lines that print "KServe docs:" (the line that currently prints the old
opendatahub-io/kserve/tree/release-v0.15 URL) and replace that URL with
https://github.com/red-hat-data-services/kserve/tree/rhoai-3.4 so it matches the
other echo at the top.
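The substitution itself is mechanical; a small sketch of the URL swap, where the help-text line is a hypothetical stand-in for the script's actual `echo` (only the two URLs come from the review):

```shell
# Hypothetical help-text line containing the stale URL from the review.
old='KServe docs: https://github.com/opendatahub-io/kserve/tree/release-v0.15'

# Replace the stale repo/branch path with the new red-hat-data-services one.
new=$(printf '%s' "$old" | sed 's|opendatahub-io/kserve/tree/release-v0.15|red-hat-data-services/kserve/tree/rhoai-3.4|')

echo "$new"
```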
🧹 Nitpick comments (3)
Makefile (1)
45-46: `deploy-all` triggers `clear-cache` three times redundantly.

Each of `deploy-cert-manager`, `deploy-istio`, and `deploy-lws` depends on `clear-cache`, so it runs three times during `deploy-all`. Consider adding `clear-cache` as the first dependency of `deploy-all` and removing it from the sub-targets, or accepting the minor overhead.

helmfile.yaml.gotmpl (2)
44-64: Presync CRD installation pattern is sound.

Using a presync hook to apply CRDs via `kubectl apply --server-side` correctly avoids Helm's 1MB secret limit. The `disableValidation: true` is necessary since CRDs are managed out-of-band.

One minor robustness note: if `kubectl apply` fails (line 63), `set -e` causes the script to exit before `rm -rf "$CHART_DIR"` on line 64, leaving a temp directory in `/tmp`. Consider using a trap for cleanup.

More robust temp dir cleanup:

```diff
 - |
   set -e
   CHART_DIR=$(mktemp -d)
+  trap 'rm -rf "$CHART_DIR"' EXIT
   helm pull oci://ghcr.io/opendatahub-io/kserve-rhaii-xks --version {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }} --untar --untardir "$CHART_DIR"
   kubectl apply -f "$CHART_DIR"/kserve-rhaii-xks/crds/ --server-side
-  rm -rf "$CHART_DIR"
```
8-8: Minor: comment alignment is off.

Line 8 has an extra space before `# Deploy only Kserve` compared to the other comment lines, and "Kserve" should be "KServe" for consistency.
0a0c30f to ca82fd6
@coderabbitai review all
✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/cleanup.sh (1)
83-113: ⚠️ Potential issue | 🟠 Major

Missing KServe cleanup in `cleanup.sh` — `make undeploy` will leave orphaned resources.

Now that KServe is deployed via helmfile (and `helmfile destroy` is called on line 59), the CRDs installed by the presync hook (`serving.kserve.io`) won't be removed by `helmfile destroy` (Helm doesn't remove CRDs on uninstall). Additionally, the `opendatahub` namespace, KServe RBAC (ClusterRoles/ClusterRoleBindings), and PKI resources (ClusterIssuers, CA certificate) are not cleaned up.

The `undeploy-kserve` Makefile target handles all of this, but `make undeploy` only calls `cleanup.sh -y` — it never invokes `undeploy-kserve`. This means a full `make undeploy` will leave KServe CRDs, RBAC, ClusterIssuers, and the `opendatahub` namespace behind.

Consider adding KServe cleanup to `cleanup.sh` (mirroring what `undeploy-kserve` does) or having `make undeploy` call `undeploy-kserve` before `cleanup.sh`.

Suggested additions to cleanup.sh:

```diff
 # Gateway API CRDs and Inference Extension CRDs (InferencePool, InferenceModel)
 # Matches both inference.networking.k8s.io (v1) and inference.networking.x-k8s.io (v1alpha2)
 echo "$CRDS" | grep -E "gateway\.networking\.k8s\.io|inference\.networking\.k8s\.io|inference\.networking\.x-k8s\.io" | while read -r crd; do
   kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
 done
+# KServe CRDs
+echo "$CRDS" | grep -E "serving\.kserve\.io" | while read -r crd; do
+  kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
+done
 # Infrastructure stub CRD
 kubectl delete crd infrastructures.config.openshift.io --ignore-not-found 2>/dev/null || true
 # Clean up presync-created namespaces
 log "Cleaning up namespaces..."
 kubectl delete namespace cert-manager --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace cert-manager-operator --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace istio-system --ignore-not-found --wait=false 2>/dev/null || true
 kubectl delete namespace openshift-lws-operator --ignore-not-found --wait=false 2>/dev/null || true
+kubectl delete namespace opendatahub --ignore-not-found --wait=false 2>/dev/null || true
```

Based on learnings: Inference Extension CRDs (InferencePool, InferenceModel) are installed by KServe and should be removed when undeploying KServe to avoid orphaned CRDs — the same principle applies to all KServe-owned CRDs during full cleanup.
🤖 Fix all issues with AI agents
In `@docs/deploying-llm-d-on-managed-kubernetes.md`:
- Around line 124-150: The referenced section number is incorrect: locate the
sentence "See Section 7.4 for full post-deployment validation." in the "1.5
Preflight Validation (Recommended)" block and update it to "See Section 6.4 for
full post-deployment validation." so the cross-reference points to the actual
"Run Preflight Validation" post-deployment section.
In `@Makefile`:
- Around line 77-89: The Makefile's undeploy target never runs the thorough
KServe cleanup defined in the undeploy-kserve target, so update the Makefile so
undeploy invokes undeploy-kserve before running cleanup.sh; specifically, modify
the undeploy target to either depend on undeploy-kserve (make undeploy:
undeploy-kserve ...) or explicitly call the undeploy-kserve target prior to
executing cleanup.sh, ensuring the symbols undeploy, undeploy-kserve, and
cleanup.sh are used so Helm/CRD/RBAC/PKI/namespace cleanup runs as part of make
undeploy.
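A sketch of the first option, assuming `undeploy` currently only runs `cleanup.sh -y` (the recipe body and script path are assumptions; the target names come from the review):

```makefile
# Sketch only: run the KServe-specific cleanup before the generic cleanup script,
# by making undeploy-kserve a prerequisite of undeploy.
.PHONY: undeploy
undeploy: undeploy-kserve
	./scripts/cleanup.sh -y
```

Because make resolves prerequisites before the recipe runs, `undeploy-kserve` (Helm release, CRDs, RBAC, PKI, namespace) completes before `cleanup.sh` starts.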
🧹 Nitpick comments (2)
helmfile.yaml.gotmpl (2)
44-64: Chart URL and version default are duplicated between the release definition and presync hook.

The OCI chart URL (`oci://ghcr.io/opendatahub-io/kserve-rhaii-xks`) and version template expression appear in both the release (lines 46, 48) and the presync hook script (line 62). If the registry or chart name changes, both locations must be updated in sync. Consider extracting the chart ref into a value or a Go template variable to keep it DRY.

Suggested approach:

```diff
 releases:
   - name: kserve-rhaii-xks
-    chart: oci://ghcr.io/opendatahub-io/kserve-rhaii-xks
+    chart: oci://ghcr.io/opendatahub-io/kserve-rhaii-xks {{- /* also used in presync hook below */ -}}
     # Use dev variant until official builds are released to registry.redhat.io
     version: {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }}
```

Alternatively, if helmfile Go templates support variable assignment at the top level, you could define the chart and version once and reference them in both places.
59-64: Temp directory not cleaned up on failure.

With `set -e`, if `helm pull` or `kubectl apply` fails, the script exits immediately and `rm -rf "$CHART_DIR"` is never reached, leaving a stale temp directory.

Suggested fix — add a trap:

```diff
 - |
   set -e
   CHART_DIR=$(mktemp -d)
+  trap 'rm -rf "$CHART_DIR"' EXIT
   helm pull oci://ghcr.io/opendatahub-io/kserve-rhaii-xks --version {{ .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }} --untar --untardir "$CHART_DIR"
   kubectl apply -f "$CHART_DIR"/kserve-rhaii-xks/crds/ --server-side
-  rm -rf "$CHART_DIR"
```
ca82fd6 to 82fe9fc
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@Makefile`:
- Around line 68-71: The deploy-kserve Makefile target currently depends on
check-kubeconfig and deploy-cert-manager-pki but not on cert-manager itself,
causing kubectl apply -f ./charts/kserve/pki-prereq.yaml to fail when
cert-manager CRDs are missing; update the deploy-kserve target to either add
deploy-cert-manager as a prerequisite (so the target line includes
deploy-cert-manager before deploy-cert-manager-pki) or add a guard/validation
step in the target that checks for the cert-manager CRD (e.g., detect
cert-manager.io/v1 CRD existence) and prints a clear error/exit if missing,
referencing the Makefile target name deploy-kserve and the existing
deploy-cert-manager-pki step to locate where to change.
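A sketch of the guard variant, written as a standalone check target that `deploy-kserve` can depend on. The CRD name `certificates.cert-manager.io` is the standard cert-manager CRD; the exact prerequisite list on `deploy-kserve` is an assumption based on the review's description:

```makefile
# Sketch: fail fast with a clear message if cert-manager CRDs are absent,
# instead of letting `kubectl apply -f pki-prereq.yaml` fail cryptically.
.PHONY: check-cert-manager
check-cert-manager:
	@kubectl get crd certificates.cert-manager.io >/dev/null 2>&1 || \
		{ echo "ERROR: cert-manager CRDs not found; run 'make deploy-cert-manager' first" >&2; exit 1; }

# The guard runs before the PKI prerequisites are applied.
deploy-kserve: check-kubeconfig check-cert-manager deploy-cert-manager-pki
```

The alternative — simply adding `deploy-cert-manager` as a prerequisite — is less chatty but always installs cert-manager, whereas the guard leaves that decision to the user.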
🧹 Nitpick comments (1)
Makefile (1)
39-46: `deploy` and `deploy-all` use different patterns — `deploy-all` clears cache 3 times.

`deploy` (line 39) calls `clear-cache` once then runs sequential `helmfile apply` commands. `deploy-all` (line 45) invokes individual targets that each independently call `clear-cache`. This is a minor inconsistency — not harmful but wastes time on repeated cache clears.

Consider having `deploy-all` depend on `clear-cache` once, and having the individual targets skip it when called as sub-targets. Alternatively, this can be left as-is since it's a development convenience concern.
cd02f55 to 271a32a
@coderabbitai review all
✅ Actions performed: Review triggered.
271a32a to fb062e9
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/deploying-llm-d-on-managed-kubernetes.md (1)
209-234: ⚠️ Potential issue | 🟡 Minor

Expected `make status` output is missing the new KServe sections.

The Makefile `status` target (lines 109–114) now prints `kserve:` and `kserve config:` sections, but the expected output example here stops after `lws-operator` and `=== API Versions ===`. Users running `make deploy-all` will see additional KServe output not shown in this example.

Consider adding the KServe sections to the expected output.

📝 Suggested addition after the lws-operator block:

```diff
 lws-operator:
 NAME                                     READY   STATUS    RESTARTS   AGE
 lws-controller-manager-xxxxxxxxx-xxxxx   1/1     Running   0          5m
+kserve:
+NAME                                        READY   STATUS    RESTARTS   AGE
+kserve-controller-manager-xxxxxxxxx-xxxxx   1/1     Running   0          5m
+
+kserve config:
+NAME          AGE
+inferencing   5m
+
 === API Versions ===
```
🤖 Fix all issues with AI agents
In `@helmfile.yaml.gotmpl`:
- Around line 44-65: The chart URL and version are duplicated between the
kserve-rhaii-xks release definition and its presync hook causing potential skew;
refactor by introducing template variables (e.g., define $kserveChart =
"oci://ghcr.io/opendatahub-io/kserve-rhaii-xks" and $kserveVersion =
.Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66") at the top of the
template and replace the literal chart and version occurrences in the release
block (chart: ...) and in the presync helm pull command so both the chart
reference and the helm pull use the same named variables; update references in
the release named kserve-rhaii-xks and inside the presync hook block to use
those variables.
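A sketch of the refactor in helmfile's Go-template syntax — the release and hook shape are condensed from the file described in the review, and the hook field names (`events`, `command`, `args`) are assumptions about its current structure:

```yaml
# Sketch: define the chart ref and version once, reuse in both places.
{{- $kserveChart := "oci://ghcr.io/opendatahub-io/kserve-rhaii-xks" }}
{{- $kserveVersion := .Values.kserveChartVersion | default "3.4.0-ea.1-dev-8a30e66" }}
releases:
  - name: kserve-rhaii-xks
    chart: {{ $kserveChart }}
    version: {{ $kserveVersion }}
    hooks:
      - events: ["presync"]
        command: "bash"
        args:
          - -c
          - |
            set -e
            CHART_DIR=$(mktemp -d)
            trap 'rm -rf "$CHART_DIR"' EXIT
            helm pull {{ $kserveChart }} --version {{ $kserveVersion }} --untar --untardir "$CHART_DIR"
            kubectl apply -f "$CHART_DIR"/kserve-rhaii-xks/crds/ --server-side
```

Since `helmfile.yaml.gotmpl` is rendered as a Go template before YAML parsing, top-level `{{- $var := ... }}` assignments are in scope for the whole file, so a registry or version change now touches a single line.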
🧹 Nitpick comments (1)
Makefile (1)
57-68: Minor: `deploy-opendatahub-prerequisites` silently swallows pull-secret copy failures.

Lines 60–62 copy the pull secret from `istio-system` to the KServe namespace but suppress all errors. If `istio-system` hasn't been deployed yet (e.g., when running `make deploy-kserve` standalone before Istio), this silently succeeds with no secret copied, and later steps that need registry auth may fail with a confusing `ErrImagePull`.

Consider emitting a warning when the secret copy is skipped:

♻️ Suggested improvement:

```diff
 kubectl get secret redhat-pull-secret -n istio-system -o yaml 2>/dev/null | \
 	sed 's/namespace: istio-system/namespace: $(KSERVE_NAMESPACE)/' | \
-	kubectl apply -f - 2>/dev/null || true
+	kubectl apply -f - 2>/dev/null || echo "WARNING: redhat-pull-secret not found in istio-system; registry auth may be missing in $(KSERVE_NAMESPACE)"
```
fb062e9 to 628faaa
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com>
628faaa to f40182f
…chart-validation Consolidate deployment flow and documentation for KServe Helm chart integration.
Consolidate deployment flow and documentation for KServe Helm chart integration.
Deployment changes:

- KServe chart (`ghcr.io/opendatahub-io/kserve-rhaii-xks`) added to helmfile with configurable version in `values.yaml`
- `make deploy-all` now deploys all components (cert-manager + Istio + LWS + KServe) in one step
- `make deploy` updated to include LWS
- Local `pki-prereq.yaml` for cert-manager PKI instead of fetching from GitHub at deploy time
- `undeploy-kserve` target with full cleanup (Helm release, CRDs, RBAC, ClusterIssuers, CA certificate, namespace)
- KServe sections added to `make status`

Documentation changes:

- KServe references updated from `opendatahub-io/kserve` to `red-hat-data-services/kserve` branch `rhoai-3.4`

Cleanup fixes:

- Added LWS (`leaderworkerset.x-k8s.io`) and istioCSR (`istiocsrs.operator.openshift.io`) CRD cleanup to `cleanup.sh`

Includes changes from #5 (KServe Helm chart via helmfile + OCI registry).
How Has This Been Tested?
- `make deploy-all` on AKS cluster — cert-manager, Istio, LWS deploy successfully
- KServe install from `ghcr.io/opendatahub-io/kserve-rhaii-xks` (pending)
- `make undeploy-kserve` and `make undeploy` — verified CRDs, RBAC, namespaces cleaned up
- `make status` shows KServe controller pod and LLMInferenceServiceConfig

Merge criteria:
Summary by CodeRabbit
New Features
Documentation
Chores
Tests