-
Notifications
You must be signed in to change notification settings - Fork 638
feat: Make the GAIE blackbox deployment a backup option #3591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Anna Tchernych <[email protected]>
WalkthroughDocumentation and Helm chart updates introduce a configurable Dynamo EPP integration. The README is rewritten to detail two routing modes and updated deployment steps. Helm templates and values switch from eppAware to epp.useDynamo with epp.dynamo settings, configurable configFile, and conditional env/volumes based on Dynamo usage. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant H as Helm
participant K as Kubernetes
participant G as Inference Gateway
participant E as EPP Container
U->>H: helm upgrade --install (values incl. epp.useDynamo)
H->>K: Apply manifests (CRDs, Deployments, ConfigMaps)
K-->>G: Start Gateway Pod
K-->>E: Start EPP Pod (image: extension.image)
alt epp.useDynamo = true
Note over E,G: Dynamo KV-aware routing enabled
K-->>E: Mount configFile, set DYN_NAMESPACE, DYNAMO_COMPONENT, DYNAMO_KV_BLOCK_SIZE
G->>E: Route requests via Dynamo-aware path
E->>E: KV routing per config
else epp.useDynamo = false
Note over E,G: Black-box routing
G->>E: Route requests without KV-aware features
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deploy/inference-gateway/README.md (1)
108-116
: Minor typos
- “the the HuggingFace token” → “the HuggingFace token”.
- “for you model” → “for your model”.
🧹 Nitpick comments (8)
deploy/inference-gateway/helm/dynamo-gaie/values.yaml (2)
51-54
: Potential footgun: useDynamo defaults to true but image is standard EPPIf users don’t override extension.image, useDynamo=true will run the standard EPP image (no Dynamo plugin). Consider documenting prominently that extension.image must be set to the Dynamo EPP image when useDynamo=true, or default useDynamo to false here.
65-73
: Clarify mandatory fields when useDynamo is trueAdd a comment that epp.dynamo.namespace and epp.dynamo.kvBlockSize are required when useDynamo=true, mirroring README guidance.
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (4)
68-75
: Make kvBlockSize required when useDynamo=truePer README, DYNAMO_KV_BLOCK_SIZE is mandatory. Use Helm required() to avoid a silent default.
Apply this diff:
-{{- $kv := default "16" .Values.epp.dynamo.kvBlockSize -}} +{{- $kv := required "set epp.dynamo.kvBlockSize via values" .Values.epp.dynamo.kvBlockSize -}}
28-33
: Roll pods on ConfigMap changesAdd a checksum/annotation for the EPP config to trigger rollout when the ConfigMap content changes.
Example insertion under spec.template.metadata:
labels: app: {{ .Values.model.shortName }}-epp + annotations: + checksum/epp-config: {{ include (print $.Template.BasePath "/epp-config-dynamo.yaml") . | sha256sum }}Adjust the include path to your actual ConfigMap template name.
42-66
: Validate image when useDynamo is enabledChart always uses .Values.extension.image. If useDynamo=true but image isn’t the Dynamo-enabled EPP, startup may fail. At minimum, add a values/README note; optionally gate a warning/fail in template.
101-118
: Add resource requests/limits and securityContextDefine requests/limits and a minimal securityContext (runAsNonRoot, drop ALL) for stability and security.
If desired, I can propose concrete values.
deploy/inference-gateway/README.md (2)
221-227
: Standard EPP image registry inconsistencyREADME uses us-central1-docker.pkg.dev/k8s-staging-images/... while values.yaml defaults to us-central1-docker.pkg.dev/k8s-artifacts-prod/images/....
Pick one (prefer “artifacts-prod”) to avoid confusion.
174-176
: Fix heading level/style (markdownlint: MD001, MD003)Decrement this heading to H4 (####) and use standard atx style (no closing hashes).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
deploy/inference-gateway/README.md
(11 hunks)deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml
(3 hunks)deploy/inference-gateway/helm/dynamo-gaie/values.yaml
(2 hunks)deploy/inference-gateway/values-dynamo-epp.yaml
(2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
deploy/inference-gateway/README.md
174-174: Heading levels should only increment by one level at a time
Expected: h4; Actual: h5
(MD001, heading-increment)
174-174: Heading style
Expected: atx; Actual: atx_closed
(MD003, heading-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: trtllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
deploy/inference-gateway/values-dynamo-epp.yaml (2)
16-18
: Image name consistency with READMEREADME references nvcr.io/.../epp-inference-extension-dynamo:v0.6.0-1, but this file uses .../gaie-epp-dynamo:v0.6.0-1. Align them to avoid pull errors.
Would you like me to update either the README or this values file to a single, verified image name?
25-31
: Ensure kvBlockSize matches model block size"DYNAMO_KV_BLOCK_SIZE" must match the model’s block size; "16" is just an example. Set this to the actual value to prevent routing issues.
deploy/inference-gateway/README.md (3)
130-134
: Dynamo EPP image name mismatchHere: nvcr.io/.../epp-inference-extension-dynamo:v0.6.0-1; values-dynamo-epp.yaml uses .../gaie-epp-dynamo:v0.6.0-1. Standardize to one verified image.
Would you like me to update both spots to the correct, published image?
206-209
: build-epp-dynamo.sh is present and executable
36-41
: Referenced script exists: install_gaie_crd_kgateway.sh is present in deploy/inference-gateway.deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)
119-127
: ConfigMap and key presence confirmed
A ConfigMap named{{ include "dynamo-gaie.fullname" . }}-epp-config
is defined intemplates/epp-configmap.yaml
with theepp-config-dynamo.yaml
key.
Signed-off-by: Anna Tchernych <[email protected]>
Overview:
DEP-505
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Refactor