-
Notifications
You must be signed in to change notification settings - Fork 638
feat: add pre-deployment checks #3573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add pre-deployment checks #3573
Conversation
WalkthroughIntroduces a Bash pre-deployment check script for Kubernetes-based Dynamo deployments and a README documenting its usage. The script validates kubectl connectivity, default StorageClass presence, and GPU-enabled nodes, aggregates results, prints a summary, and sets exit status based on failures. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant S as pre-deployment-check.sh
participant K as kubectl
participant C as Kubernetes Cluster
U->>S: Run script
activate S
Note over S: Initialize CHECK_ORDER and CHECK_RESULTS
rect rgba(200,235,255,0.3)
S->>K: kubectl version / cluster-info
K->>C: Connect
C-->>K: Response / Error
K-->>S: Connectivity result
Note over S: Record PASS/FAIL (kubectl connectivity)
end
rect rgba(220,255,220,0.3)
S->>K: kubectl get storageclass -o json
K->>C: Query StorageClasses
C-->>K: SC list
K-->>S: SC data
Note over S: Check default SC presence / multiple defaults<br/>Provide commands if missing/multiple
end
rect rgba(255,235,200,0.3)
S->>K: kubectl get nodes -L nvidia.com/gpu.present
K->>C: Query nodes/labels
C-->>K: Node list
K-->>S: Labeled GPU node count
Note over S: Record GPU availability
end
S-->>U: Print per-check results and summary
S-->>U: Exit 0 if all PASS, else non-zero
deactivate S
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
deploy/cloud/pre-deployment/README.md (1)
33-103
: Add language identifiers to fenced blocks.markdownlint is flagging the output/tips code fences because they lack language specifiers (MD040). Please tag them with something like
text
orconsole
so the doc passes lint.Also applies to: 129-137
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deploy/cloud/pre-deployment/README.md
(1 hunks)deploy/cloud/pre-deployment/pre-deployment-check.sh
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Copyright Checks
deploy/cloud/pre-deployment/pre-deployment-check.sh
[error] 1-1: Copyright check failed: Invalid/Missing Header detected in deploy/cloud/pre-deployment/pre-deployment-check.sh.
🪛 markdownlint-cli2 (0.18.1)
deploy/cloud/pre-deployment/README.md
33-33: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
129-129: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
deploy/cloud/pre-deployment/pre-deployment-check.sh
[warning] 97-97: provisioner appears unused. Verify use (or export if used externally).
(SC2034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
/ok to test 9f5edfc |
/ok to test 969fec8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, It would be interesting to run this as a pre-install hook for dynamo cloud. That way customers will have this in the critical path.
/ok to test 95fa843 |
/ok to test 95fa843 |
} | ||
|
||
# Global variables to track check results | ||
declare -A CHECK_RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running on my local mac against the Nebius cluster. I immediately hit:
./deploy/cloud/pre-deployment/pre-deployment-check.sh: line 180: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
Cursor tells me:
Ah! The issue is that macOS ships with bash 3.2 by default, which doesn't support associative arrays (declare -A). Associative arrays were introduced in bash 4.0+.
Things work if I just a parallel index array:
declare -a
with refactoring where CHECK_RESULTS
is used.
Overview:
This directory contains a pre-deployment check script that verifies if Kubernetes cluster meets the requirements for deploying Dynamo.
includes #3584, #3574
Sample Successful check
Sample Failed check
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)