Add deploy_fma.sh and debug workflow for OCP E2E #357
Merged
Commits (9; this view shows changes from 8):
- 2871869 Add deploy_fma.sh and debug workflow for OCP E2E (diegocastanibm)
- a33932a refactor: Replace inline deployment steps with deploy_fma.sh (diegocastanibm)
- a84b8d6 delete ci-e2e-openshift-debug.yaml (diegocastanibm)
- b32f83b Mike's comments. Pending create independent script for steps 8-14 (diegocastanibm)
- 6ecbe66 remove test steps (diegocastanibm)
- c29c85e Renamed FMA_RELEASE_NAME to FMA_CHART_INSTANCE_NAME (diegocastanibm)
- db320fc - Remove unused CONTROLLER_IMAGE variable (was defined and echoed but… (diegocastanibm)
- 7ffc6a1 Mike's comments (diegocastanibm)
- 306088a Mike's comment 2 (diegocastanibm)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,187 @@ | ||
| #!/usr/bin/env bash | ||
| | ||
| # Usage: $0 | ||
| # Current working directory must be the root of the Git repository. | ||
| # | ||
| # Deploys the FMA controllers (dual-pods controller + launcher-populator) | ||
| # and waits for them to be available. | ||
| # | ||
| # Required environment variables: | ||
| # FMA_NAMESPACE - target Kubernetes namespace | ||
| # FMA_CHART_INSTANCE_NAME - Helm chart instance name | ||
| # CONTAINER_IMG_REG - container image registry/namespace | ||
| # (e.g. ghcr.io/llm-d-incubation/llm-d-fast-model-actuation) | ||
| # IMAGE_TAG - image tag for all components | ||
| # (e.g. ref-abcd1234) | ||
| # | ||
| # Optional environment variables: | ||
| # NODE_VIEW_CLUSTER_ROLE - ClusterRole granting node read access. | ||
| # If unset, the script creates one named | ||
| # "${FMA_CHART_INSTANCE_NAME}-node-view". | ||
| # If set to an existing ClusterRole name, it is | ||
| # used as-is (no creation). | ||
| # If set to "none", no ClusterRole is configured. | ||
| # RUNTIME_CLASS_NAME - if set, adds runtimeClassName to GPU pod specs | ||
| # (e.g. "nvidia" when the GPU operator requires it) | ||
| # POLICIES_ENABLED - "true"/"false"; auto-detected if unset | ||
| # FMA_DEBUG - "true" to enable shell tracing (set -x) | ||
| # HELM_EXTRA_ARGS - additional Helm arguments appended to the | ||
| # `helm upgrade --install` invocation | ||
| # (e.g. "--set global.local=true --set dualPodsController.sleeperLimit=4") | ||
| | ||
| set -euo pipefail | ||
| if [ "${FMA_DEBUG:-false}" = "true" ]; then | ||
| set -x | ||
| fi | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Helpers | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step_num=0 | ||
| total_steps=6 | ||
| | ||
| step() { | ||
| step_num=$((step_num + 1)) | ||
| echo "" | ||
| echo "========================================" | ||
| echo "[deploy_fma] Step ${step_num}/${total_steps}: $*" | ||
| echo "========================================" | ||
| echo "" | ||
| } | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 1: Validate required environment variables | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "Validate required environment variables" | ||
| | ||
| missing=() | ||
| for var in FMA_NAMESPACE FMA_CHART_INSTANCE_NAME CONTAINER_IMG_REG IMAGE_TAG; do | ||
| if [ -z "${!var:-}" ]; then | ||
| missing+=("$var") | ||
| fi | ||
| done | ||
| | ||
| if [ ${#missing[@]} -gt 0 ]; then | ||
| echo "ERROR: Missing required environment variables: ${missing[*]}" >&2 | ||
| exit 1 | ||
| fi | ||
| | ||
| echo "Configuration:" | ||
| echo " FMA_NAMESPACE: $FMA_NAMESPACE" | ||
| echo " FMA_CHART_INSTANCE_NAME: $FMA_CHART_INSTANCE_NAME" | ||
| echo " CONTAINER_IMG_REG: $CONTAINER_IMG_REG" | ||
| echo " IMAGE_TAG: $IMAGE_TAG" | ||
| echo " NODE_VIEW_CLUSTER_ROLE: ${NODE_VIEW_CLUSTER_ROLE:-<will create>}" | ||
| echo " RUNTIME_CLASS_NAME: ${RUNTIME_CLASS_NAME:-<unset>}" | ||
| echo " POLICIES_ENABLED: ${POLICIES_ENABLED:-<auto-detect>}" | ||
| echo " HELM_EXTRA_ARGS: ${HELM_EXTRA_ARGS:-<none>}" | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 2: Apply FMA CRDs | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "Apply FMA CRDs" | ||
| | ||
| CRD_NAMES="" | ||
| for crd_file in config/crd/*.yaml; do | ||
| crd_name=$(kubectl apply --dry-run=client -f "$crd_file" -o jsonpath='{.metadata.name}') | ||
| CRD_NAMES="$CRD_NAMES $crd_name" | ||
| if kubectl get crd "$crd_name" &>/dev/null; then | ||
| echo " CRD $crd_name already exists, skipping" | ||
| else | ||
| echo " Applying $crd_file ($crd_name)" | ||
| kubectl apply --server-side -f "$crd_file" | ||
| fi | ||
| done | ||
| | ||
| echo "Waiting for CRDs to become Established..." | ||
| for crd_name in $CRD_NAMES; do | ||
| kubectl wait --for=condition=Established "crd/$crd_name" --timeout=120s | ||
| done | ||
| echo "All CRDs established" | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 3: Create node-viewer ClusterRole | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "Configure node-viewer ClusterRole" | ||
| | ||
| if [ "${NODE_VIEW_CLUSTER_ROLE:-}" = "none" ]; then | ||
| CLUSTER_ROLE_NAME="" | ||
| echo "Skipped (NODE_VIEW_CLUSTER_ROLE=none)" | ||
| elif [ -n "${NODE_VIEW_CLUSTER_ROLE:-}" ]; then | ||
| CLUSTER_ROLE_NAME="${NODE_VIEW_CLUSTER_ROLE}" | ||
| echo "Using existing ClusterRole: $CLUSTER_ROLE_NAME" | ||
| else | ||
| CLUSTER_ROLE_NAME="${FMA_CHART_INSTANCE_NAME}-node-view" | ||
| if kubectl get clusterrole "$CLUSTER_ROLE_NAME" &>/dev/null; then | ||
| echo "ClusterRole $CLUSTER_ROLE_NAME already exists, skipping" | ||
| else | ||
| kubectl create clusterrole "$CLUSTER_ROLE_NAME" --verb=get,list,watch --resource=nodes | ||
| echo "ClusterRole $CLUSTER_ROLE_NAME created" | ||
| fi | ||
| fi | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 4: Detect and apply ValidatingAdmissionPolicies | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "ValidatingAdmissionPolicies" | ||
| | ||
| if [ -z "${POLICIES_ENABLED:-}" ]; then | ||
| POLICIES_ENABLED=false | ||
| if kubectl api-resources --api-group=admissionregistration.k8s.io -o name 2>/dev/null \ | ||
| \| grep -q 'validatingadmissionpolicies'; then | ||
| POLICIES_ENABLED=true | ||
| fi | ||
| echo "Auto-detected POLICIES_ENABLED=$POLICIES_ENABLED" | ||
| fi | ||
| | ||
| if [ "$POLICIES_ENABLED" = "true" ]; then | ||
| echo "Applying ValidatingAdmissionPolicy resources..." | ||
| kubectl apply -f config/validating-admission-policies/ | ||
| else | ||
| echo "ValidatingAdmissionPolicy not supported or disabled, skipping" | ||
| fi | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 5: Deploy FMA controllers via Helm | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "Deploy FMA controllers via Helm" | ||
| | ||
| HELM_ARGS=( | ||
| --set global.imageRegistry="${CONTAINER_IMG_REG}" | ||
| --set global.imageTag="${IMAGE_TAG}" | ||
| ) | ||
| | ||
| # Append any caller-supplied Helm arguments (e.g. --set global.local=true) | ||
| if [ -n "${HELM_EXTRA_ARGS:-}" ]; then | ||
| read -ra _extra <<< "$HELM_EXTRA_ARGS" | ||
| HELM_ARGS+=("${_extra[@]}") | ||
| fi | ||
| | ||
| if [ -n "$CLUSTER_ROLE_NAME" ]; then | ||
| HELM_ARGS+=(--set global.nodeViewClusterRole="${CLUSTER_ROLE_NAME}") | ||
| fi | ||
| | ||
| helm upgrade --install "$FMA_CHART_INSTANCE_NAME" charts/fma-controllers \ | ||
| -n "$FMA_NAMESPACE" \ | ||
| "${HELM_ARGS[@]}" | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Step 6: Wait for controllers to be ready | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| step "Wait for controllers to be ready" | ||
| | ||
| kubectl wait --for=condition=available --timeout=120s \ | ||
| deployment "${FMA_CHART_INSTANCE_NAME}-dual-pods-controller" -n "$FMA_NAMESPACE" | ||
| kubectl wait --for=condition=available --timeout=120s \ | ||
| deployment "${FMA_CHART_INSTANCE_NAME}-launcher-populator" -n "$FMA_NAMESPACE" | ||
| echo "Both controllers are available" | ||
| | ||
| echo "" | ||
| echo "[deploy_fma] All steps completed successfully" | ||
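
Step 1 of the script relies on Bash indirect expansion (`${!var:-}`) to check each required variable by name. The following is a minimal standalone sketch of that pattern, not code from the PR: the function name `missing_vars` is hypothetical and is introduced here only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: `missing_vars` is a hypothetical helper, not part of deploy_fma.sh.
# It prints the names (space-separated) of the given variables that are
# unset or empty, using the same indirect-expansion test as Step 1.
missing_vars() {
  local var
  local missing=()
  for var in "$@"; do
    # ${!var:-} expands to the value of the variable whose name is stored
    # in $var, or to the empty string if that variable is unset.
    if [ -z "${!var:-}" ]; then
      missing+=("$var")
    fi
  done
  # The ":-" default keeps this safe under `set -u` when the array is empty.
  echo "${missing[*]:-}"
}
```

A caller could run `missing_vars FMA_NAMESPACE FMA_CHART_INSTANCE_NAME CONTAINER_IMG_REG IMAGE_TAG` and exit with an error if the output is non-empty, which is roughly what the script does inline.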
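
Step 5 of deploy_fma.sh turns `HELM_EXTRA_ARGS` into array elements with `read -ra`, which splits on whitespace and does not interpret quotes. The sketch below isolates that behavior so it can be tried without a cluster; the function name `split_extra_args` is hypothetical and is not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch only: `split_extra_args` is a hypothetical helper that mirrors the
# `read -ra _extra <<< "$HELM_EXTRA_ARGS"` line in Step 5.
# It prints each resulting word on its own line.
split_extra_args() {
  local -a parts=()
  # -r: do not treat backslashes as escapes; -a: store the words in an array.
  # Splitting uses $IFS (whitespace by default); quotes stay literal.
  read -ra parts <<< "$1"
  printf '%s\n' "${parts[@]}"
}

# Two "--set key=value" pairs become four separate words.
split_extra_args '--set global.local=true --set dualPodsController.sleeperLimit=4'
```

One consequence of this design is that a value containing spaces cannot be passed through `HELM_EXTRA_ARGS` as a single argument: `--set msg="hello world"` would be split into `--set`, `msg="hello`, and `world"`. Values without whitespace, like the example in the script header, are unaffected.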