| 1 | +--- |
| 2 | +name: deploy-agents |
| 3 | +description: Deploy agents to OpenShift with auto-detected cluster config and refresh MLflow tracking tokens. |
| 4 | +argument-hint: "<agent_paths or 'all'> [--token-only]" |
| 5 | +--- |
| 6 | + |
| 7 | +# Deploy Agents to OpenShift |
| 8 | + |
| 9 | +> **Usage:** |
| 10 | +> - `/deploy-agents crewai/websearch_agent` — deploy one agent |
| 11 | +> - `/deploy-agents crewai/websearch_agent langgraph/react_agent` — deploy multiple |
| 12 | +> - `/deploy-agents all` — deploy all standard agents |
| 13 | +> - `/deploy-agents --token-only` — only refresh MLflow tokens, no deployment |
| 14 | + |
| 15 | +You are deploying agents to the agentic-mcp OpenShift cluster. This skill automates cluster config detection, .env generation, container build/push, Helm deployment, and MLflow token refresh. |
| 16 | + |
| 17 | +## Input |
| 18 | + |
| 19 | +Arguments: $ARGUMENTS |
| 20 | + |
| 21 | +Parse the arguments to determine: |
| 22 | +- **Target agents**: space-separated paths relative to `agents/` (e.g., `crewai/websearch_agent`), or `all` |
| 23 | +- **Token-only mode**: if `--token-only` is present, skip Steps 1–3 and go directly to Step 4 |
| 24 | + |
| 25 | +If no arguments are provided, ask the user what to deploy. |
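
The parsing rules above could be sketched as a small shell function. The names `parse_deploy_args`, `TOKEN_ONLY`, and `TARGETS` are illustrative and not part of the skill, and rejecting unknown options is an assumption rather than documented behaviour.

```bash
# Sketch: split the skill arguments into a token-only flag and a target list.
# parse_deploy_args, TOKEN_ONLY and TARGETS are illustrative names.
parse_deploy_args() {
  TOKEN_ONLY=false
  TARGETS=""            # space-separated agent paths, or the word "all"
  for arg in "$@"; do
    case "$arg" in
      --token-only) TOKEN_ONLY=true ;;
      -*) echo "unknown option: $arg" >&2; return 2 ;;   # assumption: reject other flags
      *) TARGETS="${TARGETS:+$TARGETS }$arg" ;;
    esac
  done
  # No targets and no --token-only: the caller should ask the user what to deploy.
  if [ "$TOKEN_ONLY" = false ] && [ -z "$TARGETS" ]; then
    return 1
  fi
  return 0
}
```

A non-zero return with empty `TARGETS` corresponds to the "ask the user" case; `--token-only` on its own is accepted because it needs no targets.
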
| 26 | + |
| 27 | +## Step 0: Validate Prerequisites |
| 28 | + |
| 29 | +Run these checks in parallel. Fail immediately if any required tool is missing or `oc` is not authenticated. |
| 30 | + |
| 31 | +```bash |
| 32 | +oc whoami # must be authenticated |
| 33 | +oc project -q # capture current namespace — ALL operations scoped here |
| 34 | +helm version --short # must be installed |
| 35 | +``` |
| 36 | + |
| 37 | +If deploying (not `--token-only`), also check for a container CLI: |
| 38 | +```bash |
| 39 | +podman version 2>/dev/null || docker version 2>/dev/null |
| 40 | +``` |
| 41 | + |
| 42 | +Store the namespace from `oc project -q` — use explicit `-n <namespace>` on every `oc` command for the rest of this workflow. Never rely on the default context. |
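
A minimal sketch of these checks as reusable functions follows. The function names and the `NAMESPACE` variable are illustrative. `command -v` only confirms that a binary is on `PATH`, which is weaker than the `podman version` / `docker version` check above, so treat it as a first filter.

```bash
# Sketch: fail fast when a required CLI is missing. Function names are illustrative.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing required tool: $1" >&2; return 1; }
}

# Print the first available container CLI, preferring podman as in the check above.
container_cli() {
  for cli in podman docker; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
      return 0
    fi
  done
  return 1
}

# Verify tools and authentication, then capture the namespace once.
check_prereqs() {
  require_tool oc || return 1
  require_tool helm || return 1
  oc whoami >/dev/null 2>&1 || { echo "oc is not authenticated" >&2; return 1; }
  NAMESPACE=$(oc project -q) || return 1
  [ -n "$NAMESPACE" ]
}
```

After `check_prereqs` succeeds, later commands can pass `-n "$NAMESPACE"` explicitly instead of relying on the current context.
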
| 43 | + |
| 44 | +## Step 1: Resolve Target Agents |
| 45 | + |
| 46 | +If argument is `all`: |
| 47 | +1. List all directories under `agents/` that contain both `agent.yaml` and a `Makefile` |
| 48 | +2. Filter to only standard agents: those whose `values.yaml` references `charts/agent/` (check for `chart:` field or Makefile `CHART_PATH`) |
| 49 | +3. **Skip with warning**: `langflow/simple_tool_calling_agent` (docker-compose based), `a2a/langgraph_crewai_agent` (custom chart) |
| 50 | + |
| 51 | +If specific paths given: |
| 52 | +1. For each path, verify `agents/<path>/agent.yaml` exists |
| 53 | +2. Warn and skip any non-standard agents |
| 54 | + |
| 55 | +Report the final list of agents to deploy before proceeding. |
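
The `all` case might be implemented along these lines. The sketch assumes agents sit exactly two levels below `agents/` (`<framework>/<agent>`), as in the usage examples, and it only applies the `agent.yaml` + `Makefile` test and the explicit skip list; the chart-reference check on `values.yaml` is left out. `list_standard_agents` and `NON_STANDARD` are illustrative names.

```bash
# Non-standard agents named in the skill; they are skipped with a warning.
NON_STANDARD="langflow/simple_tool_calling_agent a2a/langgraph_crewai_agent"

is_non_standard() {
  for skip in $NON_STANDARD; do
    if [ "$1" = "$skip" ]; then
      return 0
    fi
  done
  return 1
}

# Print one agent path (relative to the agents root) per line.
list_standard_agents() {
  root=${1:-agents}
  for yaml in "$root"/*/*/agent.yaml; do
    [ -f "$yaml" ] || continue            # the glob matched nothing
    dir=${yaml%/agent.yaml}
    [ -f "$dir/Makefile" ] || continue    # both files are required
    rel=${dir#"$root"/}
    if is_non_standard "$rel"; then
      echo "skipping non-standard agent: $rel" >&2
      continue
    fi
    echo "$rel"
  done
}
```

Warnings go to stderr so that stdout stays a clean list that can be reported to the user before proceeding.
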
| 56 | + |
| 57 | +## Step 2: Auto-Detect Cluster Config |
| 58 | + |
| 59 | +Detect config from existing deployments in the namespace to avoid asking the user for values they've already configured. |
| 60 | + |
| 61 | +```bash |
| 62 | +oc get deployments -n <namespace> -o json |
| 63 | +``` |
| 64 | + |
| 65 | +From the **first standard agent deployment found**, extract: |
| 66 | + |
| 67 | +| Value | Source | |
| 68 | +|---|---| |
| 69 | +| `BASE_URL` | env var from deployment spec | |
| 70 | +| `MODEL_ID` | env var from deployment spec | |
| 71 | +| `API_KEY` | from the deployment's referenced secret (base64-decode) | |
| 72 | +| `MLFLOW_TRACKING_URI` | env var from deployment spec | |
| 73 | +| `MLFLOW_EXPERIMENT_NAME` | env var from deployment spec | |
| 74 | +| `MLFLOW_TRACKING_INSECURE_TLS` | env var from deployment spec | |
| 75 | +| `MLFLOW_WORKSPACE` | env var from deployment spec | |
| 76 | +| Container image registry prefix | from deployment image spec (e.g., `quay.io/adonheis/`) | |
| 77 | + |
| 78 | +If **no existing deployments** are found in the namespace, ask the user for all required values. |
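
As a rough illustration of the extraction, the helpers below read values with `oc ... -o jsonpath`. They assume the agent is the first container in the pod template and that the env var is set with a literal `value` (not `valueFrom`); the helper names are hypothetical.

```bash
# deployment_image DEPLOYMENT NAMESPACE: image reference of the first container.
deployment_image() {
  oc get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# deployment_env DEPLOYMENT NAMESPACE VAR_NAME: literal env value, empty if unset.
deployment_env() {
  oc get deployment "$1" -n "$2" \
    -o jsonpath="{.spec.template.spec.containers[0].env[?(@.name==\"$3\")].value}"
}

# registry_prefix IMAGE: strip the final "name:tag" component, keep the trailing slash.
registry_prefix() {
  image=$1
  case "$image" in
    */*) echo "${image%/*}/" ;;
    *)   return 1 ;;              # no registry or organisation in the reference
  esac
}
```

For example, `registry_prefix "$(deployment_image <agent-name> <namespace>)"` would yield a prefix such as `quay.io/adonheis/` for an image stored under that organisation.
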
| 79 | + |
| 80 | +## Step 3: Deploy Each Target Agent |
| 81 | + |
| 82 | +Loop over each resolved agent. For each: |
| 83 | + |
| 84 | +### 3a: Check existing deployment |
| 85 | +```bash |
| 86 | +oc get deployment <agent-name> -n <namespace> 2>/dev/null |
| 87 | +``` |
| 88 | +If it already exists, ask the user whether to redeploy or skip. |
| 89 | + |
| 90 | +### 3b: Read agent requirements |
| 91 | +Read `agent.yaml` in the agent directory to discover required env vars. For agents with extra requirements beyond the standard set (e.g., `POSTGRES_*` for db-memory agents, `MCP_SERVER_URL` for autogen agents): |
| 92 | +- Try to auto-detect from an existing deployment of the same agent |
| 93 | +- If not found, ask the user |
| 94 | + |
| 95 | +### 3c: Check container image |
| 96 | +Check if the container image already exists in the registry: |
| 97 | +```bash |
| 98 | +podman manifest inspect <registry>/<image>:<tag> 2>/dev/null || skopeo inspect docker://<registry>/<image>:<tag> 2>/dev/null |
| 99 | +``` |
| 100 | +- If the image exists: ask whether to rebuild or reuse it |
| 101 | +- If the image doesn't exist or the check fails: build it |
| 102 | +- Construct the image name from the registry prefix (Step 2) and the agent name from `agent.yaml` |
| 103 | + |
| 104 | +### 3d: Write .env file |
| 105 | +Write the `.env` file in the agent directory with: |
| 106 | +- All auto-detected config from Step 2 |
| 107 | +- Fresh `MLFLOW_TRACKING_TOKEN` from `oc whoami -t` |
| 108 | +- `MLFLOW_WORKSPACE` set to the current namespace (`oc project -q`). This is **mandatory for OpenShift MLflow**; without it the MLflow API returns "Workspace context is required" |
| 109 | +- `MLFLOW_TRACKING_INSECURE_TLS=true` (required when the cluster does not use trusted certificates) |
| 110 | +- `CONTAINER_IMAGE` using registry prefix + agent name |
| 111 | +- Any agent-specific extra vars from Step 3b |
| 112 | + |
| 113 | +**Never commit .env files** — they are already in `.gitignore`. |
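
One possible way to write the file is sketched below. It assumes the Makefile reads plain `KEY=VALUE` lines, which the skill does not state, and `write_env_file` is a hypothetical helper. Because the file holds an API key and a token, the sketch restricts it to the current user.

```bash
# write_env_file DIR KEY=VALUE...
# Overwrites DIR/.env with one KEY=VALUE pair per line.
write_env_file() {
  dir=$1
  shift
  : > "$dir/.env" || return 1        # create or truncate
  chmod 600 "$dir/.env"              # readable by the owner only
  for pair in "$@"; do
    printf '%s\n' "$pair" >> "$dir/.env"
  done
}

# Example call (values come from Step 2 and Step 3b):
#   write_env_file "agents/<path>" \
#     "MLFLOW_TRACKING_TOKEN=$(oc whoami -t)" \
#     "MLFLOW_WORKSPACE=<namespace>" \
#     "MLFLOW_TRACKING_INSECURE_TLS=true"
```
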
| 114 | + |
| 115 | +### 3e: Build and push (if needed) |
| 116 | +If building: |
| 117 | +```bash |
| 118 | +cd agents/<path> |
| 119 | +make build |
| 120 | +make push |
| 121 | +``` |
| 122 | + |
| 123 | +### 3f: Deploy via Helm |
| 124 | +```bash |
| 125 | +cd agents/<path> |
| 126 | +make deploy |
| 127 | +``` |
| 128 | + |
| 129 | +### 3g: Verify health |
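| 130 | +Wait for the pod to become ready (for example with `oc rollout status deployment/<agent-name> -n <namespace> --timeout=120s`, as in Step 4e), then: |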
| 131 | +```bash |
| 132 | +# Get the route |
| 133 | +oc get route <agent-name> -n <namespace> -o jsonpath='{.spec.host}' |
| 134 | +# Health check |
| 135 | +curl -sk https://<route>/health |
| 136 | +``` |
| 137 | + |
| 138 | +If health check fails, check pod status and logs: |
| 139 | +```bash |
| 140 | +oc get pods -n <namespace> -l app.kubernetes.io/name=<agent-name> --sort-by=.metadata.creationTimestamp |
| 141 | +oc logs deployment/<agent-name> -n <namespace> --tail=30 |
| 142 | +``` |
| 143 | + |
| 144 | +Report the result (healthy/unhealthy) and move to the next agent. |
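
Since a route can take a moment to start serving, the health check may be easier to run through a small retry loop. This is a sketch: `retry` and `check_health` are illustrative helpers, and adding `-f` (so that HTTP error statuses count as failures) is a choice made here, not something the skill specifies.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# check_health ROUTE_HOST
# -k skips TLS verification as in the step above; -f fails on HTTP errors.
check_health() {
  curl -skf "https://$1/health" >/dev/null
}

# Example: retry 10 3 check_health "$ROUTE" && echo healthy || echo unhealthy
```
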
| 145 | + |
| 146 | +## Step 4: Refresh MLflow Tokens for ALL Deployed Agents |
| 147 | + |
| 148 | +This step **always runs** — even with `--token-only`, even if no agents were just deployed. It refreshes tokens for every agent in the namespace, not just the ones targeted in this run. |
| 149 | + |
| 150 | +### 4a: Get fresh token |
| 151 | +```bash |
| 152 | +TOKEN=$(oc whoami -t) |
| 153 | +TOKEN_B64=$(printf '%s' "$TOKEN" | base64 | tr -d '\n')  # strip line wraps that GNU base64 adds to long values |
| 154 | +``` |
| 155 | + |
| 156 | +### 4b: Find all MLflow token secrets |
| 157 | +```bash |
| 158 | +oc get secrets -n <namespace> -o json | jq -r '.items[] | select(.data["mlflow-tracking-token"] != null) | .metadata.name' |
| 159 | +``` |
| 160 | + |
| 161 | +### 4c: Patch each secret |
| 162 | +For each secret found: |
| 163 | +```bash |
| 164 | +oc patch secret <secret-name> -n <namespace> -p "{\"data\":{\"mlflow-tracking-token\":\"$TOKEN_B64\"}}" |
| 165 | +``` |
| 166 | + |
| 167 | +### 4d: Restart deployments |
| 168 | +For each agent whose token secret was patched, restart its deployment so the new pods pick up the refreshed token: |
| 169 | +```bash |
| 170 | +oc rollout restart deployment/<agent-name> -n <namespace> |
| 171 | +``` |
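
Steps 4a to 4c can be combined into one loop, sketched below under the assumption that `jq` is available (the skill already uses it in 4b). `token_patch` and `refresh_tokens` are illustrative names. The restart in 4d is left out because the skill does not specify how a secret maps to its deployment.

```bash
# token_patch RAW_TOKEN
# Prints the JSON patch body for one secret.
token_patch() {
  # tr removes line wraps that some base64 implementations add to long input.
  b64=$(printf '%s' "$1" | base64 | tr -d '\n')
  printf '{"data":{"mlflow-tracking-token":"%s"}}' "$b64"
}

# refresh_tokens NAMESPACE
# Patches every secret in NAMESPACE that carries an mlflow-tracking-token key.
refresh_tokens() {
  ns=$1
  patch=$(token_patch "$(oc whoami -t)") || return 1
  oc get secrets -n "$ns" -o json \
    | jq -r '.items[] | select(.data["mlflow-tracking-token"] != null) | .metadata.name' \
    | while read -r secret; do
        oc patch secret "$secret" -n "$ns" -p "$patch"
      done
}
```

Building the patch once and reusing it keeps every secret on the same token value.
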
| 172 | + |
| 173 | +### 4e: Verify MLflow connectivity |
| 174 | +Pick one agent and verify: |
| 175 | +```bash |
| 176 | +ROUTE=$(oc get route <agent-name> -n <namespace> -o jsonpath='{.spec.host}') |
| 177 | +# Wait for rollout |
| 178 | +oc rollout status deployment/<agent-name> -n <namespace> --timeout=120s |
| 179 | +# Health check |
| 180 | +curl -sk "https://$ROUTE/health" |
| 181 | +``` |
| 182 | + |
| 183 | +## Step 5: Summary Report |
| 184 | + |
| 185 | +Print a summary table: |
| 186 | + |
| 187 | +``` |
| 188 | +Agent | Status | Route | Health | Token |
| 189 | +-------------------------------|-------------|------------------------------------------|--------|-------- |
| 190 | +crewai/websearch_agent | deployed | websearch-agent-agentic-mcp.apps.xxx | OK | refreshed |
| 191 | +langgraph/react_agent | redeployed | react-agent-agentic-mcp.apps.xxx | OK | refreshed |
| 192 | +langgraph/hitl_agent | skipped | hitl-agent-agentic-mcp.apps.xxx | OK | refreshed |
| 193 | +autogen/chat_agent | failed | — | — | — |
| 194 | +``` |
| 195 | + |
| 196 | +If any agents failed, show the failure reason and suggest next steps. |
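
A fixed-width `printf` format is one simple way to keep the summary columns aligned. The column widths below are chosen to fit the sample rows and `summary_row` is a hypothetical helper.

```bash
# summary_row AGENT STATUS ROUTE HEALTH TOKEN
# Prints one left-aligned, pipe-separated row of the summary table.
summary_row() {
  printf '%-30s | %-11s | %-40s | %-6s | %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example:
#   summary_row "Agent" "Status" "Route" "Health" "Token"
#   summary_row "crewai/websearch_agent" "deployed" "$ROUTE" "OK" "refreshed"
```
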
| 197 | + |
| 198 | +## Key Constraints |
| 199 | + |
| 200 | +- **Namespace isolation**: All `oc` commands use explicit `-n <namespace>`. Never touch resources outside the current namespace. |
| 201 | +- **No chart modifications**: Never modify `charts/agent/` templates. |
| 202 | +- **No .env commits**: `.env` files are written but never staged or committed. |
| 203 | +- **Token refresh is comprehensive**: Step 4 covers ALL agents in the namespace, not just targets. |
| 204 | +- **Ask before destructive actions**: Always confirm before redeploying an existing agent or rebuilding an image. |