|
| 1 | +# Operations Guide |
| 2 | + |
| 3 | +How to manage and monitor the bot in day-to-day use. |
| 4 | + |
| 5 | +## Monitoring |
| 6 | + |
| 7 | +### Jira Filter |
| 8 | + |
| 9 | +All bot-eligible tickets across teams are tracked via a shared Jira filter: |
| 10 | + |
| 11 | +**Filter ID**: [107017](https://redhat.atlassian.net/issues/?filter=107017) |
| 12 | + |
| 13 | +**JQL**: `project = RHCLOUD AND labels in (hcc-ai-framework, hcc-ai-platform-accessmanagement)` |
| 14 | + |
| 15 | +This shows all tickets tagged for the bot, regardless of status. Use it to see what the bot is working on, what's queued, and what's done. |
| 16 | + |
| 17 | +To add a new team's tickets, add that team's label (e.g. `hcc-ai-<team-name>`) to the `labels in (...)` list in the filter's JQL. |
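
As a rough illustration of how the shared filter grows, the sketch below rebuilds the JQL from a list of team labels. The helper name `build_filter_jql` and the label `hcc-ai-example-team` are made up for this example; the real filter is edited in Jira itself.

```python
def build_filter_jql(labels, project="RHCLOUD"):
    """Return JQL matching tickets in `project` that carry any of `labels`."""
    if not labels:
        raise ValueError("at least one label is required")
    return f"project = {project} AND labels in ({', '.join(labels)})"


current = ["hcc-ai-framework", "hcc-ai-platform-accessmanagement"]
print(build_filter_jql(current))

# Adding a team means appending its label (hypothetical label name).
print(build_filter_jql(current + ["hcc-ai-example-team"]))
```

The first call reproduces the JQL shown above; the second shows the only part of the query that changes when a team is added.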
| 18 | + |
| 19 | +### Dashboard |
| 20 | + |
| 21 | +The memory server dashboard at `http://localhost:8080` shows: |
| 22 | +- **Active tasks** — what the bot is currently working on, with status and PR links |
| 23 | +- **Memories** — learnings the bot has stored from completed work |
| 24 | +- **Costs** — per-cycle cost breakdown by work type |
| 25 | + |
| 26 | +### Logs |
| 27 | + |
| 28 | +```bash |
| 29 | +make logs # Tail bot.log |
| 30 | +docker compose logs -f bot # Docker container logs |
| 31 | +docker compose logs -f memory-server # Memory server logs |
| 32 | +``` |
| 33 | + |
| 34 | +## Ticket Lifecycle |
| 35 | + |
| 36 | +A ticket goes through these stages as the bot processes it: |
| 37 | + |
| 38 | +``` |
| 39 | +    Backlog                   In Progress               Code Review              Done |
| 40 | +  ┌──────────┐              ┌──────────────┐          ┌─────────────┐          ┌──────┐ |
| 41 | +  │ Groomed  │  bot claims  │ Bot working  │  PR open │ Waiting for │  merged  │ Done │ |
| 42 | +  │ + labeled│ ───────────> │ on branch    │ ───────> │ human review│ ───────> │      │ |
| 43 | +  │          │              │ bot/<KEY>    │          │             │          │      │ |
| 44 | +  └──────────┘              └──────────────┘          └─────────────┘          └──────┘ |
| 45 | +       │                           │                         │                    │ |
| 46 | +       │                           │                         │                    │ |
| 47 | +  Human grooms              Bot updates               Bot responds           Bot closes |
| 48 | +  and labels                task metadata             to review              Jira ticket |
| 49 | +  the ticket                in memory server          feedback               + stores |
| 50 | +                                                                             learnings |
| 51 | +``` |
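
The diagram can also be read as a small state machine. The sketch below encodes it as a lookup table. The status names and triggers come from the diagram, while the dictionary layout and function names are assumptions made for illustration, not the bot's actual code.

```python
# Status names and triggers are taken from the diagram above.
TRANSITIONS = {
    "Backlog": ("In Progress", "bot claims"),
    "In Progress": ("Code Review", "PR open"),
    "Code Review": ("Done", "merged"),
}


def next_status(status):
    """Return (next status, trigger), or None once the ticket is Done."""
    return TRANSITIONS.get(status)


def walk(status="Backlog"):
    """List every status a ticket passes through, starting at `status`."""
    path = [status]
    while (step := next_status(path[-1])) is not None:
        path.append(step[0])
    return path


print(" -> ".join(walk()))
```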
| 52 | + |
| 53 | +### Example: RHCLOUD-37254 |
| 54 | + |
| 55 | +A real ticket processed by the bot: |
| 56 | + |
| 57 | +1. **Groomed** — Human added labels `hcc-ai-platform-accessmanagement` and `repo:insights-rbac` |
| 58 | +2. **Picked up** — Bot found it via JQL query, assigned itself, transitioned to "In Progress", added to the active sprint |
| 59 | +3. **Implemented** — Bot cloned `insights-rbac`, created branch `bot/RHCLOUD-37254`, loaded the `rbac` persona, read repo `CLAUDE.md`, implemented the fix |
| 60 | +4. **PR opened** — Bot pushed the branch, opened a PR via `gh pr create`, transitioned ticket to "Code Review", commented on Jira with the PR link |
| 61 | +5. **Review cycle** — Human reviewed the PR. Bot checked for new feedback each cycle and addressed comments |
| 62 | +6. **Merged** — Once the PR was merged, bot transitioned the ticket to "Done" and stored learnings in RAG memory |
| 63 | + |
| 64 | +### Example: RHCLOUD-46011 |
| 65 | + |
| 66 | +A frontend ticket with visual verification: |
| 67 | + |
| 68 | +1. **Groomed** — Labels `hcc-ai-framework` and `repo:astro-virtual-assistant-frontend` |
| 69 | +2. **Implemented** — Bot loaded the `frontend` persona, used PatternFly MCP for component docs |
| 70 | +3. **Visual verification** — Bot started the dev server, used chrome-devtools MCP to navigate to the affected page, took before/after screenshots |
| 71 | +4. **PR opened** — Screenshots embedded as base64 in the PR description (never committed to the repo) |
| 72 | +5. **Result** — [astro-virtual-assistant-frontend#368](https://github.com/RedHatInsights/astro-virtual-assistant-frontend/pull/368) |
| 73 | + |
| 74 | +## What the Bot Has Done |
| 75 | + |
| 76 | +The bot has been running since late March 2026 on the `hcc-ai-framework` label. Here's a summary of completed work, showing how different ticket types are handled. |
| 77 | + |
| 78 | +### Cross-repo feature: RHCLOUD-46384 |
| 79 | + |
| 80 | +**Add icon field to FrontendEnvironment CRD** — a feature spanning 3 repos. |
| 81 | + |
| 82 | +The bot received a groomed ticket with labels `repo:frontend-operator` and `repo:insights-chrome`. It investigated the `insights-chrome` code first and discovered that `ServiceIcon.tsx` already supported icon rendering, so the gap was in the operator CRD. It then: |
| 83 | + |
| 84 | +- Added an `icon` field to the FrontendEnvironment CRD in Go, updated the reconciler, regenerated manifests, and updated e2e tests |
| 85 | +- Opened [frontend-operator#569](https://github.com/RedHatInsights/frontend-operator/pull/569) |
| 86 | +- Reported on Jira that insights-chrome needed no changes and that app-interface (readonly repo) needed manual config updates |
| 87 | + |
| 88 | +When asked via a Jira comment to also handle the app-interface changes, the bot initially reported it lacked push access. After being told about a fork repo, it cloned the fork, made the changes, and opened [app-interface MR !180888](https://gitlab.cee.redhat.com/service/app-interface/-/merge_requests/180888) — a cross-host (GitHub + GitLab) ticket resolved in a single cycle. |
| 89 | + |
| 90 | +### UI bug fix: RHCLOUD-44667 |
| 91 | + |
| 92 | +**Timespan option labels empty in Notifications Event Log** — a PatternFly migration bug. |
| 93 | + |
| 94 | +A user reported that the dropdown for selecting time ranges showed blank labels. The bot identified the root cause: `SelectOption` components were self-closing (`<SelectOption ... />`) with no children. PatternFly v6 requires label text as children. |
| 95 | + |
| 96 | +- Fixed both `EventLogDateFilter.tsx` and `NotificationsLogDateFilter.tsx` |
| 97 | +- Opened [notifications-frontend#884](https://github.com/RedHatInsights/notifications-frontend/pull/884) |
| 98 | +- After the PR was merged, the bot closed the Jira ticket and stored the PF6 pattern as a RAG memory for future reference |
| 99 | + |
| 100 | +### CVE triage: RHCLOUD-44642 |
| 101 | + |
| 102 | +**CVE-2026-24842 node-tar in pdf-generator** — security vulnerability ticket. |
| 103 | + |
| 104 | +The bot checked the current state of `node-tar` in pdf-generator and found it was already at version 7.5.11, which is newer than the fix version 7.5.7. It verified this with `npm ls tar` and `npm audit`. No code changes were needed, so the bot commented its analysis on Jira and closed the ticket. Total time: one cycle. |
| 105 | + |
| 106 | +### CI migration with human feedback loop: RHCLOUD-46420 |
| 107 | + |
| 108 | +**Migrate pdf-generator Jenkins from GHPRB to GitHub Branch Source** — a deadline-driven infrastructure task. |
| 109 | + |
| 110 | +The bot initially created a Jenkinsfile and opened [pdf-generator#313](https://github.com/RedHatInsights/pdf-generator/pull/313). A human commented on Jira: "The Jenkins jobs haven't been passing for a long time — we should just remove them entirely." The bot: |
| 111 | + |
| 112 | +1. Closed the original PR |
| 113 | +2. Opened a new PR [#314](https://github.com/RedHatInsights/pdf-generator/pull/314) removing the unused CI scripts |
| 114 | +3. Opened [app-interface MR !180890](https://gitlab.cee.redhat.com/service/app-interface/-/merge_requests/180890) removing the Jenkins job config |
| 115 | +4. Kept the whole exchange in the same conversation thread on Jira |
| 116 | + |
| 117 | +This demonstrates the feedback loop: the bot adapts to human direction mid-flight. |
| 118 | + |
| 119 | +### Summary table |
| 120 | + |
| 121 | +| Ticket | Summary | Repos | Type | Result | |
| 122 | +|--------|---------|-------|------|--------| |
| 123 | +| RHCLOUD-46384 | Add icon field to FrontendEnvironment CRD | frontend-operator, insights-chrome, app-interface | cross-repo feature | PR + MR merged | |
| 124 | +| RHCLOUD-38822 | Service Dropdown displaying incorrect icon set | frontend-operator, insights-chrome, app-interface | investigation + fix | closed | |
| 125 | +| RHCLOUD-46426 | ChangeDefaultTemplate does not reset previous default | widget-layout-backend | bug fix | PR merged | |
| 126 | +| RHCLOUD-44667 | Timespan option labels empty in Notifications Event Log | notifications-frontend | UI bug fix | PR merged | |
| 127 | +| RHCLOUD-46420 | Migrate pdf-generator Jenkins PR check | pdf-generator, app-interface | CI cleanup | PR + MR merged | |
| 128 | +| RHCLOUD-45699 | Update virtual-assistant grype scanning | astro-virtual-assistant-v2 | tech debt | PR merged | |
| 129 | +| RHCLOUD-44642 | CVE-2026-24842 pdf-generator: node-tar | pdf-generator | CVE triage | already fixed, closed | |
| 130 | +| RHCLOUD-44644 | CVE-2026-24842 payload-tracker-frontend: node-tar | payload-tracker-frontend | CVE fix | PR merged | |
| 131 | +| RHCLOUD-43838 | CVE-2025-12816 payload-tracker-frontend: node-forge | payload-tracker-frontend | CVE fix | PR merged | |
| 132 | + |
| 133 | +In progress: |
| 134 | + |
| 135 | +| Ticket | Summary | Repos | Status | |
| 136 | +|--------|---------|-------|--------| |
| 137 | +| RHCLOUD-37254 | RBAC allowing roles with same name as System Roles | insights-rbac | Code Review | |
| 138 | + |
| 139 | +## Adding Work for the Bot |
| 140 | + |
| 141 | +### Step 1: Groom the ticket |
| 142 | + |
| 143 | +Use the interactive grooming prompt: |
| 144 | + |
| 145 | +```bash |
| 146 | +claude --prompt-file prompts/groom.md |
| 147 | +``` |
| 148 | + |
| 149 | +Or manually ensure the ticket has: |
| 150 | +- Clear problem statement (current vs expected behavior) |
| 151 | +- Specific files/components if known |
| 152 | +- Acceptance criteria as a checklist |
| 153 | +- Scoped to a single PR |
| 154 | + |
| 155 | +### Step 2: Label the ticket |
| 156 | + |
| 157 | +Required labels: |
| 158 | +- **Primary label** — matches the bot instance: `hcc-ai-framework` or `hcc-ai-platform-accessmanagement` |
| 159 | +- **`repo:<name>`** — must match a key in `project-repos.json` (e.g. `repo:insights-rbac`, `repo:notifications-frontend`) |
| 160 | + |
| 161 | +Optional: |
| 162 | +- `needs-investigation` — bot investigates and reports findings instead of implementing |
| 163 | +- `platform-experience-ui` — routes to the UI sprint board |
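
A quick way to sanity-check labels before handing a ticket over is sketched below. It assumes `project-repos.json` is a JSON object keyed by repo name, which the source implies but does not show. The function name `label_problems` and the sample key set are hypothetical.

```python
def label_problems(labels, primary_label, repo_keys):
    """Return a list of label problems; an empty list means the labels look fine."""
    problems = []
    if primary_label not in labels:
        problems.append(f"missing primary label {primary_label}")
    repos = [label.split(":", 1)[1] for label in labels if label.startswith("repo:")]
    if not repos:
        problems.append("missing repo:<name> label")
    for repo in repos:
        if repo not in repo_keys:
            problems.append(f"repo:{repo} is not a key in project-repos.json")
    return problems


# In practice the keys would be read from the real file (assumed layout):
#   import json; repo_keys = set(json.load(open("project-repos.json")))
repo_keys = {"insights-rbac", "notifications-frontend"}

print(label_problems(["hcc-ai-framework", "repo:insights-rbac"], "hcc-ai-framework", repo_keys))
print(label_problems(["repo:unknown-repo"], "hcc-ai-framework", repo_keys))
```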
| 164 | + |
| 165 | +### Step 3: Leave it unassigned |
| 166 | + |
| 167 | +The bot only picks up unassigned tickets as new work. If a ticket is already assigned to someone else, the bot skips it. |
| 168 | + |
| 169 | +### JQL the bot uses |
| 170 | + |
| 171 | +New work: |
| 172 | +``` |
| 173 | +project = RHCLOUD AND labels = <PRIMARY_LABEL> |
| 174 | + AND assignee is EMPTY AND status != Done |
| 175 | + ORDER BY priority DESC, created ASC |
| 176 | +``` |
| 177 | + |
| 178 | +Tickets already assigned to the bot: |
| 179 | +``` |
| 180 | +project = RHCLOUD AND labels = <PRIMARY_LABEL> |
| 181 | + AND assignee = currentUser() AND status != Done |
| 182 | + ORDER BY updated DESC |
| 183 | +``` |
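
To make the filtering and ordering of the new-work query concrete, the sketch below applies the same rules to in-memory ticket records. The ticket keys, priority names, and numeric ranks are invented for the example; Jira evaluates the real query server-side.

```python
from datetime import date

# Hypothetical priority names and ranks; a higher rank means higher priority.
PRIORITY_RANK = {"Low": 1, "Normal": 2, "High": 3}


def pick_new_work(tickets, primary_label):
    """Mimic the new-work query on a list of ticket dicts."""
    eligible = [
        t for t in tickets
        if primary_label in t["labels"]
        and t["assignee"] is None
        and t["status"] != "Done"
    ]
    # priority DESC, then created ASC (oldest first within a priority).
    return sorted(eligible, key=lambda t: (-PRIORITY_RANK[t["priority"]], t["created"]))


tickets = [
    {"key": "T-1", "labels": ["hcc-ai-framework"], "assignee": None,
     "status": "Backlog", "priority": "Normal", "created": date(2026, 4, 2)},
    {"key": "T-2", "labels": ["hcc-ai-framework"], "assignee": None,
     "status": "Backlog", "priority": "High", "created": date(2026, 4, 3)},
    {"key": "T-3", "labels": ["hcc-ai-framework"], "assignee": "someone",
     "status": "Backlog", "priority": "High", "created": date(2026, 4, 1)},
    {"key": "T-4", "labels": ["hcc-ai-framework"], "assignee": None,
     "status": "Backlog", "priority": "Normal", "created": date(2026, 4, 1)},
]
print([t["key"] for t in pick_new_work(tickets, "hcc-ai-framework")])
```

Here `T-3` is dropped because it has an assignee, `T-2` comes first because of its priority, and the two remaining tickets are ordered oldest first.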
| 184 | + |
| 185 | +## Running Multiple Bots |
| 186 | + |
| 187 | +Each bot instance handles one team label. To run multiple bots: |
| 188 | + |
| 189 | +### On host (development) |
| 190 | + |
| 191 | +```bash |
| 192 | +# Terminal 1 |
| 193 | +make run LABEL=hcc-ai-framework |
| 194 | + |
| 195 | +# Terminal 2 |
| 196 | +make run-rbac |
| 197 | +``` |
| 198 | + |
| 199 | +### In Docker |
| 200 | + |
| 201 | +```bash |
| 202 | +# Start with default label |
| 203 | +make docker-up |
| 204 | + |
| 205 | +# Start with a different label |
| 206 | +BOT_LABEL=hcc-ai-platform-accessmanagement make docker-up |
| 207 | +``` |
| 208 | + |
| 209 | +To run several labels in Docker at the same time, either run a separate compose project per label or add one bot service per label to `docker-compose.yml`. |
| 210 | + |
| 211 | +## Cost Management |
| 212 | + |
| 213 | +Each cycle records its cost. Monitor spending: |
| 214 | + |
| 215 | +```bash |
| 216 | +make costs-today # Today's spend |
| 217 | +make costs-week # Last 7 days |
| 218 | +make costs # All time |
| 219 | +``` |
| 220 | + |
| 221 | +The bot sleeps for 5 minutes between active cycles and 1 hour when idle (no work found). These intervals are configured in `config.json`. |
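
The two intervals put an upper bound on how often the bot can run, which is useful when estimating spend. The sketch below works through that arithmetic. The constant and function names are illustrative and are not the actual keys in `config.json`.

```python
ACTIVE_SLEEP_SECONDS = 5 * 60   # sleep between active cycles
IDLE_SLEEP_SECONDS = 60 * 60    # sleep when a cycle finds no work


def next_sleep_seconds(found_work):
    """Pick the sleep interval based on whether the last cycle found work."""
    return ACTIVE_SLEEP_SECONDS if found_work else IDLE_SLEEP_SECONDS


def max_cycles_per_day(found_work):
    """Upper bound on cycles per day, ignoring the time a cycle itself takes."""
    return 24 * 60 * 60 // next_sleep_seconds(found_work)


print(max_cycles_per_day(True), max_cycles_per_day(False))  # prints: 288 24
```

An idle bot therefore runs at most 24 cycles a day, while a continuously busy one is capped at 288 by the sleep interval alone.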
| 222 | + |
| 223 | +## Troubleshooting |
| 224 | + |
| 225 | +### Bot finds no work |
| 226 | + |
| 227 | +Check: |
| 228 | +1. Are there tickets with the right primary label? Use the Jira filter above |
| 229 | +2. Are the tickets unassigned? |
| 230 | +3. Do they have a `repo:` label matching `project-repos.json`? |
| 231 | +4. Is the bot at the 5-task capacity limit? Check the dashboard |
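
The checklist can be walked in order for a single ticket, as in the sketch below. The ticket dictionary shape, the function name `skip_reason`, and the sample values are assumptions for illustration. Only the four checks and the 5-task limit come from this guide.

```python
MAX_ACTIVE_TASKS = 5  # capacity limit from the checklist above


def skip_reason(ticket, primary_label, repo_keys, active_task_count):
    """Return the first checklist item that fails, or None if the ticket is eligible."""
    labels = ticket["labels"]
    if primary_label not in labels:
        return "no primary label"
    if ticket["assignee"] is not None:
        return "already assigned"
    if not any(l.startswith("repo:") and l[len("repo:"):] in repo_keys for l in labels):
        return "no repo: label matching project-repos.json"
    if active_task_count >= MAX_ACTIVE_TASKS:
        return "bot is at capacity"
    return None


repo_keys = {"insights-rbac"}  # stand-in for the keys of project-repos.json
ticket = {"labels": ["hcc-ai-framework", "repo:insights-rbac"], "assignee": None}

print(skip_reason(ticket, "hcc-ai-framework", repo_keys, active_task_count=2))
print(skip_reason(ticket, "hcc-ai-framework", repo_keys, active_task_count=5))
```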
| 232 | + |
| 233 | +### MCP server fails to connect |
| 234 | + |
| 235 | +Check the bot logs for which server failed: |
| 236 | +- `bot-memory`: is the memory server running? (`make memory-server`) |
| 237 | +- `mcp-atlassian`: are Jira credentials set in `.env`? |
| 238 | +- `chrome-devtools`: is Chromium running? (It is started for you only in Docker; on the host, run `./start-chromium.sh`) |
| 239 | + |
| 240 | +### Bot is stuck on a ticket |
| 241 | + |
| 242 | +Check the task record in the dashboard. Look at `metadata.last_step` and `metadata.notes`. If truly stuck: |
| 243 | +1. Comment on the Jira ticket explaining the blocker |
| 244 | +2. Manually set the task status to `paused` via the dashboard |
| 245 | +3. The bot will skip it and move to other work |