Commit 4d155f9

Merge pull request #276 from Hyperkid123/feat/RHCLOUD-48703
docs: preset migration guide + onboarding update
2 parents 1812460 + ba4d098 commit 4d155f9

2 files changed

Lines changed: 356 additions & 19 deletions

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
# Preset System Migration Guide

This guide explains how to migrate existing bot instances to the preset system. It is for teams running instances that were onboarded before the preset system existed.

**RHCLOUD-48670**: Workflow Presets, a multi-config system for bot instances

---

## TL;DR

Your instance works today without any changes; backward compatibility is preserved during the transition. However, migrating to `instance.yaml` is **required** to unblock the next phase of development (RHCLOUD-48705: legacy cleanup + new workflow presets). We'll help every team through the process.

---

## What Changed

The bot image now uses a **preset system** instead of a monolithic configuration. The image ships with:

```
presets/
├── core/                    # Security rules, memory system, output mode (always loaded)
│   └── CLAUDE.md
├── workflows/
│   └── jira-sprint/         # The main Jira triage → implement → PR loop
│       ├── CLAUDE.md
│       └── manifest.yaml
└── envs/                    # Additive capabilities
    ├── browser/             # Chromium + chrome-devtools MCP
    ├── container-scan/      # Grype + Buildah
    ├── slack/               # Slack notifications
    └── dev-proxy/           # Caddy reverse proxy for stage UI verification
```

At startup, `run.py`:
1. Loads `instance.yaml` from your config repo (or falls back to env vars / defaults)
2. Assembles `CLAUDE.md` from `core` + selected workflow preset
3. Validates required MCP servers and env vars from workflow and env preset manifests
4. Runs env preset install scripts (build-time) and entrypoint scripts (runtime)

### What stays the same

- All bot behavior is identical
- Your config repo structure (`agent/`, `personas/`, `project-repos.json`, `mcp.json`, `settings.json`) is unchanged
- Deployment parameters in app-interface are unchanged
- The merge engine (protected keys, skills, hooks) works exactly as before

### What's new

- `instance.yaml`: a file in your config repo that declares which presets your instance uses
- Startup validation: the bot validates MCP servers and env vars at startup and fails fast with clear errors instead of crashing mid-cycle
- Env preset manifests: each capability declares what it provides and requires

---

## Do I Need to Migrate?

**Yes.** The migration is required to unblock the next development phase:

- **RHCLOUD-48705** removes hardcoded Dockerfile setup (Chromium, Grype, Caddy installs) and replaces it with the env preset install loop. Once that lands, instances must have `instance.yaml` so the system knows which presets to install.
- Future workflow presets (reviewer, investigator, GitHub-based) require the preset selection mechanism to exist.

**Right now** your instance works without changes because the defaults match current behavior, but you need to add `instance.yaml` before RHCLOUD-48705 merges. We'll help every team through it; see [Need Help?](#need-help) below.

### What does `instance.yaml` give you?

Beyond unblocking the next phase, it also lets you:

- **Drop unused capabilities**: e.g. if your instance doesn't do visual QA, drop `browser` to skip Chromium startup
- **Use a different workflow**: when alternative workflows become available
- **Append custom CLAUDE.md instructions**: layer instance-specific rules on top of the workflow
- **Make preset choices explicit**: documented in your config repo instead of relying on implicit defaults

---

## How to Add `instance.yaml`

Create the file at `<BOT_CONFIG_PATH>/agent/instance.yaml` in your config repo.

### Minimal (equivalent to current defaults)

```yaml
workflow: jira-sprint
source: jira
```

This is functionally identical to having no `instance.yaml` at all. All env presets are active by default.

### Explicit env preset selection

```yaml
workflow: jira-sprint
source: jira
envs:
  - browser
  - slack
  - container-scan
```

Only the listed env presets are active. `dev-proxy` is not listed, so it won't start.

### Minimal instance (no optional capabilities)

```yaml
workflow: jira-sprint
source: jira
envs: []
```

No env presets: no Chromium, no Grype, no Slack, no dev-proxy. The bot still triages, implements, and opens PRs, but can't take screenshots or scan containers.

### Custom CLAUDE.md (append)

```yaml
workflow: jira-sprint
source: jira
envs:
  - browser
  - slack

claude_md:
  strategy: append
```

If your config repo has an `agent/CLAUDE.md`, it gets appended after the workflow CLAUDE.md. Use this to add instance-specific rules (e.g. "never modify files in `legacy/`", "always run `make lint` before committing").

### Custom CLAUDE.md (replace)

```yaml
workflow: jira-sprint
source: jira

claude_md:
  strategy: replace
```

Your `agent/CLAUDE.md` replaces the workflow's CLAUDE.md entirely. The core CLAUDE.md (security rules, memory system) is always loaded regardless of strategy; it cannot be overridden.

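The three `claude_md.strategy` values can be summarized in a short Python sketch. It illustrates the rules described above and is not the code in `run.py`: the function name `assemble_claude_md` is hypothetical, and falling back to the workflow file when `replace` is set but no instance `CLAUDE.md` exists is an assumption.

```python
from typing import Optional


def assemble_claude_md(
    core: str,
    workflow: str,
    instance: Optional[str],
    strategy: str = "ignore",
) -> str:
    """Combine CLAUDE.md sources according to claude_md.strategy (illustrative only)."""
    if strategy not in ("ignore", "append", "replace"):
        raise ValueError(f"unknown claude_md.strategy: {strategy!r}")

    # The core file is always first and is never dropped, whatever the strategy.
    parts = [core]

    if strategy == "replace" and instance is not None:
        # The instance file stands in for the workflow file.
        parts.append(instance)
    else:
        parts.append(workflow)
        if strategy == "append" and instance is not None:
            # Instance rules come after the workflow rules.
            parts.append(instance)

    return "\n\n".join(part.strip() for part in parts)


# Example: with strategy "append", all three sources are kept, in this order.
combined = assemble_claude_md("CORE", "WORKFLOW", "INSTANCE", strategy="append")
```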

---

## `instance.yaml` Reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `workflow` | string | `jira-sprint` | Which workflow preset to use. Must exist in `presets/workflows/`. |
| `source` | string | `jira` | Free-form string passed to skills. Currently: `jira`. Future: `github`, `gitlab`. |
| `envs` | list or null | `null` (all) | Which env presets to activate. `null`/omitted = all available. `[]` = none. |
| `claude_md.strategy` | string | `ignore` | How to handle instance CLAUDE.md: `ignore` (default), `append`, `replace`. |

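The difference between an omitted `envs` field and an empty list is easy to miss, so here is a minimal Python sketch of the semantics in the table. The function name `resolve_envs` and the warning text are hypothetical; skipping an unknown env preset with a warning follows the behavior described in the FAQ.

```python
import logging
from typing import Iterable, List, Optional

log = logging.getLogger("presets")


def resolve_envs(requested: Optional[Iterable[str]], available: List[str]) -> List[str]:
    """Return the env presets to activate (illustrative only)."""
    if requested is None:
        # `envs` omitted or null: every available env preset is active.
        return list(available)

    active = []
    for name in requested:
        if name in available:
            active.append(name)
        else:
            # Unknown env presets are logged and skipped; startup continues.
            log.warning("Env preset %r not found, skipping", name)
    return active


available = ["browser", "container-scan", "slack", "dev-proxy"]
all_envs = resolve_envs(None, available)              # all four presets
no_envs = resolve_envs([], available)                 # empty list: nothing active
some_envs = resolve_envs(["browser", "slack"], available)
```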

---

## Env Var Fallback

Instances without a config repo (or without `instance.yaml`) can configure presets via env vars in the deployment template:

| Env Var | Default | Description |
|---------|---------|-------------|
| `BOT_WORKFLOW_PRESET` | `jira-sprint` | Workflow preset name |
| `BOT_ENV_PRESETS` | _(all available)_ | Comma-separated env preset names. Empty string = none. |

These are checked only when no `instance.yaml` is found. If `instance.yaml` exists, it takes precedence.

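The precedence described above (`instance.yaml`, then env vars, then defaults) can be sketched in Python as follows. This is a sketch under stated assumptions, not the logic in `run.py`: the function name `resolve_presets` and the returned dict shape are hypothetical, and `instance_cfg` stands for the already-parsed contents of `instance.yaml` (or `None` when the file is absent).

```python
from typing import Any, Dict, Mapping, Optional

DEFAULT_WORKFLOW = "jira-sprint"


def resolve_presets(
    instance_cfg: Optional[Mapping[str, Any]],
    env: Mapping[str, str],
) -> Dict[str, Any]:
    """Pick workflow and env presets; `envs` of None means "all available"."""
    if instance_cfg is not None:
        # instance.yaml takes precedence; the env vars are not consulted.
        return {
            "workflow": instance_cfg.get("workflow", DEFAULT_WORKFLOW),
            "envs": instance_cfg.get("envs"),
        }

    raw = env.get("BOT_ENV_PRESETS")
    if raw is None:
        envs = None  # variable unset: all available env presets
    else:
        # Comma-separated names; an empty string yields an empty list (none).
        envs = [name.strip() for name in raw.split(",") if name.strip()]

    return {"workflow": env.get("BOT_WORKFLOW_PRESET", DEFAULT_WORKFLOW), "envs": envs}


# Example: no instance.yaml, env vars select two env presets.
from_env = resolve_presets(None, {"BOT_ENV_PRESETS": "browser, slack"})
```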

---

## Available Presets

### Workflow: `jira-sprint`

The current (and only) workflow. Jira sprint triage → pick tickets → implement → open PRs → maintain PRs.

**Requires:**
- MCP servers: `bot-memory`, `mcp-atlassian`
- Env vars: `BOT_LABEL`, `BOT_INSTANCE_ID`, `BOT_JIRA_EMAIL`
- Optional: `BOT_CONFIG_REPO`, `BOT_BOARD_ID`, `BOT_BOARD_NAME`, `SLACK_WEBHOOK_URL`, `BOT_INCLUDE_BACKLOG`

### Env: `browser`

Chromium + Playwright + chrome-devtools MCP server for visual verification and screenshot capture.

**Requires:** `PLAYWRIGHT_BROWSERS_PATH` (set automatically in the image)

**Provides:** `chrome-devtools` MCP server, `gh-release-upload` skill

### Env: `container-scan`

Grype vulnerability scanner + Buildah container builder for CVE scanning.

**Provides:** `grype`, `buildah` CLI tools

### Env: `slack`

Slack notifications via webhook.

**Requires:** `SLACK_WEBHOOK_URL`

**Provides:** `slack-notify` skill

### Env: `dev-proxy`

Custom Caddy reverse proxy for local UI verification against stage environments.

**Requires:** `PROXY_HOST`

**Optional:** `PROXY_PORT`

**Provides:** `caddy` CLI tool, `start-dev-proxy.sh` sandbox allowance

---

## Startup Validation

The bot now validates preset requirements at startup. If a required MCP server or env var is missing, the bot exits with a clear error instead of crashing mid-cycle.

```
[2026-06-26 10:00:00] FATAL: Required MCP server 'mcp-atlassian' not configured
[2026-06-26 10:00:00] FATAL: Required env var 'BOT_JIRA_EMAIL' not set
[2026-06-26 10:00:00] Workflow 'jira-sprint' manifest validation failed — 2 error(s). Check deployment config.
```

Missing optional env vars produce warnings but don't block startup:

```
[2026-06-26 10:00:00] Optional env var 'SLACK_WEBHOOK_URL' not set
```

Env preset validation also warns when a preset's required env vars are missing:

```
[2026-06-26 10:00:00] Env preset 'slack' requires 'SLACK_WEBHOOK_URL' (not set)
```

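To make the error/warning split concrete, here is a minimal Python sketch of a workflow manifest check. The function name `validate_workflow` and its parameters are hypothetical and do not reflect the actual manifest schema; the message strings mirror the log samples above, and treating an empty env var as unset is an assumption.

```python
from typing import Iterable, List, Mapping, Set, Tuple


def validate_workflow(
    required_mcp: Iterable[str],
    required_env: Iterable[str],
    optional_env: Iterable[str],
    configured_mcp: Set[str],
    env: Mapping[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings); any error should stop startup."""
    errors: List[str] = []
    warnings: List[str] = []

    for server in required_mcp:
        if server not in configured_mcp:
            errors.append(f"Required MCP server '{server}' not configured")
    for name in required_env:
        if not env.get(name):
            errors.append(f"Required env var '{name}' not set")
    for name in optional_env:
        if not env.get(name):
            # Optional values only warn; they never block startup.
            warnings.append(f"Optional env var '{name}' not set")

    return errors, warnings


# Example using the `jira-sprint` requirements listed earlier in this guide.
errors, warnings = validate_workflow(
    required_mcp=["bot-memory", "mcp-atlassian"],
    required_env=["BOT_LABEL", "BOT_INSTANCE_ID", "BOT_JIRA_EMAIL"],
    optional_env=["SLACK_WEBHOOK_URL"],
    configured_mcp={"bot-memory"},
    env={"BOT_LABEL": "bot", "BOT_INSTANCE_ID": "1"},
)
```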

---

## FAQ

### What breaks if I don't add `instance.yaml`?

Nothing, **today**. The defaults match current behavior. But once RHCLOUD-48705 lands (legacy Dockerfile cleanup), the system won't know which env presets to install for instances without `instance.yaml`. Add it now while everything still works identically.

### When do I need to have it done?

Before RHCLOUD-48705 merges. We'll communicate the exact date once all teams have been contacted. No one will be surprised: we'll reach out to each team individually.

### What if I reference a preset that doesn't exist?

A missing workflow preset is a FATAL error (the bot won't start). A missing env preset is a WARNING (it is logged and skipped, and the bot continues).

### Do I need to change my deployment template?

No. The new env vars (`BOT_WORKFLOW_PRESET`, `BOT_ENV_PRESETS`) are optional fallbacks. Your existing parameters are unchanged.

### What about the `setup.sh` in my runner repo?

Still works. The build chain runs: preset install scripts → instance `setup.sh`. Your instance-specific installs run last and can depend on anything presets installed.

### When will the hardcoded Dockerfile code be removed?

RHCLOUD-48705 is blocked by this migration. Once all active instances have added `instance.yaml`, we'll merge the cleanup. We'll reach out to each team individually before that happens, so there will be no surprises.

### Will new presets be added?

Yes. Future workflow presets (reviewer, investigator) and env presets are planned. They'll be documented here as they become available.

---

## Config Repo Directory Structure (updated)

```
my-instance-config/
└── agent/
    ├── instance.yaml        # NEW: preset selection (optional today, required before RHCLOUD-48705)
    ├── project-repos.json   # repo mappings (unchanged)
    ├── mcp.json             # MCP server overrides (unchanged)
    ├── settings.json        # settings overrides (unchanged)
    ├── CLAUDE.md            # instance CLAUDE.md (used with strategy: append/replace)
    ├── personas/            # domain-specific guidelines (unchanged)
    │   ├── frontend/
    │   └── backend/
    ├── skills/              # instance-specific skills (unchanged)
    └── hooks/               # additional hooks (unchanged)
```

The only new file is `instance.yaml`. Everything else is unchanged.

---

## Timeline

1. **Now**: Preset system is live. All instances work unchanged. Backward compatible.
2. **This sprint**: Teams review this guide and add `instance.yaml` to their config repos. We'll help with any questions.
3. **After all instances migrate**: RHCLOUD-48705 removes the hardcoded Dockerfile code. Env preset install/entrypoint scripts become the sole mechanism for capabilities like Chromium, Grype, etc.
4. **Future**: New workflow presets (reviewer, investigator, GitHub-based) become available.

---

## Need Help?

We'll help every team through the migration. Options:

- **We write it for you**: tell us which capabilities your instance uses and we'll open a PR with the `instance.yaml`
- **Pair on it**: reach out in the team Slack channel and we'll walk through it together
- **Self-service**: follow the examples above; most instances just need the [minimal config](#minimal-equivalent-to-current-defaults)

Questions or concerns? Comment on [RHCLOUD-48670](https://issues.redhat.com/browse/RHCLOUD-48670) or ping us in Slack.
