Merged
39 changes: 23 additions & 16 deletions .ona/automations/incident-responder.yaml
@@ -14,15 +14,18 @@ action:
  steps:
  - agent:
  prompt: |
- You are the Incident Responder. Monitor production errors and fix them.
+ You are the Incident Responder. Monitor production errors and triage or fix them.

+ Sentry is connected via MCP — use the Sentry tools directly (search_issues,
+ get_sentry_resource, search_events). Do NOT use curl or the Sentry REST API.
+
  ## Check for New Errors

- 1. Query the Sentry API for unresolved issues created in the last 15 minutes:
- curl -H "Authorization: Bearer $SENTRY_AUTH_TOKEN" \
- "https://sentry.io/api/0/projects/<ORG>/<PROJECT>/issues/?query=is:unresolved&sort=date"
+ 1. Use search_issues to find unresolved errors from the last 15 minutes:
+ search_issues(organizationSlug, naturalLanguageQuery="unresolved errors from the last 15 minutes")
  2. If no new unresolved issues, stop — do nothing.
- 3. For each new issue, read the stack trace, breadcrumbs, and affected URL.
+ 3. For each new issue, use get_sentry_resource to read the stack trace, breadcrumbs,
+ and affected URL.

  ## Triage

@@ -34,26 +37,30 @@ action:
  ## Fix (Critical and High)

  1. Read AGENTS.md, `.agents/conventions.md`, and `.agents/architecture.md`.
- Understanding the data model and component structure is essential for tracing errors.
- 2. Reproduce the error by reading the stack trace and identifying the root cause.
- 3. Create a branch: fix/sentry-<issue-id>-<short-description>
- 4. Fix the root cause. Add a regression test that would have caught this error.
- 5. If the bug reveals a missing convention (e.g., unhandled error path, missing null check
+ 2. Use get_sentry_resource with resourceType='breadcrumbs' to understand the error context.
+ 3. Read the stack trace and identify the root cause in the codebase.
+ 4. Create a branch: fix/sentry-<short-id>-<short-description>
+ 5. Fix the root cause. Add a regression test that would have caught this error.
+ 6. If the bug reveals a missing convention (e.g., unhandled error path, missing null check
  pattern), update `.agents/conventions.md` to prevent recurrence.
- 6. Open a PR:
- - Title: fix: <description> (Sentry <ISSUE_ID>)
- - Body: link to the Sentry issue, root cause analysis, what was fixed, test added
+ 7. Run `pnpm lint && pnpm typecheck && pnpm test` — all must pass.
+ 8. Open a PR:
+ - Title: fix: <description>
+ - Body: Sentry issue link, root cause analysis, what was fixed, test added.
+ Must include `Closes #N` referencing a GitHub issue. If no GitHub issue exists
+ for this error, create one first with label `bug`.
  - Labels: bug
- 7. Mark the Sentry issue as resolved (linked to the PR).
+ 9. Use update_issue to mark the Sentry issue as resolved.

  ## Low-severity

  Create a GitHub Issue:
- - Title: bug: <description> (Sentry <ISSUE_ID>)
- - Body: Sentry link, stack trace summary, suggested fix
+ - Title: bug: <description> (Sentry <SHORT_ID>)
+ - Body: Sentry issue link, stack trace summary, suggested fix
  - Labels: bug, priority:3, status:backlog

  ## Do NOT
  - Ignore errors or mark resolved without a fix.
  - Fix symptoms — find the root cause.
  - Make unrelated changes in fix PRs.
+ - Use curl or the Sentry REST API — always use the MCP Sentry tools.
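
Reviewer sketch: the branch convention from the Fix steps, `fix/sentry-<short-id>-<short-description>`, could be produced by a small helper like the one below. This is only an illustration. The function name `sentryBranchName` and the slug rules (lowercase, hyphen-separated, capped at 40 characters) are assumptions and do not appear in the automation.

```js
// Hypothetical helper: builds a branch name of the form
// fix/sentry-<short-id>-<short-description>.
// The slug rules below (lowercase, hyphens, 40-character cap) are assumptions.
function sentryBranchName(shortId, description) {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, '')     // trim leading and trailing hyphens
    .slice(0, 40)                // keep the branch name short
    .replace(/-+$/, '');         // re-trim if the cap ended on a hyphen
  return `fix/sentry-${shortId}-${slug}`;
}
```

For example, `sentryBranchName('MEMO-1A', 'Null check missing in page loader!')` yields `fix/sentry-MEMO-1A-null-check-missing-in-page-loader`, where `MEMO-1A` is a made-up short ID.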
6 changes: 4 additions & 2 deletions .ona/automations/performance-monitor.yaml
@@ -24,8 +24,10 @@ action:
  - db.latency_ms > 500
  - db.connected is false

- 2. Sentry error trend:
- Query Sentry API for issue count this week vs last week.
+ 2. Sentry error trend (Sentry is connected via MCP — use the tools directly):
+ Use search_events to count errors this week vs last week:
+ search_events(organizationSlug, naturalLanguageQuery="count of errors this week")
+ search_events(organizationSlug, naturalLanguageQuery="count of errors last week")
  Flag if error count increased >50%.

  3. Build size: run `pnpm build` and check the output for page sizes.
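
Reviewer sketch: the ">50%" rule in step 2 leaves the comparison itself implicit. A minimal version of the arithmetic might look like the following, assuming the two weekly counts have already been read out of the `search_events` results. The name `errorTrend` and the zero-baseline behaviour are assumptions, not something the prompt specifies.

```js
// Hypothetical helper: compares this week's error count against last week's.
// How the counts are extracted from the search_events output is not shown.
function errorTrend(thisWeek, lastWeek, threshold = 0.5) {
  if (lastWeek === 0) {
    // Assumption: with no baseline the relative change is undefined,
    // so any error at all this week is flagged.
    return { change: null, flagged: thisWeek > 0 };
  }
  const change = (thisWeek - lastWeek) / lastWeek; // 0.6 means a 60% increase
  return { change, flagged: change > threshold };  // strictly greater than 50%
}
```

With this reading, 100 errors last week and 160 this week is flagged, while exactly 150 is not, because the rule says "increased >50%" rather than "at least 50%".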
55 changes: 42 additions & 13 deletions .ona/automations/post-merge-verifier.yaml
@@ -36,39 +36,69 @@ action:

  ## Step 3 — Run smoke tests

- Write a Playwright script to /tmp/smoke-test.mjs:
+ Write a Playwright script to /tmp/smoke-test.mjs.
+
+ The script must only test routes that exist. Before testing a route, do a HEAD
+ request first — if it returns 404, skip that check (do not count it as a failure).
+
+ Test user credentials are available as env vars:
+ TEST_USER_EMAIL, TEST_USER_PASSWORD
+ Use these for any authenticated flows (e.g., login, dashboard access).

  ```js
  import { chromium } from 'playwright';

  const BASE = 'https://memo.software-factory.dev';
  const failures = [];
+ const skipped = [];
  const browser = await chromium.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  const page = await browser.newPage();
  const consoleErrors = [];
  page.on('console', m => { if (m.type() === 'error') consoleErrors.push(m.text()); });

- // 1. Landing page
+ // Helper: check if a route exists before testing it
+ async function routeExists(path) {
+   try {
+     const res = await fetch(BASE + path, { method: 'HEAD', redirect: 'manual' });
+     return res.status !== 404;
+   } catch { return false; }
+ }
+
+ // 1. Landing page (always exists)
  const res = await page.goto(BASE, { waitUntil: 'networkidle', timeout: 15000 });
  if (!res || res.status() >= 400) failures.push('Landing page returned ' + (res?.status() ?? 'no response'));
  const title = await page.title();
  if (!title) failures.push('Landing page has no title');

- // 2. Login page renders
- await page.goto(BASE + '/login', { waitUntil: 'networkidle', timeout: 15000 });
- const hasEmailInput = await page.locator('input[type=email]').count();
- if (!hasEmailInput) failures.push('Login page missing email input');
+ // 2. Login page (skip if not yet built)
+ if (await routeExists('/login')) {
+   await page.goto(BASE + '/login', { waitUntil: 'networkidle', timeout: 15000 });
+   const hasEmailInput = await page.locator('input[type=email]').count();
+   if (!hasEmailInput) failures.push('Login page missing email input');
+ } else {
+   skipped.push('/login (not yet built)');
+ }

  // 3. Health endpoint
  const healthRes = await page.goto(BASE + '/api/health', { waitUntil: 'networkidle', timeout: 10000 });
  const healthBody = await page.textContent('body');
  if (!healthRes || healthRes.status() >= 400) failures.push('Health endpoint returned ' + (healthRes?.status() ?? 'no response'));
  if (healthBody && healthBody.includes('"status":"down"')) failures.push('Health endpoint reports down');

- // 4. Console errors
+ // 4. Dashboard (skip if not yet built — requires auth)
+ if (await routeExists('/dashboard')) {
+   const dashRes = await page.goto(BASE + '/dashboard', { waitUntil: 'networkidle', timeout: 15000 });
+   // Unauthenticated should redirect to /login, not 500
+   if (dashRes && dashRes.status() >= 500) failures.push('Dashboard returned ' + dashRes.status());
+ } else {
+   skipped.push('/dashboard (not yet built)');
+ }
+
+ // 5. Console errors
  if (consoleErrors.length) failures.push('Console errors: ' + consoleErrors.slice(0, 5).join('; '));

  await browser.close();
+ if (skipped.length) console.log('Skipped: ' + skipped.join(', '));
  if (failures.length) { console.error(JSON.stringify(failures)); process.exit(1); }
  console.log('OK');
  ```
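
Reviewer sketch: the script records skipped routes in `skipped` but keeps no list of the routes it actually exercised, which the pass comment on the merged PR is expected to mention. One possible way to assemble that summary is shown below. The `tested` array and the function name `verificationComment` are assumptions, so the script would need to collect `tested` itself.

```js
// Hypothetical helper: formats the pass comment from two arrays of route labels.
// `tested` is assumed; the script in this PR only maintains `skipped`.
function verificationComment(tested, skipped) {
  const lines = ['✅ Post-merge verification passed.'];
  lines.push('Tested: ' + (tested.length ? tested.join(', ') : 'none'));
  if (skipped.length) lines.push('Skipped: ' + skipped.join(', '));
  return lines.join('\n');
}
```

Keeping the formatting in a pure function means it can be checked without launching a browser.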
@@ -82,7 +112,7 @@ action:

  If all checks pass:
  Comment on the merged PR:
- > ✅ Post-merge verification passed — landing page, login, and health endpoint all working.
+ > ✅ Post-merge verification passed. [list which routes were tested and which were skipped]

  If any check fails:
  1. Create a GitHub Issue:
@@ -94,12 +124,11 @@ action:

  ## Expanding checks

- As features ship, add checks to the Playwright script. Only check features that exist:
- - After workspace/page CRUD ships: verify /dashboard loads (unauthenticated → redirects to /login)
- - After editor ships: verify a page URL returns 200
- - After search ships: verify the search endpoint responds
+ As features ship, add new route checks to the Playwright script. Always use the
+ `routeExists()` guard so the script doesn't fail on routes that haven't been built yet.

- Do NOT test flows that require authentication credentials.
+ Test user credentials for authenticated flows:
+ TEST_USER_EMAIL, TEST_USER_PASSWORD (available as env vars)

  ## Do NOT
  - Retry failed checks — report the failure and stop.
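
Reviewer sketch: the "Expanding checks" guidance asks every new route check to go through the `routeExists()` guard. A small wrapper could make that hard to forget. The version below is illustrative only: `guardedCheck` is a hypothetical name, and `exists` and `check` are injected so the sketch runs without a browser. In the real script `exists` would be `routeExists` and `check` would drive the Playwright `page`.

```js
// Hypothetical wrapper around the routeExists() guard.
// `check` resolves to an error string on failure, or null on success.
async function guardedCheck(path, { exists, check, failures, skipped }) {
  if (!(await exists(path))) {
    // Route not deployed yet: record it as skipped, not as a failure.
    skipped.push(path + ' (not yet built)');
    return;
  }
  const problem = await check(path);
  if (problem) failures.push(problem);
}
```

A new check would then be a single call, for example `await guardedCheck('/search', { exists: routeExists, check: checkSearch, failures, skipped })`, where `checkSearch` is a placeholder for whatever assertion the new feature needs.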