Commit 2f36d70

Harden QA reporting, thresholds, and multi-target flows
Add PR quality comment publishing, threshold-driven regression gates, multi-target onboarding/intake handling, and CI coverage updates while finalizing remaining issue-era references to the QA-first runtime model.
1 parent 69923bf commit 2f36d70

17 files changed

Lines changed: 383 additions & 314 deletions

.github/workflows/server-mode-smoke.yml

Lines changed: 10 additions & 2 deletions
@@ -35,8 +35,8 @@ jobs:
 
 - name: Start intake API
 run: |
-INTAKE_REPO_APP_MAP_JSON='{"speedscale/demo":"examples/apps/demo-node/agentapp.yaml"}' \
-INTAKE_ALLOWED_REPOS='speedscale/demo' \
+INTAKE_REPO_APP_MAP_JSON='{"speedscale/demo":"examples/apps/demo-node/agentapp.yaml","speedscale/demo-multi":"examples/apps/demo-node-multi-target/agentapp.yaml"}' \
+INTAKE_ALLOWED_REPOS='speedscale/demo,speedscale/demo-multi' \
 npm run intake-api > intake-api.log 2>&1 &
 echo $! > intake-api.pid
 
@@ -94,6 +94,13 @@ jobs:
 --data-binary @webhook-pr.json > webhook-response.json
 node -e 'const fs=require("fs"); const payload=JSON.parse(fs.readFileSync("webhook-response.json","utf8")); if(!payload.runs?.[0]?.metadata?.name){console.error(payload); process.exit(1)};'
 
+- name: Submit multi-target intake request
+run: |
+curl -sS -X POST http://127.0.0.1:8080/qa/runs \
+-H "content-type: application/json" \
+--data-binary @examples/runs/demo-node-multi-target-pr-quality-intake.json > multi-target-response.json
+node -e 'const fs=require("fs"); const payload=JSON.parse(fs.readFileSync("multi-target-response.json","utf8")); const runs=payload.runs||[]; if(runs.length!==2){console.error(payload); process.exit(1)}; const names=runs.map((r)=>r.metadata?.name||""); if(!names.every((name)=>name.includes("node-api")||name.includes("node-worker"))){console.error(names); process.exit(1)};'
+
 - name: Stop intake API
 if: always()
 run: |
@@ -113,3 +120,4 @@ jobs:
 intake-response.json
 comparison-response.json
 webhook-response.json
+multi-target-response.json
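
The inline `node -e` check in the new "Submit multi-target intake request" step is dense, so here is a readable sketch of the same assertion. The function name `checkMultiTargetResponse` and the sample run names are hypothetical; only the `runs[].metadata.name` shape and the `node-api` / `node-worker` substrings come from the workflow itself.

```javascript
// Readable form of the inline smoke check (a sketch, not the repo's code).
// Passes when the response holds one run per expected target and every
// run name contains one of the target names.
function checkMultiTargetResponse(payload, expectedTargets = ["node-api", "node-worker"]) {
  const runs = Array.isArray(payload.runs) ? payload.runs : [];
  if (runs.length !== expectedTargets.length) {
    return { ok: false, reason: `expected ${expectedTargets.length} runs, got ${runs.length}` };
  }
  const names = runs.map((run) => run.metadata?.name ?? "");
  const unmatched = names.filter(
    (name) => !expectedTargets.some((target) => name.includes(target))
  );
  if (unmatched.length > 0) {
    return { ok: false, reason: `unexpected run names: ${JSON.stringify(unmatched)}` };
  }
  return { ok: true, names };
}

// Hypothetical response body; real run names are generated by the intake API.
const sampleResponse = {
  runs: [
    { metadata: { name: "qa-demo-node-api-001" } },
    { metadata: { name: "qa-demo-node-worker-001" } },
  ],
};
console.log(checkMultiTargetResponse(sampleResponse).ok);
```

Like the workflow check, this accepts two runs that both match the same target. A stricter variant could require each target to appear exactly once.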

AGENTS.md

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@
 
 ## Purpose
 
-This repository is a public reference architecture for an autonomous issue-to-fix workflow centered on the inner loop:
+This repository is a public reference architecture for an autonomous quality-validation workflow centered on the inner loop:
 
-`issue -> plan -> build -> validate`
+`request -> baseline -> compare -> report`
 
 Keep changes aligned with that goal. Prefer small, explicit contracts over broad platform abstractions.
 
@@ -20,7 +20,7 @@ Keep changes aligned with that goal. Prefer small, explicit contracts over broad
 
 - The first implementation should target one simple demo application.
 - The agent should operate against an app manifest rather than hardcoded repo logic.
-- The initial system should emit artifacts for every step: triage, plan, patch, build logs, validation result.
+- The initial system should emit artifacts for every step: request, baseline target, build logs, validation result, quality report.
 
 ## Repository Conventions
 

docs/phase-b-first-run.md

Lines changed: 3 additions & 3 deletions
@@ -2,14 +2,14 @@
 
 Audience: Agent Factory developers and operators tracking Phase B execution outcomes.
 
-This is the concrete execution plan for the first real-ticket autonomous run.
+This document is historical context from the issue-first phase. Current operator flows are PR/manual QA request-first.
 
 ## Selected target
 
 - repo: `speedscale/microsvc`
 - issue: `#58`
 - issue URL: `https://github.com/speedscale/microsvc/issues/58`
-- intake payload: `examples/runs/microsvc-user-service-intake.json`
+- intake payload (legacy): `examples/runs/microsvc-user-service-intake.json`
 
 ## Why this issue
 
@@ -20,7 +20,7 @@ This is the concrete execution plan for the first real-ticket autonomous run.
 ## Execution steps
 
 1. start intake and worker in server mode
-2. submit `microsvc-user-service-intake.json` to intake API
+2. submit a QA intake payload to `/qa/runs` (legacy flow used `microsvc-user-service-intake.json`)
 3. wait for run phase to reach `succeeded` or `failed`
 4. collect evidence bundle from `artifacts/<run-name>/`
 5. evaluate against `docs/autonomy-mvp.md` rubric

docs/server.md

Lines changed: 4 additions & 0 deletions
@@ -62,6 +62,7 @@ GitHub PR webhook intake (optional):
 - auth options for GitHub API calls:
 - preferred: `GITHUB_APP_ID` + `GITHUB_APP_PRIVATE_KEY`
 - fallback: `GITHUB_BOT_TOKEN` or `GH_TOKEN`
+- worker uses same credentials to publish/update PR quality comments
 
 GitHub poller mode (optional):
 
@@ -73,8 +74,10 @@ GitHub poller mode (optional):
 - polls open issues/PRs in `INTAKE_ALLOWED_REPOS`
 - loads repo manifests from `INTAKE_REPO_APP_MAP_FILE` or `INTAKE_REPO_APP_MAP_JSON`
 - queues runs for events that satisfy required labels
+- queues one run per onboarded quality target when target is not explicitly specified
 - posts one bot comment for missing-label or missing-manifest cases
 - GitHub auth uses same precedence as webhook intake (App first, token fallback)
+- worker posts/updates one PR quality comment per target run
 
 Worker trigger mode (optional):
 
@@ -92,6 +95,7 @@ Run operations:
 - list runs: `npm run runs -- list [--phase <phase>]`
 - retry a failed run: `npm run runs -- retry <run-name>`
 - queue onboarding baseline run: `npm run runs -- baseline examples/apps/demo-node/agentapp.yaml --target demo-node`
+- queue all targets from multi-target manifest: `npm run runs -- baseline examples/apps/demo-node-multi-target/agentapp.yaml`
 
 When Redis backend is enabled, intake and retry operations enqueue run names to Redis and workers consume from Redis.
 
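
The docs change says the worker "posts/updates one PR quality comment per target run" but does not show how an existing comment is recognized. One common way to get that behavior is a hidden marker in the comment body, sketched below. The marker format, `planCommentUpsert`, and the comment object shape are all assumptions for illustration and are not taken from this commit.

```javascript
// Hypothetical marker that ties a bot comment to one quality target.
const markerFor = (target) => `<!-- qa-quality-report:${target} -->`;

// Decide whether to create a new comment or update an existing one.
// `existingComments` is assumed to be a list of { id, body } objects
// already fetched from the PR; no network calls happen here.
function planCommentUpsert(existingComments, target, reportMarkdown) {
  const marker = markerFor(target);
  const body = `${marker}\n${reportMarkdown}`;
  const existing = existingComments.find(
    (comment) => typeof comment.body === "string" && comment.body.includes(marker)
  );
  return existing
    ? { action: "update", commentId: existing.id, body }
    : { action: "create", body };
}

// Example: node-api already has a comment, node-worker does not.
const comments = [{ id: 101, body: `${markerFor("node-api")}\nold report` }];
console.log(planCommentUpsert(comments, "node-api", "new report").action);
console.log(planCommentUpsert(comments, "node-worker", "first report").action);
```

Keeping the decision in a pure function like this makes the "one comment per target" rule testable without GitHub credentials.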

docs/users.md

Lines changed: 9 additions & 0 deletions
@@ -53,6 +53,14 @@ curl -sS -X POST http://127.0.0.1:8080/qa/runs \
 --data-binary @examples/runs/demo-node-pr-quality-intake.json
 ```
 
+Multi-target example (queues one run per target):
+
+```bash
+curl -sS -X POST http://127.0.0.1:8080/qa/runs \
+-H "content-type: application/json" \
+--data-binary @examples/runs/demo-node-multi-target-pr-quality-intake.json
+```
+
 Queue onboarding baseline from manifest:
 
 ```bash
@@ -76,6 +84,7 @@ For real PR requests, treat successful command execution as necessary but not su
 - provide endpoint-level replay outcomes when performance is in scope
 - keep `build.test` and `validate.proxymock.command` meaningful (no no-op placeholders)
 - ensure baseline artifacts are current for each onboarded quality target
+- ensure GitHub bot auth is configured so PR quality comments can be posted/updated
 
 ## Operational Baselines
 

examples/apps/demo-node-multi-target/agentapp.yaml

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+apiVersion: agents.speedscale.io/v1alpha1
+kind: AgentApp
+metadata:
+name: demo-node-multi-target
+spec:
+repo:
+provider: github
+url: https://github.com/speedscale/demo
+defaultBranch: main
+workdir: node
+issue:
+labels:
+include:
+- agent
+- bug
+quality:
+trigger:
+pullRequest: true
+manualRequest: true
+prePrRequest: true
+baseline:
+strategy: multi-project
+targets:
+- name: node-api
+workdir: node
+baselineRef: baseline/demo/node-api
+command: npm test
+- name: node-worker
+workdir: node
+baselineRef: baseline/demo/node-worker
+command: npm test
+reporting:
+formats:
+- json
+- markdown
+failOnRegression: true
+thresholds:
+maxBuildStderrLineDelta: 20
+maxValidationStderrLineDelta: 20
+build:
+install: npm ci
+test: npm test
+start: npm start
+validate:
+proxymock:
+dataset: demo-node-404
+mode: replay-with-mocks
+command: proxymock replay
+service:
+command: npm start
+host: 127.0.0.1
+port: 3000
+startupTimeoutSeconds: 30
+policy:
+autoBranch: true
+autoMr: true
+autoMerge: false
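
The manifest above declares two targets, and the docs in this commit say intake "queues one run per onboarded quality target when target is not explicitly specified". A minimal sketch of that fan-out follows. The function `planRuns`, the run-name pattern, and the shape of the returned objects are hypothetical; only the target fields (`name`, `workdir`, `baselineRef`, `command`) come from the manifest.

```javascript
// Fan a request out into one planned run per target (illustrative only).
// When `requestedTarget` is given, only that target is planned.
function planRuns(appName, targets, requestedTarget) {
  const selected = requestedTarget
    ? targets.filter((target) => target.name === requestedTarget)
    : targets;
  if (selected.length === 0) {
    throw new Error(`no matching target for ${requestedTarget ?? "(none declared)"}`);
  }
  return selected.map((target) => ({
    // Hypothetical naming scheme; the real intake API picks its own names.
    name: `${appName}-${target.name}`,
    target: target.name,
    workdir: target.workdir,
    baselineRef: target.baselineRef,
    command: target.command,
  }));
}

// Targets copied from the manifest above.
const manifestTargets = [
  { name: "node-api", workdir: "node", baselineRef: "baseline/demo/node-api", command: "npm test" },
  { name: "node-worker", workdir: "node", baselineRef: "baseline/demo/node-worker", command: "npm test" },
];
console.log(planRuns("demo-node-multi-target", manifestTargets).map((run) => run.name));
```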

examples/apps/demo-node/agentapp.yaml

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ spec:
 - json
 - markdown
 failOnRegression: true
+thresholds:
+maxBuildStderrLineDelta: 20
+maxValidationStderrLineDelta: 20
 build:
 install: npm ci
 test: npm test
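
The two new threshold keys suggest a gate on how much stderr output grows between the baseline and the candidate run. The sketch below shows one plausible reading, where the delta is the candidate line count minus the baseline line count. That interpretation, the function `evaluateThresholds`, and the metric field names are assumptions; the commit only shows the threshold keys and `failOnRegression`.

```javascript
// Map each threshold key to the (hypothetical) metric it limits.
const THRESHOLD_METRICS = [
  { key: "maxBuildStderrLineDelta", metric: "buildStderrLines" },
  { key: "maxValidationStderrLineDelta", metric: "validationStderrLines" },
];

// Compare candidate metrics against the baseline and collect violations.
// A threshold that is absent or not a number is skipped.
function evaluateThresholds(baseline, candidate, thresholds = {}) {
  const violations = [];
  for (const { key, metric } of THRESHOLD_METRICS) {
    const limit = thresholds[key];
    if (typeof limit !== "number") continue;
    const delta = candidate[metric] - baseline[metric];
    if (delta > limit) violations.push({ threshold: key, delta, limit });
  }
  return { regressed: violations.length > 0, violations };
}

// Build stderr grew by 25 lines (over the limit of 20); validation grew by 10.
const result = evaluateThresholds(
  { buildStderrLines: 5, validationStderrLines: 0 },
  { buildStderrLines: 30, validationStderrLines: 10 },
  { maxBuildStderrLineDelta: 20, maxValidationStderrLineDelta: 20 }
);
console.log(result.regressed, result.violations);
```

Under this reading, a run with `failOnRegression: true` would be marked failed whenever `regressed` is true.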

examples/apps/microsvc-user-service/agentapp.yaml

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ spec:
 - json
 - markdown
 failOnRegression: true
+thresholds:
+maxBuildStderrLineDelta: 50
+maxValidationStderrLineDelta: 50
 build:
 install: make build
 test: make test

examples/runs/demo-node-multi-target-pr-quality-intake.json

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+{
+  "source": "developer",
+  "repository": {
+    "provider": "github",
+    "owner": "speedscale",
+    "name": "demo-multi"
+  },
+  "appRef": {
+    "name": "demo-node-multi-target"
+  },
+  "request": {
+    "mode": "comparison",
+    "pullRequest": {
+      "number": 999,
+      "url": "https://github.com/speedscale/demo/pull/999",
+      "headSha": "abc123",
+      "baseSha": "def456"
+    }
+  },
+  "requestedBy": {
+    "type": "user",
+    "login": "developer"
+  },
+  "metadata": {
+    "reason": "multi-target-intake-smoke"
+  }
+}
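
This payload names the repository `speedscale/demo-multi`, which is the key the smoke workflow adds to `INTAKE_REPO_APP_MAP_JSON` and `INTAKE_ALLOWED_REPOS`. The sketch below shows how an intake service might resolve a payload to a manifest path from those two variables. The function `resolveManifest` and its return shape are hypothetical; the variable names and values come from the workflow in this commit.

```javascript
// Resolve an intake payload to a manifest path (a sketch under assumptions).
// `env` stands in for process.env so the logic can be exercised in isolation.
function resolveManifest(payload, env) {
  const repoKey = `${payload.repository.owner}/${payload.repository.name}`;
  const allowed = (env.INTAKE_ALLOWED_REPOS ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter(Boolean);
  if (!allowed.includes(repoKey)) {
    return { ok: false, reason: `repo not allowed: ${repoKey}` };
  }
  const appMap = JSON.parse(env.INTAKE_REPO_APP_MAP_JSON ?? "{}");
  const manifestPath = appMap[repoKey];
  if (!manifestPath) {
    return { ok: false, reason: `no manifest mapped for ${repoKey}` };
  }
  return { ok: true, repoKey, manifestPath };
}

// Values taken from the updated smoke workflow.
const smokeEnv = {
  INTAKE_ALLOWED_REPOS: "speedscale/demo,speedscale/demo-multi",
  INTAKE_REPO_APP_MAP_JSON: JSON.stringify({
    "speedscale/demo": "examples/apps/demo-node/agentapp.yaml",
    "speedscale/demo-multi": "examples/apps/demo-node-multi-target/agentapp.yaml",
  }),
};
const intakePayload = { repository: { provider: "github", owner: "speedscale", name: "demo-multi" } };
console.log(resolveManifest(intakePayload, smokeEnv));
```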

schemas/agentapp.schema.yaml

Lines changed: 8 additions & 0 deletions
@@ -115,6 +115,14 @@ properties:
 - markdown
 failOnRegression:
 type: boolean
+thresholds:
+type: object
+properties:
+maxBuildStderrLineDelta:
+type: number
+maxValidationStderrLineDelta:
+type: number
+additionalProperties: false
 additionalProperties: false
 additionalProperties: false
 build:
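
The schema addition restricts `thresholds` to two numeric properties and rejects anything else through `additionalProperties: false`. The hand-written check below mirrors those rules so their effect is easy to see. It is a sketch rather than the repo's validator, and `validateThresholds` is a hypothetical name.

```javascript
// The only keys the schema hunk allows under `thresholds`.
const ALLOWED_THRESHOLD_KEYS = new Set([
  "maxBuildStderrLineDelta",
  "maxValidationStderrLineDelta",
]);

// Return a list of human-readable errors; an empty list means valid.
function validateThresholds(thresholds) {
  if (thresholds === null || typeof thresholds !== "object" || Array.isArray(thresholds)) {
    return ["thresholds must be an object"];
  }
  const errors = [];
  for (const [key, value] of Object.entries(thresholds)) {
    if (!ALLOWED_THRESHOLD_KEYS.has(key)) {
      errors.push(`unknown property: ${key}`); // additionalProperties: false
    } else if (typeof value !== "number" || Number.isNaN(value)) {
      errors.push(`${key} must be a number`); // type: number
    }
  }
  return errors;
}

console.log(validateThresholds({ maxBuildStderrLineDelta: 20, maxValidationStderrLineDelta: 20 }));
console.log(validateThresholds({ maxBuildStderrLineDelta: "20", maxTestDelta: 5 }));
```

Note that `type: number` also admits negative and fractional values. If line-count deltas are meant to be non-negative integers, a tighter schema could express that, though the commit does not do so.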
