Commit 6502220

Merge pull request #106 from PatrickSys/ui-proof-agent-browser-workflow
Bake agent-browser into UI proof workflow
2 parents f5e705b + 9d49792 commit 6502220

16 files changed

Lines changed: 476 additions & 39 deletions

agents/executor.md

Lines changed: 12 additions & 1 deletion
@@ -211,7 +211,18 @@ Before reporting a task complete:
 - A task is not complete because code was written. It is complete when the intended verification path actually passes.

 ### UI Proof Execution
-If the plan defines UI proof slots, record observed proof against the exact claim, route/state, observation, evidence kind, artifact path or manual step, privacy metadata, result, and claim limit before claiming task completion. Artifact metadata must include `visibility`, `retention`, `sensitivity`, and `safe_to_publish`; raw screenshots, traces, videos, DOM snapshots, and reports are local-only/unsafe by default and cannot back public, tracked, delivery, release, or publication proof claims. Use `gsdd ui-proof validate <path>` or `gsdd health` when a bundle exists. Artifact count, source comments, AST/cAST findings, semantic search, and Semble-like retrieval are not proof. Missing or weakly linked evidence must be recorded as proof debt, waiver, deferment, or reduced claim language rather than satisfied proof.
+If the plan defines UI proof slots, record observed proof against the exact claim, route/state, observation, evidence kind, artifact path or manual step, privacy metadata, result, and claim limit before claiming task completion.
+
+Use `agent-browser` as the default live UI proof path:
+- open the planned route/state
+- capture interactive snapshots/refs when relevant
+- exercise the changed flow
+- capture screenshots for the planned viewport(s)
+- record relevant console/network observations
+
+If `agent-browser` is unavailable, record the availability constraint and closest project-native interactive browser fallback in the proof bundle instead of silently treating the fallback as the default path. Existing Playwright/package-script browser tests remain canonical repeatable regression evidence when present; use Playwright scripting only for checks `agent-browser` cannot cover cleanly, such as JS-disabled, structured console, or multi-context verification.
+
+Artifact metadata must include `visibility`, `retention`, `sensitivity`, and `safe_to_publish`; raw screenshots, traces, videos, DOM snapshots, and reports are local-only/unsafe by default and cannot back public, tracked, delivery, release, or publication proof claims. Use `gsdd ui-proof validate <path>` or `gsdd health` when a bundle exists. Artifact count, source comments, AST/cAST findings, semantic search, and Semble-like retrieval are not proof. Missing or weakly linked evidence must be recorded as proof debt, waiver, deferment, or reduced claim language rather than satisfied proof.
 </execution_loop>

 <checkpoint_protocol>
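
The artifact metadata rule in the executor text can be illustrated with a small check. This is a minimal sketch, not code from the commit: `checkArtifact`, the claim kind `'local'`, and the example values (`'local'`, `'temporary'`, `'internal'`, `'public'`) are assumptions made for illustration. Only the four field names and the list of stronger claim kinds come from the prompt text.

```javascript
// Hypothetical helper, not part of gsdd. Field names come from the executor
// prompt; the concrete values used below are illustrative assumptions.
const REQUIRED_FIELDS = ['visibility', 'retention', 'sensitivity', 'safe_to_publish'];
const STRONG_CLAIMS = new Set(['public', 'tracked', 'delivery', 'release', 'publication']);

function checkArtifact(artifact, claimKind) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (artifact[field] === undefined || artifact[field] === null) {
      problems.push(`missing ${field}`);
    }
  }
  // Raw artifacts are local-only/unsafe by default: only an explicit opt-in counts.
  const publishable = artifact.visibility === 'public' && artifact.safe_to_publish === true;
  if (STRONG_CLAIMS.has(claimKind) && !publishable) {
    problems.push(`artifact cannot back a ${claimKind} claim`);
  }
  return { ok: problems.length === 0, problems };
}

// A raw screenshot with complete but local-only metadata (assumed values).
const screenshot = {
  path: 'proof/checkout-desktop.png',
  visibility: 'local',
  retention: 'temporary',
  sensitivity: 'internal',
  safe_to_publish: false,
};
console.log(checkArtifact(screenshot, 'local'));
console.log(checkArtifact(screenshot, 'release'));
```

Under these assumptions the same screenshot is acceptable for a local claim but is rejected as backing for a release claim, which mirrors the "local-only/unsafe by default" wording.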

agents/planner.md

Lines changed: 2 additions & 0 deletions
@@ -158,6 +158,8 @@ For UI-sensitive work, plan proof slots that can later be matched exactly to cla

 Require observed artifacts to carry `visibility`, `retention`, `sensitivity`, and `safe_to_publish`; when a planned slot is meant to support public, publication, tracked, delivery, or release proof, say to validate the observed bundle with `gsdd ui-proof validate <path> --claim <...>`. `gsdd ui-proof validate`/`gsdd health` must catch invalid bundle metadata when present.

+For live rendered UI proof, plan `agent-browser` as the default runtime evidence path: route open, interactive snapshot/refs when relevant, changed-flow interaction, screenshots for the planned viewport(s), and relevant console/network observations. If `agent-browser` is unavailable in the runtime, require an explicit availability constraint and the closest project-native interactive browser fallback before narrowing the claim. Existing Playwright/package-script browser tests remain the canonical repeatable regression path when present; do not scaffold new browser infrastructure by default. The planner chooses viewport coverage, but must explain why the viewport set is sufficient for the claim or narrow the claim limit; responsive claims need desktop/mobile or equivalent state coverage.
+
 Do not let source annotations, AST/cAST findings, semantic search, comments, or Semble-like retrieval satisfy proof slots; they are discovery hints only. Human acceptance can narrow or waive a claim and record proof debt, but it must not turn missing or mismatched non-human evidence into `satisfied` proof.
 </ui_proof_planning>

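
The viewport rule added to the planner prompt (explain why the set is sufficient or narrow the claim; responsive claims need desktop/mobile or equivalent coverage) could be expressed roughly as follows. The slot shape (`viewports`, `viewportRationale`, `responsiveClaim`, `equivalentCoverage`) and the returned labels are hypothetical; the commit shown here does not define a schema for planned slots.

```javascript
// Hypothetical planning check. The slot fields and return labels are
// assumptions for illustration, not a gsdd schema.
function reviewViewportPlan(slot) {
  const viewports = slot.viewports ?? [];
  const hasRationale = Boolean(slot.viewportRationale && slot.viewportRationale.trim());
  // The planner must explain why the chosen set is sufficient, or narrow the claim.
  if (viewports.length === 0 || !hasRationale) return 'explain-or-narrow';
  if (slot.responsiveClaim) {
    const desktopAndMobile = viewports.includes('desktop') && viewports.includes('mobile');
    // Responsive claims need desktop/mobile or equivalent state coverage.
    if (!desktopAndMobile && !slot.equivalentCoverage) return 'narrow-claim';
  }
  return 'ok';
}

const slot = {
  claim: 'Checkout form is usable on phones and laptops',
  responsiveClaim: true,
  viewports: ['desktop'],
  viewportRationale: 'Layout only changes above 1024px',
};
console.log(reviewViewportPlan(slot));
console.log(reviewViewportPlan({ ...slot, viewports: ['desktop', 'mobile'] }));
```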

agents/verifier.md

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ Do not return a flat symptom list when the same underlying breakage explains mul

 Visual correctness, live interaction quality, and some external integrations still need explicit human checks.

-For UI proof slots, fail closed unless observed proof is matched to the exact claim, route/state, observation, evidence kind, artifact path or manual step, privacy metadata, result, and claim limit. Artifact metadata must include `visibility`, `retention`, `sensitivity`, and `safe_to_publish`; local-only or unsafe artifacts cannot back public, tracked, delivery, release, or publication proof claims, and `gsdd ui-proof validate`/`gsdd health` metadata failures block the stronger proof claim. Screenshots, traces, reports, Gherkin, a11y scans, E2E outputs, manual notes, source annotations, AST/cAST findings, semantic search, comments, and Semble-like retrieval do not satisfy proof by existence alone. Human acceptance records risk, waiver, deferment, proof debt, or a narrowed claim; it does not upgrade missing or mismatched non-human proof to `satisfied`.
+For UI proof slots, fail closed unless observed proof is matched to the exact claim, route/state, observation, evidence kind, artifact path or manual step, privacy metadata, result, and claim limit. For live UI runtime proof, expect `agent-browser` as the default captured tool unless the observed bundle explains a project-native equivalent or an availability constraint; do not fail solely because another browser tool was used, but downgrade vague proof that lacks exact route/state, viewport coverage or rationale, interactive steps/refs where relevant, screenshot/report artifacts, or relevant console/network observations. Existing Playwright/package-script browser tests count as canonical repeatable regression evidence, not as a replacement for scoped runtime proof when the slot requires `runtime`. Artifact metadata must include `visibility`, `retention`, `sensitivity`, and `safe_to_publish`; local-only or unsafe artifacts cannot back public, tracked, delivery, release, or publication proof claims, and `gsdd ui-proof validate`/`gsdd health` metadata failures block the stronger proof claim. Screenshots, traces, reports, Gherkin, a11y scans, E2E outputs, manual notes, source annotations, AST/cAST findings, semantic search, comments, and Semble-like retrieval do not satisfy proof by existence alone. Human acceptance records risk, waiver, deferment, proof debt, or a narrowed claim; it does not upgrade missing or mismatched non-human proof to `satisfied`.

 ## Step 9: Determine overall status

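
The fail-closed and downgrade behaviour described in the verifier change can be sketched as a small decision function. Everything here is an assumption made for illustration: `judgeRuntimeProof`, the field names (`routeState`, `evidenceKind`, `tool`, `toolRationale`, `viewports`, `viewportRationale`), and the `'unsatisfied'` and `'downgraded'` labels. Only `satisfied` and the `runtime` slot kind appear in the prompt text, and the sketch covers a subset of the listed match criteria.

```javascript
// Hypothetical verifier sketch, not gsdd code. It checks only a subset of the
// criteria named in the prompt, and all field names are assumptions.
function judgeRuntimeProof(planned, observed) {
  if (!observed) return 'unsatisfied'; // fail closed: nothing was observed
  const exact = ['claim', 'routeState', 'evidenceKind'];
  if (exact.some((key) => !observed[key] || observed[key] !== planned[key])) {
    return 'unsatisfied'; // mismatched or missing exact-match field
  }
  if (!observed.artifactPath && !observed.manualStep) return 'unsatisfied';
  // Another browser tool is not a failure by itself, but it must be explained.
  const toolExplained = observed.tool === 'agent-browser' || Boolean(observed.toolRationale);
  const viewportCovered =
    (observed.viewports ?? []).length > 0 || Boolean(observed.viewportRationale);
  return toolExplained && viewportCovered ? 'satisfied' : 'downgraded';
}

const planned = { claim: 'Form submits', routeState: '/checkout#filled', evidenceKind: 'runtime' };
const observed = {
  ...planned,
  tool: 'agent-browser',
  artifactPath: 'proof/checkout.png',
  viewports: ['desktop'],
};
console.log(judgeRuntimeProof(planned, observed));
console.log(judgeRuntimeProof(planned, { ...observed, tool: 'playwright' }));
console.log(judgeRuntimeProof(planned, { ...observed, routeState: '/cart' }));
```

The second call illustrates the "do not fail solely because another browser tool was used" clause: an unexplained tool lowers the result rather than rejecting it outright, while a route mismatch in the third call still fails closed.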

bin/lib/health.mjs

Lines changed: 2 additions & 2 deletions
@@ -147,13 +147,13 @@ export function createCmdHealth(ctx) {
       const parsed = readUiProofBundleFile(bundlePath);
       const validation = parsed.errors.length > 0
         ? { valid: false, errors: parsed.errors }
-        : validateUiProofBundle(parsed.bundle);
+        : validateUiProofBundle(parsed.bundle, { requireLocalArtifactExists: true, workspaceRoot: cwd });
       if (!validation.valid) {
         errors.push({
           id: 'E10',
           severity: 'ERROR',
           message: `${relativePath} has invalid UI proof metadata (${validation.errors.map((entry) => entry.code).join(', ')})`,
-          fix: 'Run `gsdd ui-proof validate <path>` and add required privacy metadata, claim limits, fixed evidence kinds, observation artifact references, and safe-to-publish handling.',
+          fix: 'Run `gsdd ui-proof validate <path>` and add required privacy metadata, claim limits, fixed evidence kinds, concise tool provenance, failure classification when failed or partial, observation artifact references, existing local artifact paths, and safe-to-publish handling.',
         });
       }
     }
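
The diff passes `requireLocalArtifactExists` and `workspaceRoot` to `validateUiProofBundle`, but the validator body is not shown on this page. One plausible reading, given the "existing local artifact paths" wording in the fix message, is sketched below. The bundle shape (`observations[].artifact.path`), the `artifact_missing` error code, and the injected `exists` callback are all assumptions. The callback keeps the sketch free of filesystem imports; in Node it could be backed by `fs.existsSync`.

```javascript
// Hypothetical sketch, not the gsdd implementation. The bundle shape, the
// error code, and the `exists` callback are illustrative assumptions.
function checkLocalArtifacts(bundle, { requireLocalArtifactExists = false, workspaceRoot = '.', exists }) {
  const errors = [];
  if (requireLocalArtifactExists) {
    for (const observation of bundle.observations ?? []) {
      const artifactPath = observation.artifact?.path;
      if (!artifactPath) continue; // e.g. a manual step with no file artifact
      const fullPath = `${workspaceRoot.replace(/\/+$/, '')}/${artifactPath}`;
      if (!exists(fullPath)) errors.push({ code: 'artifact_missing', path: artifactPath });
    }
  }
  // Same { valid, errors: [{ code }] } shape that the health command reads.
  return { valid: errors.length === 0, errors };
}

// Simulated filesystem: only one of the two referenced artifacts is present.
const onDisk = new Set(['/repo/proof/home.png']);
const result = checkLocalArtifacts(
  { observations: [{ artifact: { path: 'proof/home.png' } }, { artifact: { path: 'proof/gone.png' } }] },
  { requireLocalArtifactExists: true, workspaceRoot: '/repo', exists: (p) => onDisk.has(p) },
);
console.log(result.valid, result.errors.map((entry) => entry.code).join(', '));
```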
