feat: harden small PR timeout proof and truthful partial reviews#87
feat: harden small PR timeout proof and truthful partial reviews#87
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the “small PR timeout” behavior by making partial-review and Review Details output derive analyzed progress, findings, retry state, and phase timing from checkpoint + GitHub-visible evidence, and extends the M048 verifier/runbook surfaces to prove those guarantees. It also reduces ACR remote-build context size by staging a minimal build context in deploy.sh.
Changes:
- Add timeout-aware “Review Details” rendering and publish it on timeout using checkpoint + extracted inline findings evidence.
- Extend
verify:m048:s01/s02/s03and the runbook with outcome/timeout-class proof and deterministic timeout-surface fixtures. - Harden ACR remote builds by using a minimal staged context; expand
.dockerignoreto exclude additional local worktree artifacts.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/lib/review-utils.ts | Adds timeoutProgress support to render timeout-truthful Review Details lines. |
| src/lib/review-utils.test.ts | Tests that timeout progress lines render and generic totals are suppressed. |
| src/handlers/review.ts | On timeout, derives analyzed progress from checkpoint + inline evidence, surfaces retry state, and publishes timeout Review Details accordingly. |
| src/handlers/review.test.ts | Adds an integration-style timeout publication test covering analyzed progress + retry scheduling. |
| scripts/verify-m048-s01.ts | Introduces outcome classification derived from phase evidence. |
| scripts/verify-m048-s01.test.ts | Updates tests to assert new outcome fields and timeout-partial publication interpretation. |
| scripts/verify-m048-s02.ts | Adds timeout-class comparison state and new failure status codes when timeouts persist/regress. |
| scripts/verify-m048-s02.test.ts | Expands compare tests to cover timeout-class retirement/persistence/regression. |
| scripts/verify-m048-s03.ts | Adds deterministic “timeout truth” fixture proofs and includes outcome in rendered live evidence output. |
| scripts/verify-m048-s03.test.ts | Asserts timeout-surface fixture pass/fail behavior and updated report structure. |
| docs/runbooks/review-requested-debug.md | Documents how to run M048 verifiers for timeout-truth proof and interpretation. |
| deploy.sh | Stages a minimal ACR build context (avoids sending large local workspace artifacts). |
| .dockerignore | Ignores additional local-workspace directories. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const buildReviewDetailsBody = (timeoutProgress?: { | ||
| analyzedFiles: number; | ||
| totalFiles: number; | ||
| findingCount: number; | ||
| retryState: string; | ||
| }): string => { |
There was a problem hiding this comment.
buildReviewDetailsBody redefines an inline timeoutProgress shape that duplicates the exported TimeoutReviewDetailsProgress type from src/lib/review-utils.ts. This risks drift between the handler and formatter if the fields change. Import and use TimeoutReviewDetailsProgress (or re-export a handler-local type) so the compiler keeps them in sync.
| ## M050 Timeout-Truth Verifier Surfaces | ||
|
|
||
| Use the M048 verifier family directly when you need a machine-checkable answer for the repaired small-PR timeout class. | ||
|
|
||
| ### Local deterministic timeout-truth proof |
There was a problem hiding this comment.
The new runbook section header says “M050 Timeout-Truth Verifier Surfaces”, but the commands and narrative immediately below reference the “M048 verifier family” (verify:m048:s01/s02/s03). Rename the header to match the actual verifier suite to avoid operator confusion.
| export function deriveM048S01Outcome(evidence: PhaseTimingEvidence | null | undefined): M048S01Outcome { | ||
| const conclusion = evidence?.conclusion ?? null; | ||
| const published = evidence?.published ?? null; | ||
|
|
||
| if (!evidence) { | ||
| return { | ||
| class: "unknown", | ||
| conclusion, | ||
| published, | ||
| summary: "no correlated phase evidence available", | ||
| }; | ||
| } | ||
|
|
||
| if (conclusion === "timeout_partial" || (conclusion === "timeout" && published === true)) { | ||
| return { | ||
| class: "timeout_partial", | ||
| conclusion, | ||
| published, | ||
| summary: "timeout_partial (visible partial output published)", | ||
| }; | ||
| } | ||
|
|
||
| if (conclusion === "timeout") { | ||
| return { | ||
| class: "timeout", | ||
| conclusion, | ||
| published, | ||
| summary: "timeout (no visible output published)", | ||
| }; | ||
| } | ||
|
|
||
| if (conclusion === "success") { | ||
| return { | ||
| class: "success", | ||
| conclusion, | ||
| published, | ||
| summary: published === true ? "success (published output)" : "success (no published output)", | ||
| }; | ||
| } | ||
|
|
||
| return { | ||
| class: conclusion ? "failure" : "unknown", | ||
| conclusion, | ||
| published, | ||
| summary: conclusion | ||
| ? `${conclusion} (${published === true ? "published output" : published === false ? "no published output" : "publication unknown"})` | ||
| : "no correlated phase evidence available", | ||
| }; |
There was a problem hiding this comment.
deriveM048S01Outcome() returns summary text “no correlated phase evidence available” when evidence exists but evidence.conclusion is null. Since PhaseTimingEvidence.conclusion is nullable, this case can happen without evidence being missing, and the message becomes misleading. Consider a dedicated branch for evidence && conclusion === null (and possibly published === null) with a summary like “phase evidence missing conclusion/publication fields”, and/or add an issue in buildPhaseTimingEvidence when those fields are absent.
Review Details
|
Summary
verify:m048:s01,verify:m048:s02, andverify:m048:s03plus the review-requested runbook with publication-aware timeout proofdeploy.sh/.dockerignoreso ACR remote builds use a minimal staged context instead of large local workspace artifactsTest Plan
bun test ./src/handlers/review.test.ts ./src/lib/review-utils.test.ts ./scripts/verify-m048-s01.test.ts ./scripts/verify-m048-s02.test.ts ./scripts/verify-m048-s03.test.ts && bun run tsc --noEmit