3 changes: 3 additions & 0 deletions .dockerignore
@@ -3,6 +3,9 @@ node_modules
.gitignore
tmp/
.planning/
.worktrees/
.gsd/
.bg-shell/
*.md
!package.json
.env
23 changes: 21 additions & 2 deletions deploy.sh
@@ -46,6 +46,25 @@ ENVIRONMENT="cae-kodiai"
APP_NAME="ca-kodiai"
ACR_NAME="kodiairegistry" # Must be globally unique, alphanumeric only
IDENTITY_NAME="id-kodiai"
BUILD_CONTEXT_DIR=$(mktemp -d)

cleanup_build_context() {
rm -rf "$BUILD_CONTEXT_DIR"
}
trap cleanup_build_context EXIT

prepare_build_context() {
mkdir -p "$BUILD_CONTEXT_DIR"
rm -rf "$BUILD_CONTEXT_DIR"/*

cp package.json bun.lock tsconfig.json Dockerfile Dockerfile.agent "$BUILD_CONTEXT_DIR"/
mkdir -p "$BUILD_CONTEXT_DIR/src"
cp -R src/. "$BUILD_CONTEXT_DIR/src/"

echo "==> Prepared minimal build context at $BUILD_CONTEXT_DIR"
}

prepare_build_context

# -- Validate required environment variables ----------------------------------
missing=()
@@ -174,7 +193,7 @@ APP_IMAGE_DIGEST=$(az acr build \
--registry "$ACR_NAME" \
--image kodiai:latest \
--no-logs \
. \
"$BUILD_CONTEXT_DIR" \
--query 'outputImages[0].digest' \
--output tsv)
APP_IMAGE="${ACR_NAME}.azurecr.io/kodiai@${APP_IMAGE_DIGEST}"
@@ -243,7 +262,7 @@ ACA_JOB_IMAGE_DIGEST=$(az acr build \
--image kodiai-agent:latest \
--file Dockerfile.agent \
--no-logs \
. \
"$BUILD_CONTEXT_DIR" \
--query 'outputImages[0].digest' \
--output tsv)

45 changes: 45 additions & 0 deletions docs/runbooks/review-requested-debug.md
@@ -109,6 +109,51 @@ When review output is published or an approval is submitted, the handler emits a
- `prNumber`
- `reviewOutputKey`

## M050 Timeout-Truth Verifier Surfaces

Use the M048 verifier family directly when you need a machine-checkable answer for the repaired small-PR timeout class.

### Local deterministic timeout-truth proof
Comment on lines +112 to +116
Copilot AI Apr 16, 2026

The new runbook section header says “M050 Timeout-Truth Verifier Surfaces”, but the commands and narrative immediately below reference the “M048 verifier family” (verify:m048:s01/s02/s03). Rename the header to match the actual verifier suite to avoid operator confusion.

```sh
bun run verify:m048:s03 -- --json
```

Interpret these fields:
- `local.timeoutSurfaces.passed=true` means the timeout partial-review line and the timeout `Review Details` block still agree on:
  - analyzed files vs. total changed files
  - captured finding count
  - retry state (`scheduled ...` vs. `skipped ...`)
- Fixture names:
  - `timeout-scheduled-retry`: timeout output stays truthful when a reduced-scope retry is still eligible
  - `timeout-retry-skipped`: timeout output stays truthful when chronic-timeout suppression skips the retry
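To gate on this check from a script, one option is to parse the `--json` output and test only the documented field path. A minimal TypeScript sketch, assuming the command prints a single JSON object; the `S03Json` type and `timeoutSurfacesPassed` helper are hypothetical, and only `local.timeoutSurfaces.passed` comes from this runbook:

```typescript
// Hypothetical type: only `local.timeoutSurfaces.passed` is documented in
// this runbook; every other field of the report is ignored.
type S03Json = {
  local?: { timeoutSurfaces?: { passed?: unknown } };
};

// Hypothetical helper: true only when the proof explicitly passed.
// A missing or non-boolean field counts as "not proven".
function timeoutSurfacesPassed(raw: string): boolean {
  const report = JSON.parse(raw) as S03Json | null;
  return report?.local?.timeoutSurfaces?.passed === true;
}
```

Treating a missing field as a failure keeps the gate conservative if the report shape changes.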

### Live single-run proof for one review output key

```sh
bun run verify:m048:s01 -- --review-output-key <review-output-key> --json
```

Interpret these fields:
- `outcome.class=success|timeout_partial|timeout|failure|unknown`
- `outcome.summary` explains whether visible partial output was published
- `evidence.phases` still shows where latency landed (`executor handoff`, `remote runtime`, `publication`)
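If you consume `outcome.class` programmatically, it is safer to reject values outside the documented set than to guess. A minimal TypeScript sketch, assuming the `--json` report is a single JSON object; `readOutcomeClass` is a hypothetical helper, and the class list is copied from the line above:

```typescript
// Outcome classes as listed in this runbook for `verify:m048:s01 -- --json`.
const OUTCOME_CLASSES = ["success", "timeout_partial", "timeout", "failure", "unknown"] as const;
type OutcomeClass = (typeof OUTCOME_CLASSES)[number];

// Hypothetical helper: returns `outcome.class` from the JSON report, or throws
// if the field is missing or outside the documented set.
function readOutcomeClass(raw: string): OutcomeClass {
  const report = JSON.parse(raw) as { outcome?: { class?: unknown } } | null;
  const value = report?.outcome?.class;
  for (const known of OUTCOME_CLASSES) {
    if (value === known) {
      return known;
    }
  }
  throw new Error(`unexpected outcome.class: ${String(value)}`);
}
```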

### Baseline vs candidate timeout-class compare

```sh
bun run verify:m048:s02 -- \
--baseline-review-output-key <baseline-key> \
--candidate-review-output-key <candidate-key> \
--json
```

Interpret these fields:
- `comparison.timeoutClass.state=retired` is the desired repaired outcome
- `comparison.timeoutClass.state=persisted` means the candidate still landed in the old timeout class
- `comparison.timeoutClass.state=introduced` means the candidate regressed into the timeout class
- `status_code=m048_s02_timeout_class_persisted|m048_s02_timeout_class_regressed` is an operator-visible failure even if targeted latency deltas look better
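The pass/fail rule above can be sketched as a small gate. This assumes `comparison.timeoutClass.state` and `status_code` have already been read from the `--json` report. `timeoutClassGatePasses` is a hypothetical name, and treating every state other than `retired` as not passing is an assumption: the verifier may report states not listed here, which this sketch rejects conservatively.

```typescript
// Status codes this runbook calls operator-visible failures.
const FAILING_STATUS_CODES: readonly string[] = [
  "m048_s02_timeout_class_persisted",
  "m048_s02_timeout_class_regressed",
];

// Hypothetical gate. Assumption: only `retired` counts as a pass.
function timeoutClassGatePasses(state: string, statusCode: string): boolean {
  // A failing status code wins even if targeted latency deltas improved.
  if (FAILING_STATUS_CODES.indexOf(statusCode) !== -1) {
    return false;
  }
  return state === "retired";
}
```

Checking `status_code` first mirrors the rule that these codes are failures regardless of latency deltas.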

## 3) Verify the explicit `@kodiai review` publish bridge

On a PR comment, `@kodiai review` is handled by the mention handler as an explicit review request. The executor still runs on `taskType=review.full`, but the mention handler owns the GitHub approval publish bridge and the publish-resolution logs.
18 changes: 11 additions & 7 deletions scripts/verify-m048-s01.test.ts
@@ -102,6 +102,8 @@ describe("verify-m048-s01", () => {
expect(report.success).toBe(true);
expect(report.status_code).toBe("m048_s01_ok");
expect(report.sourceAvailability.azureLogs).toBe("present");
expect(report.outcome.class).toBe("success");
expect(report.outcome.summary).toContain("published output");
expect(report.evidence?.totalDurationMs).toBe(4_250);
expect(report.evidence?.phases.map((phase: ReviewPhaseTiming) => phase.name)).toEqual([...REQUIRED_PHASES]);
});
@@ -211,13 +213,13 @@ describe("verify-m048-s01", () => {
queryLogs: async () => ({
query: "phase query",
rows: [makeRow({
conclusion: "timeout",
published: false,
conclusion: "timeout_partial",
published: true,
phases: makePhases({
publication: {
status: "unavailable",
detail: "review timed out before publication",
durationMs: undefined,
status: "completed",
detail: "partial review output published before timeout",
durationMs: 625,
},
}),
})],
@@ -227,8 +229,10 @@ describe("verify-m048-s01", () => {
const human = renderM048S01Report(report);

expect(human).toContain("Status: m048_s01_ok");
expect(human).toContain("Conclusion: timeout");
expect(human).toContain("publication: unavailable (review timed out before publication)");
expect(human).toContain("Outcome class: timeout_partial");
expect(human).toContain("Outcome detail: timeout_partial (visible partial output published)");
expect(human).toContain("Conclusion: timeout_partial");
expect(human).toContain("publication: 625ms");
});

test("package.json wires verify:m048:s01 to the verifier script", async () => {
67 changes: 66 additions & 1 deletion scripts/verify-m048-s01.ts
@@ -25,6 +25,15 @@ export type M048S01StatusCode =
| "m048_s01_correlation_mismatch"
| "m048_s01_invalid_phase_payload";

export type M048S01OutcomeClass = "success" | "timeout" | "timeout_partial" | "failure" | "unknown";

export type M048S01Outcome = {
class: M048S01OutcomeClass;
conclusion: string | null;
published: boolean | null;
summary: string;
};

export type M048S01Report = {
command: "verify:m048:s01";
generated_at: string;
@@ -43,6 +52,7 @@ export type M048S01Report = {
duplicateRowCount: number;
driftedRowCount: number;
};
outcome: M048S01Outcome;
evidence: PhaseTimingEvidence | null;
issues: string[];
};
@@ -103,6 +113,56 @@ function readOptionValue(args: string[], index: number): { value: string | null;
};
}

export function deriveM048S01Outcome(evidence: PhaseTimingEvidence | null | undefined): M048S01Outcome {
const conclusion = evidence?.conclusion ?? null;
const published = evidence?.published ?? null;

if (!evidence) {
return {
class: "unknown",
conclusion,
published,
summary: "no correlated phase evidence available",
};
}

if (conclusion === "timeout_partial" || (conclusion === "timeout" && published === true)) {
return {
class: "timeout_partial",
conclusion,
published,
summary: "timeout_partial (visible partial output published)",
};
}

if (conclusion === "timeout") {
return {
class: "timeout",
conclusion,
published,
summary: "timeout (no visible output published)",
};
}

if (conclusion === "success") {
return {
class: "success",
conclusion,
published,
summary: published === true ? "success (published output)" : "success (no published output)",
};
}

return {
class: conclusion ? "failure" : "unknown",
conclusion,
published,
summary: conclusion
? `${conclusion} (${published === true ? "published output" : published === false ? "no published output" : "publication unknown"})`
: "no correlated phase evidence available",
};
Comment on lines +116 to +163
Copilot AI Apr 16, 2026

deriveM048S01Outcome() returns summary text “no correlated phase evidence available” when evidence exists but evidence.conclusion is null. Since PhaseTimingEvidence.conclusion is nullable, this case can happen without evidence being missing, and the message becomes misleading. Consider a dedicated branch for evidence && conclusion === null (and possibly published === null) with a summary like “phase evidence missing conclusion/publication fields”, and/or add an issue in buildPhaseTimingEvidence when those fields are absent.

}

export function parseVerifyM048S01Args(args: string[]): {
help?: boolean;
json?: boolean;
@@ -171,6 +231,8 @@ function createBaseReport(params: {
evidence?: PhaseTimingEvidence | null;
issues?: string[];
}): M048S01Report {
const evidence = params.evidence ?? null;

return {
command: "verify:m048:s01",
generated_at: params.generatedAt ?? new Date().toISOString(),
@@ -189,7 +251,8 @@ function createBaseReport(params: {
duplicateRowCount: params.duplicateRowCount ?? 0,
driftedRowCount: params.driftedRowCount ?? 0,
},
evidence: params.evidence ?? null,
outcome: deriveM048S01Outcome(evidence),
evidence,
issues: params.issues ?? [],
};
}
@@ -344,6 +407,8 @@ export function renderM048S01Report(report: M048S01Report): string {
`Delivery id: ${report.delivery_id ?? "unavailable"}`,
`Azure logs: ${report.sourceAvailability.azureLogs}`,
`Query: workspaces=${report.query.workspaceCount} matched_rows=${report.query.matchedRowCount} duplicates=${report.query.duplicateRowCount} drift=${report.query.driftedRowCount} timespan=${report.query.timespan}`,
`Outcome class: ${report.outcome.class}`,
`Outcome detail: ${report.outcome.summary}`,
];

if (report.evidence) {
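The review comment on `deriveM048S01Outcome()` suggests a dedicated branch for evidence that exists but has `conclusion === null`. A minimal TypeScript sketch of just the two `unknown` branches, using local stand-in types: `PhaseEvidence` models only the two fields read here and is not the real `PhaseTimingEvidence`. `deriveUnknownOutcome` is a hypothetical helper, and the summary strings follow the comment's suggestion rather than any merged code.

```typescript
// Local stand-ins (assumption): only the fields this sketch reads are modeled.
type PhaseEvidence = { conclusion: string | null; published: boolean | null };
type M048S01OutcomeClass = "success" | "timeout" | "timeout_partial" | "failure" | "unknown";
type M048S01Outcome = {
  class: M048S01OutcomeClass;
  conclusion: string | null;
  published: boolean | null;
  summary: string;
};

// Hypothetical helper: handles only the "unknown" cases and returns null so the
// existing success/timeout/failure branches can run unchanged.
function deriveUnknownOutcome(evidence: PhaseEvidence | null | undefined): M048S01Outcome | null {
  if (!evidence) {
    // No correlated row at all: the original message is accurate here.
    return {
      class: "unknown",
      conclusion: null,
      published: null,
      summary: "no correlated phase evidence available",
    };
  }
  if (evidence.conclusion === null) {
    // A row was correlated, but it lacks the conclusion (and maybe publication) field.
    return {
      class: "unknown",
      conclusion: null,
      published: evidence.published,
      summary:
        evidence.published === null
          ? "phase evidence missing conclusion/publication fields"
          : "phase evidence missing conclusion field",
    };
  }
  return null;
}
```

Returning `null` keeps the existing classification untouched; inlining the same check at the top of `deriveM048S01Outcome()` would work equally well.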