feat(eval): ingest CI-failure post-mortems into regression cases #1808
Add a CIFailurePostmortem dataclass and route it through a new _synthesize_eval_case dispatcher so the existing DLQ / post-mortem flow handles CI-failure post-mortems with no special-casing in the sync loop. A new scripts/scrape_ci_postmortems.py walks merged PRs from the last 30 days via the gh CLI. A PR qualifies as a post-mortem when its commit list shows a feature commit followed by two or more fix-up commits matching the FIXUP_SUBJECT_RE regex (fix(ci):, fix(tests):, fix(lint):, fix(types):, fixup!, squash!, plus the plain-prefix variants). Each qualifying PR becomes one JSON record under .sdd/reports/ci_postmortems/; the next sync pass promotes it to a P1 (warn-only) regression case keyed on ci-postmortem:<PR#>:<commit-sha>. Re-running either the scraper or the synthesizer is a no-op once a record or case exists on disk. When the gh CLI is missing or unauthenticated the scraper logs a notice and exits 0; integration tests skip rather than fail. Closes #1793
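The qualification rule described above can be sketched in a few lines. This is only an illustration: the regex below is reconstructed from the prefixes listed in this PR and may differ from the real FIXUP_SUBJECT_RE in scripts/scrape_ci_postmortems.py, and the helper names count_fixups and qualifies are hypothetical.

```python
import re

# Assumption: rebuilt from the prefixes named in this PR; the actual
# FIXUP_SUBJECT_RE in the scraper may be written differently.
FIXUP_SUBJECT_RE = re.compile(
    r"^(?:"
    r"fix\((?:ci|tests?|lint|types|typing|format|coverage)\):"
    r"|fixup!|!fixup|squash!"
    r"|fix (?:ci|tests|lint|typing|types):"
    r")"
)
MIN_FIXUP_COMMITS = 2


def count_fixups(subjects: list[str]) -> int:
    """Count fix-up subjects among the trailing commits.

    The first subject is treated as the feature commit and is never counted.
    """
    return sum(1 for s in subjects[1:] if FIXUP_SUBJECT_RE.match(s.strip()))


def qualifies(subjects: list[str]) -> bool:
    """A PR qualifies when enough trailing commits look like fix-ups."""
    return count_fixups(subjects) >= MIN_FIXUP_COMMITS
```

For example, a PR whose subjects are a feature commit followed by "fix(ci): ..." and "fix lint: ..." would qualify, while a PR with a single fix-up would not.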
Walkthrough
This PR extends incident-to-eval synthesis to ingest CI-failure postmortems mined from merged PRs with fix-up commits.
Review-bot acknowledgement summary
All must-address findings are resolved or acknowledged.
bernstein doctor observe for PR #1808
sonar -- WARN (project bernstein)
code-scanning -- WARN (14 open alert(s))
Skipped backends (credentials not configured)
See docs/observability/unified-doctor.md for backend setup notes.
Contract drift detected - proposed patch

Inline autofix push failed. Three contract tests act as drift detectors against the public CLI / API surface:

One or more failed on this PR. Files changed:

How to apply

Either run the regen script locally:

uv run python scripts/regen_contract_drift.py --fixture all
git add -A && git commit -m "chore(ci): regenerate contract drift allow-lists"
git push

Or apply the patch directly:

gh pr checkout 1808
git apply <<'PATCH'
diff --git a/tests/unit/test_readme_api_coverage.py b/tests/unit/test_readme_api_coverage.py
index f492845b..8151f2de 100644
--- a/tests/unit/test_readme_api_coverage.py
+++ b/tests/unit/test_readme_api_coverage.py
@@ -237,6 +237,8 @@ DOCUMENTED_COMMANDS: frozenset[str] = frozenset(
"interop",
# Bot-added: drift autofix (regen_contract_drift.py)
"desktop-register",
+ # Bot-added: drift autofix (regen_contract_drift.py)
+ "schedule",
}
)
PATCH
git add -A && git commit -m "chore(ci): regenerate contract drift allow-lists"
git push

Full diff

diff --git a/tests/unit/test_readme_api_coverage.py b/tests/unit/test_readme_api_coverage.py
index f492845b..8151f2de 100644
--- a/tests/unit/test_readme_api_coverage.py
+++ b/tests/unit/test_readme_api_coverage.py
@@ -237,6 +237,8 @@ DOCUMENTED_COMMANDS: frozenset[str] = frozenset(
"interop",
# Bot-added: drift autofix (regen_contract_drift.py)
"desktop-register",
+ # Bot-added: drift autofix (regen_contract_drift.py)
+ "schedule",
}
)

Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26250350092
Refs #1273.
for line in path.read_text(encoding="utf-8").splitlines():
    if line.startswith("source_incident:"):
        raw = line.split(":", 1)[1].strip()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
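The fragment above appears to read the source_incident key from an existing case file so that already-ingested incidents can be skipped. A minimal sketch of how such a lookup might be completed follows. The continuation after the quote check (stripping the quotes and returning the value), the helper names read_source_incident and existing_keys, and the *.yaml glob are all assumptions, not code taken from this PR.

```python
from __future__ import annotations

from pathlib import Path


def read_source_incident(path: Path) -> str | None:
    """Return the source_incident value from a case file, or None if absent."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("source_incident:"):
            raw = line.split(":", 1)[1].strip()
            # Assumed continuation: drop one pair of surrounding double quotes.
            if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
                raw = raw[1:-1]
            return raw
    return None


def existing_keys(cases_dir: Path) -> set[str]:
    """Collect every source_incident key already present under cases_dir."""
    return {
        key
        for p in sorted(cases_dir.glob("*.yaml"))
        if (key := read_source_incident(p)) is not None
    }
```

Splitting on the first colon only matters here because keys such as ci-postmortem:<PR#>:<commit-sha> contain colons themselves.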
Closes #1793.
Summary
- Add a CIFailurePostmortem dataclass and route every incident shape (DLQ entry, post-mortem JSON, CI-failure post-mortem) through a single _synthesize_eval_case dispatcher in src/bernstein/eval/incident_synthesizer.py.
- Add scripts/scrape_ci_postmortems.py, which walks merged PRs from the last 30 days via the gh CLI, detects 2+ fix-up commit patterns, and emits one JSON record per qualifying PR under .sdd/reports/ci_postmortems/. The synthesizer then promotes each record into a P1 (warn-only) regression case keyed on ci-postmortem:<PR#>:<commit-sha>.
- Tests in tests/unit/eval/test_incident_synthesizer.py cover the dataclass, the dispatcher, the JSON-record sync path, the fix-up regex, scraper qualification thresholds, idempotency, and dedup against existing YAML cases.

Fix-up commit heuristic (operator-judgement decision)
The first commit in a PR is treated as the original feature commit and never counted as a fix-up. A trailing commit qualifies when its subject matches FIXUP_SUBJECT_RE:
- fix(ci):, fix(tests):, fix(test):, fix(lint):, fix(types):, fix(typing):, fix(format):, fix(coverage):
- fixup!, !fixup, squash!
- fix ci:, fix tests:, fix lint:, fix typing:, fix types:
A PR qualifies for ingestion when at least 2 trailing commits match (MIN_FIXUP_COMMITS = 2).

Files touched
- src/bernstein/eval/incident_synthesizer.py
- scripts/scrape_ci_postmortems.py (new)
- tests/unit/eval/test_incident_synthesizer.py (new)
- docs/eval/incident-synthesis.md

Acceptance criteria
- CIFailurePostmortem dataclass with pr_number, commit_sha, failing_test, error_line, fixup_commits fields
- _synthesize_eval_case dispatches on input type and produces a valid IncidentEvalCase from a CIFailurePostmortem
- scripts/scrape_ci_postmortems.py walks merged PRs from the last 30 days via gh, detects 2+ fix-up commits, emits one JSON record per detection
- Cases carry source_incident: ci-postmortem:<PR#>:<commit-sha> and tier: P1 (regression, warn-only by default)
- Dedup against existing cases under src/bernstein/eval/cases/incidents/. Key on (pr_number, commit_sha)
- gh unavailable graceful degradation - scraper exits 0 with notice

Test plan
- uv run pytest tests/unit/eval/ -q --no-cov --timeout=120 - 88 passed
- uv run pytest tests/unit/test_incident_synthesizer.py tests/unit/eval/ -q --no-cov --timeout=120 - 136 passed (no regression on existing DLQ / post-mortem coverage)
- uv run ruff check . && uv run ruff format . - clean
- uv run pyright src/bernstein/eval/incident_synthesizer.py - 15 errors, equal to pre-existing baseline (no net regression)
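
To make the acceptance criteria above concrete, here is a rough sketch of the dataclass, the key format, and type-based dispatch with dedup. It is not the code in src/bernstein/eval/incident_synthesizer.py: the field types, the EvalCaseSketch stand-in for IncidentEvalCase, and the synthesize and source_key helpers are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CIFailurePostmortem:
    # Field names come from the acceptance criteria; the types are assumed.
    pr_number: int
    commit_sha: str
    failing_test: str
    error_line: str
    fixup_commits: tuple[str, ...] = ()


@dataclass(frozen=True)
class EvalCaseSketch:
    """Hypothetical stand-in for IncidentEvalCase; real fields are not shown in this PR text."""

    source_incident: str
    tier: str
    description: str


def source_key(pm: CIFailurePostmortem) -> str:
    """Build the dedup key in the ci-postmortem:<PR#>:<commit-sha> form."""
    return f"ci-postmortem:{pm.pr_number}:{pm.commit_sha}"


def synthesize(incident: object, existing: set[str]) -> EvalCaseSketch | None:
    """Dispatch on input type; return None when the case already exists."""
    if isinstance(incident, CIFailurePostmortem):
        key = source_key(incident)
        if key in existing:
            return None  # idempotent: a second run is a no-op
        existing.add(key)
        return EvalCaseSketch(
            source_incident=key,
            tier="P1",  # regression, warn-only by default
            description=f"{incident.failing_test}: {incident.error_line}",
        )
    # DLQ entries and post-mortem JSON would get their own branches here.
    raise TypeError(f"unsupported incident type: {type(incident).__name__}")
```

Keeping the dedup set outside the function mirrors the idea that existing keys are loaded from disk once per sync pass, which is one plausible way to get the no-op behaviour on re-runs.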