
feat(eval): ingest CI-failure post-mortems into regression cases #1808

Merged
chernistry merged 1 commit into main from feat/1793-ci-postmortem-ingest
May 21, 2026

@chernistry chernistry commented May 21, 2026

Collaborator

Closes #1793.

Summary

  • Add CIFailurePostmortem dataclass and route every incident shape (DLQ entry, post-mortem JSON, CI-failure post-mortem) through a single _synthesize_eval_case dispatcher in src/bernstein/eval/incident_synthesizer.py.
  • Add scripts/scrape_ci_postmortems.py, which walks PRs merged in the last 30 days via the gh CLI, flags those with two or more fix-up commits, and emits one JSON record per qualifying PR under .sdd/reports/ci_postmortems/. The synthesizer then promotes each record into a P1 (warn-only) regression case keyed on ci-postmortem:<PR#>:<commit-sha>.
  • Tests in tests/unit/eval/test_incident_synthesizer.py cover the dataclass, the dispatcher, the JSON-record sync path, the fix-up regex, scraper qualification thresholds, idempotency, and dedup against existing YAML cases.
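
A minimal sketch of the type-based dispatch described in the summary. The field names come from the acceptance criteria; the default values, the `DLQEntry` stand-in, the public function name, the dict-shaped return value, and the `dlq:` / `postmortem:` key prefixes are assumptions for illustration and may not match the real module.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CIFailurePostmortem:
    """CI-failure record; field names follow the acceptance criteria."""

    pr_number: int
    commit_sha: str
    failing_test: str = ""  # defaults are an assumption
    error_line: str = ""
    fixup_commits: tuple[str, ...] = ()


@dataclass(frozen=True)
class DLQEntry:
    """Hypothetical stand-in for the existing dead-letter-queue type."""

    task_id: str
    reason: str


def synthesize_eval_case(incident: Any) -> dict[str, Any]:
    """Dispatch on the input type and return an eval case as a plain dict."""
    if isinstance(incident, CIFailurePostmortem):
        return {
            "source_incident": f"ci-postmortem:{incident.pr_number}:{incident.commit_sha}",
            "tier": "P1",  # regression, warn-only by default
            "failing_test": incident.failing_test,
        }
    if isinstance(incident, DLQEntry):
        # Key format is hypothetical.
        return {"source_incident": f"dlq:{incident.task_id}"}
    if isinstance(incident, dict):
        # Raw post-mortem JSON; key format is hypothetical.
        return {"source_incident": f"postmortem:{incident.get('id', 'unknown')}"}
    raise TypeError(f"unsupported incident type: {type(incident).__name__}")
```

Keeping the branching in one function means the sync loop can pass any incident shape through the same call without special-casing.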

Fix-up commit heuristic (operator-judgement decision)

The first commit in a PR is treated as the original feature commit and never counted as a fix-up. A trailing commit qualifies when its subject matches FIXUP_SUBJECT_RE:

  • fix(ci):, fix(tests):, fix(test):, fix(lint):, fix(types):, fix(typing):, fix(format):, fix(coverage):
  • fixup!, !fixup, squash!
  • Plain prefix fix ci:, fix tests:, fix lint:, fix typing:, fix types:

A PR qualifies for ingestion when at least 2 trailing commits match (MIN_FIXUP_COMMITS = 2).
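
The heuristic above could be sketched as follows. The pattern is reconstructed from the prefixes listed in this description and is not copied from the repository; case-insensitive matching and the helper names `count_fixups` and `qualifies` are assumptions.

```python
from __future__ import annotations

import re

# Reconstructed from the prefixes listed above; the real FIXUP_SUBJECT_RE may differ.
FIXUP_SUBJECT_RE = re.compile(
    r"""^\s*(?:
        fix\((?:ci|tests?|lint|types|typing|format|coverage)\):   # scoped form
      | fixup! | !fixup | squash!                                 # autosquash-style markers
      | fix\ (?:ci|tests|lint|typing|types):                      # plain-prefix form
    )""",
    re.IGNORECASE | re.VERBOSE,
)

MIN_FIXUP_COMMITS = 2


def count_fixups(subjects: list[str]) -> int:
    """Count trailing fix-up commits; the first commit is never counted."""
    return sum(1 for subject in subjects[1:] if FIXUP_SUBJECT_RE.match(subject))


def qualifies(subjects: list[str]) -> bool:
    """A PR qualifies when at least MIN_FIXUP_COMMITS trailing commits match."""
    return count_fixups(subjects) >= MIN_FIXUP_COMMITS
```

For example, a commit list of `feat(eval): add ingest`, `fix(ci): pin runner`, `fixup! typo` would qualify, while a PR whose only matching subject is its first commit would not.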

Files touched

  • src/bernstein/eval/incident_synthesizer.py
  • scripts/scrape_ci_postmortems.py (new)
  • tests/unit/eval/test_incident_synthesizer.py (new)
  • docs/eval/incident-synthesis.md

Acceptance criteria

  • New CIFailurePostmortem dataclass with pr_number, commit_sha, failing_test, error_line, fixup_commits fields
  • _synthesize_eval_case dispatches on input type and produces a valid IncidentEvalCase from a CIFailurePostmortem
  • New scripts/scrape_ci_postmortems.py walks merged PRs from the last 30 days via gh, detects 2+ fix-up commits, emits one JSON record per detection
  • Emitted YAML eval-case carries source_incident: ci-postmortem:<PR#>:<commit-sha> and tier: P1 (regression, warn-only by default)
  • Dedup: re-running the scraper does not duplicate cases already in src/bernstein/eval/cases/incidents/. Key on (pr_number, commit_sha)
  • Tests: one fix-up commit becomes one case (positive); multi-commit pattern behaviour documented and tested; re-run idempotency; empty input produces zero cases
  • gh unavailable graceful degradation - scraper exits 0 with notice
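
One way the dedup criterion might be implemented is to scan existing YAML cases for their `source_incident` line and collect `(pr_number, commit_sha)` pairs. This is a sketch under assumptions: the `*.yaml` glob, the line-based parsing, and the helper names are illustrative, not the module's confirmed API.

```python
from __future__ import annotations

from pathlib import Path

PREFIX = "ci-postmortem:"


def read_source_incident(path: Path) -> str | None:
    """Return the source_incident value of a YAML case, unquoted, if present."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("source_incident:"):
            raw = line.split(":", 1)[1].strip()
            # Strip one pair of matching surrounding quotes.
            if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
                raw = raw[1:-1]
            return raw or None
    return None


def existing_keys(cases_dir: Path) -> set[tuple[int, str]]:
    """Collect (pr_number, commit_sha) keys already promoted to cases."""
    keys: set[tuple[int, str]] = set()
    if not cases_dir.is_dir():
        return keys
    for path in sorted(cases_dir.glob("*.yaml")):
        value = read_source_incident(path)
        if value and value.startswith(PREFIX):
            pr, _, sha = value[len(PREFIX):].partition(":")
            if pr.isdigit() and sha:
                keys.add((int(pr), sha))
    return keys
```

A scraper run would then skip any PR whose `(pr_number, commit_sha)` pair is already in the returned set.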

Test plan

  • uv run pytest tests/unit/eval/ -q --no-cov --timeout=120 - 88 passed
  • uv run pytest tests/unit/test_incident_synthesizer.py tests/unit/eval/ -q --no-cov --timeout=120 - 136 passed (no regression on existing DLQ / post-mortem coverage)
  • uv run ruff check . && uv run ruff format . - clean
  • uv run pyright src/bernstein/eval/incident_synthesizer.py - 15 errors, equal to pre-existing baseline (no net regression)
  • CI green on PR

Summary by CodeRabbit

  • New Features
    • Incident synthesis now ingests CI-failure postmortems from merged pull requests as a fourth incident source alongside dead-letter queues, orchestrator postmortems, and flaky-test detection.
    • Automatically extracts minimal reproducible triggers from CI failures and promotes them as eval gates on future CI runs.
    • Adds idempotent scraping and deduplication to prevent duplicate incident processing.

Add a CIFailurePostmortem dataclass and route it through a new
_synthesize_eval_case dispatcher so the existing DLQ / post-mortem
flow handles CI-failure post-mortems with no special-casing in the
sync loop.

A new scripts/scrape_ci_postmortems.py walks merged PRs from the
last 30 days via the gh CLI. A PR qualifies as a post-mortem when
its commit list shows a feature commit followed by two or more
fix-up commits matching the FIXUP_SUBJECT_RE regex (fix(ci):,
fix(tests):, fix(lint):, fix(types):, fixup!, squash!, plus the
plain-prefix variants). Each qualifying PR becomes one JSON
record under .sdd/reports/ci_postmortems/; the next sync pass
promotes it to a P1 (warn-only) regression case keyed on
ci-postmortem:<PR#>:<commit-sha>. Re-running either the scraper or
the synthesizer is a no-op once a record or case exists on disk.
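
A sketch of how record emission can be made a no-op on re-runs. The file-name scheme `pr-<PR#>-<sha>.json` and the function name `emit_record` are hypothetical; only the output directory and the one-record-per-PR behaviour come from the description above.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def emit_record(out_dir: Path, record: dict[str, Any]) -> bool:
    """Write one JSON record per PR; return False if it already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme keyed on (pr_number, commit_sha).
    target = out_dir / f"pr-{record['pr_number']}-{record['commit_sha']}.json"
    if target.exists():
        return False  # re-run: leave the existing record untouched
    target.write_text(
        json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return True
```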

When the gh CLI is missing or unauthenticated the scraper logs a
notice and exits 0; integration tests skip rather than fail.
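
The degradation path could look roughly like this. It relies on `shutil.which` and on `gh auth status` returning a non-zero exit code when not logged in; the function names, the injectable `check` parameter, and the notice wording are assumptions.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from typing import Callable


def gh_available() -> bool:
    """Return True when the gh CLI is installed and authenticated."""
    if shutil.which("gh") is None:
        return False
    try:
        result = subprocess.run(
            ["gh", "auth", "status"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def main(check: Callable[[], bool] = gh_available) -> int:
    """Exit 0 with a notice when gh cannot be used, so cron jobs stay green."""
    if not check():
        print("notice: gh CLI missing or unauthenticated; skipping scrape", file=sys.stderr)
        return 0
    # The actual PR walk would run here (omitted in this sketch).
    return 0
```

Passing the availability check as a parameter lets tests exercise the skip path without depending on the local gh installation.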

Closes #1793

@sourcery-ai sourcery-ai Bot left a comment


Sorry @chernistry, you have reached your weekly rate limit of 2500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@chernistry chernistry enabled auto-merge (squash) May 21, 2026 20:09
@github-actions

Contributor

Sonar insights (advisory, no merge-block)

Snapshot of bernstein on the configured Sonar instance:

| Metric | Value |
| --- | --- |
| Coverage | 13.5% |
| Code smells | 125 |
| Bugs | 11 |
| Vulnerabilities | 2 |
| Security hotspots | 87 |

Run bernstein doctor sonar locally for the full surface.

This comment is a soft signal. The Sonar scan runs on push to main; the PR check itself never fails on smells.

@coderabbitai

coderabbitai Bot commented May 21, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8b1bef1-1f85-4db4-9e29-b6446b65a16c

📥 Commits

Reviewing files that changed from the base of the PR and between 591b00d and 3e54ba4.

📒 Files selected for processing (4)
  • docs/eval/incident-synthesis.md
  • scripts/scrape_ci_postmortems.py
  • src/bernstein/eval/incident_synthesizer.py
  • tests/unit/eval/test_incident_synthesizer.py

📝 Walkthrough

This PR extends incident-to-eval synthesis to ingest CI-failure postmortems mined from merged PRs with fix-up commits. It adds a new CIFailurePostmortem dataclass, a standalone scraper (scripts/scrape_ci_postmortems.py) that detects fix-up commit patterns via the gh CLI, and integrates CI record ingestion into IncidentSynthesizer with idempotent deduplication and P1 regression case synthesis.
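
As an illustration of the 30-day window, the filter below keeps only PRs whose merge timestamp falls inside the lookback period. The input shape (dicts with `number` and `mergedAt` in ISO 8601) is an assumption modelled on gh's JSON output; the function name `merged_within` is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any


def merged_within(
    prs: list[dict[str, Any]], now: datetime, days: int = 30
) -> list[dict[str, Any]]:
    """Keep PRs merged no earlier than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    recent = []
    for pr in prs:
        merged_at = pr.get("mergedAt")
        if not merged_at:
            continue  # unmerged or malformed entry
        # fromisoformat() rejects a trailing "Z" on older Pythons.
        when = datetime.fromisoformat(merged_at.replace("Z", "+00:00"))
        if when >= cutoff:
            recent.append(pr)
    return recent
```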

Changes

CI-failure postmortem ingestion and synthesis

| Layer / File(s) | Summary |
| --- | --- |
| CIFailurePostmortem schema and dispatch path / src/bernstein/eval/incident_synthesizer.py | CIFailurePostmortem dataclass defines the CI record schema (PR number, commit SHA, failing check, error line, fix-up subjects); adds synthesize_from_ci_postmortem() public method and refactors synthesis through unified _synthesize_eval_case() dispatcher handling DLQEntry, CIFailurePostmortem, and raw post-mortem dicts. |
| Scraper CLI: PR detection and record emission / scripts/scrape_ci_postmortems.py | Discovers merged PRs via gh, detects fix-up commit patterns matching FIXUP_SUBJECT_RE (minimum 2 commits), synthesizes JSON records with failing-test heuristics via path/keyword analysis, enforces idempotency via dedup keys from existing JSON and YAML source_incident fields, supports dry-run mode and graceful gh availability degradation. |
| Synthesizer CI ingestion and deduplication / src/bernstein/eval/incident_synthesizer.py | Extends sync() to load both case IDs and prior source_incident keys from disk for richer deduplication; adds _iter_ci_postmortem_cases() reader for .sdd/reports/ci_postmortems/*.json records; implements _case_from_ci_postmortem() builder creating P1 regression cases with CI-specific prompt; replaces _load_existing_ids() with _load_existing_state() returning dual dedup axes. |
| Comprehensive test coverage: scraper and synthesizer / tests/unit/eval/test_incident_synthesizer.py | Validates CIFailurePostmortem dataclass defaults, frozen/hashable properties; tests synthesizer dispatch creating P1 cases with correct tags and source_incident format; exercises scraper heuristics (fix-up regex, failing-test guessing, dedup logic); validates sync() ingestion and idempotent re-runs; includes end-to-end integration and graceful gh unavailability handling. |
| Documentation: feature description and usage / docs/eval/incident-synthesis.md | Describes CI-failure postmortem ingestion as a fourth incident source; expands "The synthesiser" section to detail multi-stage pipeline; adds dedicated "CI-failure postmortems" subsection covering PR qualification, JSON naming, P1 promotion, idempotent dedup, cron usage, and gh unavailability behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

size/l, core, docs, tests


@github-actions

Contributor

Review-bot acknowledgement summary

  • Must-address findings: 0 (0 acknowledged, 0 open)
  • Informational findings: 0

All must-address findings are resolved or acknowledged.

@github-actions

Contributor

bernstein doctor observe for PR #1808 (feat/1793-ci-postmortem-ingest): ok=0, warn=2, fail=0, error=0, skipped=2

sonar -- WARN (project bernstein)

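| metric | value | delta | threshold | status |
| --- | --- | --- | --- | --- |
| coverage_pct | 13.5% | new | 80.0% | fail |
| code_smells | 125 | new | 50 | warn |
| bugs | 11 | new | 0 | fail |
| vulnerabilities | 2 | new | 0 | warn |
| security_hotspots | 87 | new | 0 | fail |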

code-scanning -- WARN (14 open alert(s))

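| metric | value | delta | threshold | status |
| --- | --- | --- | --- | --- |
| open_alerts | 14 | new | 0 | fail |
| critical_alerts | 0 | new | 0 | ok |
| high_alerts | 2 | new | 0 | warn |
| medium_alerts | 0 | new | - | ok |
| low_alerts | 0 | new | - | ok |

Skipped backends (credentials not configured)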
  • glitchtip: BERNSTEIN_GLITCHTIP_TOKEN not set
  • dt: DTRACK_URL/TOKEN/PROJECT not set

See docs/observability/unified-doctor.md for backend setup notes.

@chernistry chernistry merged commit 4e824df into main May 21, 2026
60 of 63 checks passed
@chernistry chernistry deleted the feat/1793-ci-postmortem-ingest branch May 21, 2026 20:10
@github-actions

Contributor

Contract drift detected - proposed patch

Inline autofix push failed (failure). Apply the patch below manually.

Three contract tests act as drift detectors against the public CLI / API surface:

  • tests/unit/test_readme_api_coverage.py::test_all_cli_commands_are_documented
  • tests/unit/test_api_v1_routing.py::TestVersionedRoutesParity::test_every_root_route_has_v1_counterpart
  • tests/unit/test_cli_run_params.py::test_run_params_match_cli_call

One or more failed on this PR. scripts/regen_contract_drift.py produced the patch below (2 LOC, cap: 30).
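
A rough sketch of how a drift detector of this kind can work: compare the commands the CLI actually registers against the documented allow-list, then propose allow-list entries for anything missing. The helper names and the registered-command set are hypothetical; only `DOCUMENTED_COMMANDS`, the bot comment text, and the 30-line cap appear in this comment.

```python
from __future__ import annotations

# Excerpt of the allow-list shown in the patch; the real set is larger.
DOCUMENTED_COMMANDS: frozenset[str] = frozenset({"interop", "desktop-register"})
MAX_PATCH_LOC = 30  # cap quoted in the bot comment


def missing_commands(registered: set[str], documented: frozenset[str]) -> list[str]:
    """Commands exposed by the CLI but absent from the allow-list."""
    return sorted(registered - documented)


def proposed_lines(missing: list[str]) -> list[str]:
    """Two added lines per missing command: a marker comment and the entry."""
    lines: list[str] = []
    for name in missing:
        lines.append("        # Bot-added: drift autofix (regen_contract_drift.py)")
        lines.append(f'        "{name}",')
    return lines


def within_cap(lines: list[str]) -> bool:
    """Only patches at or under the cap are proposed automatically."""
    return len(lines) <= MAX_PATCH_LOC
```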

Files changed:

tests/unit/test_readme_api_coverage.py

How to apply

Either run the regen script locally:

uv run python scripts/regen_contract_drift.py --fixture all
git add -A && git commit -m "chore(ci): regenerate contract drift allow-lists"
git push

Or apply the patch directly:

gh pr checkout 1808
git apply <<'PATCH'
diff --git a/tests/unit/test_readme_api_coverage.py b/tests/unit/test_readme_api_coverage.py
index f492845b..8151f2de 100644
--- a/tests/unit/test_readme_api_coverage.py
+++ b/tests/unit/test_readme_api_coverage.py
@@ -237,6 +237,8 @@ DOCUMENTED_COMMANDS: frozenset[str] = frozenset(
         "interop",
         # Bot-added: drift autofix (regen_contract_drift.py)
         "desktop-register",
+        # Bot-added: drift autofix (regen_contract_drift.py)
+        "schedule",
     }
 )
PATCH
git add -A && git commit -m "chore(ci): regenerate contract drift allow-lists"
git push

Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26250350092

Refs #1273.

for line in path.read_text(encoding="utf-8").splitlines():
    if line.startswith("source_incident:"):
        raw = line.split(":", 1)[1].strip()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':