feat(supervisor): operator supervisor surface with signed escalation receipts by chernistry · Pull Request #1805 · sipyourdrink-ltd/bernstein

chernistry · 2026-05-21T19:57:47Z

Closes #1800.

Summary

Adds bernstein supervisor status [--json] and bernstein supervisor escalate <session_id> --reason "..." as an operator-facing surface over the existing stalled_manager, watchdog, and spawn_supervisor detectors. The detectors remain the source of truth; the new command aggregates and renders.
Each escalation appends a signed receipt (Ed25519) to the audit chain with (worker_id, worktree_id, last_n_audit_entries, identity_tokens, stall_reason, recommended_action, prev_chain_digest). Receipts verify offline against the install's public key.
recommended_action is a pure function of the chain slice at stall time. Two operators verifying the same receipt land on the byte-identical recommendation. A cross-worktree fence refuses receipt assembly when the stuck session leaked into a sibling worktree's resolution events.
bernstein status and bernstein fleet gain a stuck-count + oldest-stall summary line. The TUI dashboard adds a SupervisorPane highlighting stalled / parked / no-progress sessions with the same recommended action.
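
A minimal sketch of the two properties described above: a pure `recommended_action` and a cross-worktree fence. The action names, the `.fatal` event-type convention, the `worktree_id` key on audit entries, and the helper names are hypothetical illustrations; the actual rules live in `supervisor_receipt.py` and may differ.

```python
from collections.abc import Sequence
from typing import TypedDict


class AuditEntry(TypedDict, total=False):
    event_type: str
    session_id: str
    worktree_id: str  # hypothetical key, used here only for the fence check


class CrossWorktreeFenceError(ValueError):
    """Raised when a session's events show up under a sibling worktree."""


def recommend_action(
    stall_reason: str,
    audit_entries: Sequence[AuditEntry],
    respawn_budget_remaining: int,
) -> str:
    """Pure function of its inputs: no clock, filesystem, or environment reads."""
    # Hypothetical rule set; the real mapping may differ.
    if any(e.get("event_type", "").endswith(".fatal") for e in audit_entries):
        return "abort"
    if stall_reason == "parked":
        return "resume"
    if respawn_budget_remaining > 0:
        return "respawn"
    return "escalate"


def assert_worktree_fence(
    session_id: str,
    worktree_id: str,
    audit_entries: Sequence[AuditEntry],
) -> None:
    """Refuse assembly if the session's entries reference another worktree."""
    for entry in audit_entries:
        if entry.get("session_id") != session_id:
            continue  # entries from other sessions are ignored
        seen = entry.get("worktree_id")
        if seen is not None and seen != worktree_id:
            raise CrossWorktreeFenceError(
                f"session {session_id} leaked into worktree {seen}"
            )
```

Because `recommend_action` depends only on its arguments, two operators replaying the same chain slice get the same string, which is the property the determinism contract relies on.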

Files touched

src/bernstein/core/orchestration/supervisor_receipt.py (new) - receipt assembly, signing, verification, deterministic recommended_action, cross-worktree fence.
src/bernstein/core/orchestration/supervisor_aggregator.py (new) - reads .sdd/runtime/agents.json, heartbeats, and failure records into a single snapshot.
src/bernstein/cli/commands/supervisor_cmd.py (new) - bernstein supervisor status / escalate Click group.
src/bernstein/cli/main.py - register the new group.
src/bernstein/cli/commands/status_cmd.py - attach supervisor summary to bernstein status.
src/bernstein/cli/commands/fleet_cmd.py - append supervisor summary line to the fallback fleet renderer.
src/bernstein/core/lifecycle/hooks.py - add worker.escalated lifecycle event for external notifiers.
src/bernstein/tui/status_bar.py - add SupervisorPane widget + render_supervisor_pane helper.
src/bernstein/tui/worker_badges.py - add WorkerStatus.STUCK + status_for_supervisor_row helper.
docs/api/supervisor.md (new) - JSON schema for aggregator snapshot + receipt envelope, plus the determinism contract and cross-worktree fence rules.
tests/unit/test_supervisor_receipt.py (new) - 21 tests covering receipt assembly, signature verification, recommended-action determinism (drives the same chain slice from two temp dirs), fence assertion, dict roundtrip.
tests/integration/test_supervisor_chain_roundtrip.py (new) - end-to-end: synthetic .sdd/runtime/ -> aggregator -> signed receipt -> JSON roundtrip -> verification.
tests/snapshot/test_supervisor_pane_snapshot.py (new) - TUI render snapshot for the new pane.

Acceptance criteria

Test plan

uv run pytest tests/unit/test_supervisor_receipt.py tests/integration/test_supervisor_chain_roundtrip.py tests/snapshot/test_supervisor_pane_snapshot.py - 25 passed.
uv run pytest tests/unit/ -k "supervisor or stalled or watchdog" - 127 passed, 45 skipped, no regressions.
uv run ruff check . - all checks passed.
uv run ruff format --check . - all files formatted.
uv run pyright on the three new modules - 0 errors.
bernstein supervisor --help and bernstein supervisor status --json smoke-tested.

Summary by CodeRabbit

New Features
- Added top-level "supervisor" CLI with status and escalate commands for inspecting and escalating stuck workers.
- status now includes a supervisor summary in JSON and a dim one-line supervisor summary in human output.
- TUI/status bar gains a Supervisor pane showing stalled workers and a new "stuck" worker badge/icon.
Tests
- Added unit and integration tests covering supervisor snapshots, escalation receipts, and signing/verification.

…receipts

Adds `bernstein supervisor status` and `bernstein supervisor escalate` as an operator-facing surface over the existing stalled_manager, watchdog, and spawn_supervisor detectors. The detectors remain the source of truth; the new command aggregates and renders.

Each escalation appends a signed receipt to the audit chain. The receipt carries the worker id, worktree id, last N audit entries, identity tokens, stall reason, recommended action, and the previous chain digest. The receipt verifies offline against the install's Ed25519 public key.

The recommended_action field is a pure function of the chain slice at stall time (stall_reason + audit_entries + respawn_budget_remaining). Two operators verifying the same receipt arrive at the byte-identical recommendation. A cross-worktree fence assertion refuses receipt assembly when the stuck session leaked into a sibling worktree's resolution events.

The aggregator surfaces a stuck-count + oldest-stall summary line on `bernstein status` and `bernstein fleet`. The TUI gains a SupervisorPane that highlights stalled / parked / no-progress sessions with the same recommended action. Worker badges learn a STUCK status so the dashboard's existing widgets stay consistent with the pane.

Closes #1800.

Files touched:
- src/bernstein/core/orchestration/supervisor_receipt.py (new)
- src/bernstein/core/orchestration/supervisor_aggregator.py (new)
- src/bernstein/cli/commands/supervisor_cmd.py (new)
- src/bernstein/cli/main.py
- src/bernstein/cli/commands/status_cmd.py
- src/bernstein/cli/commands/fleet_cmd.py
- src/bernstein/core/lifecycle/hooks.py (worker.escalated event)
- src/bernstein/tui/status_bar.py (SupervisorPane + helpers)
- src/bernstein/tui/worker_badges.py (STUCK status)
- docs/api/supervisor.md (new)
- tests/unit/test_supervisor_receipt.py (new)
- tests/integration/test_supervisor_chain_roundtrip.py (new)
- tests/snapshot/test_supervisor_pane_snapshot.py (new)

github-actions · 2026-05-21T19:58:01Z

Sonar insights (advisory, no merge-block)

Snapshot of bernstein on the configured Sonar instance:

Metric	Value
Coverage	13.5
Code smells	125
Bugs	11
Vulnerabilities	2
Security hotspots	87

Run bernstein doctor sonar locally for the full surface.

This comment is a soft signal. The Sonar scan runs on push to main; the PR check itself never fails on smells.

coderabbitai · 2026-05-21T19:58:03Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6ba61cfa-0e33-457b-b820-365f977a7686

📥 Commits

Reviewing files that changed from the base of the PR and between 7cfdcaa and f1d3a21.

📒 Files selected for processing (4)

src/bernstein/cli/commands/fleet_cmd.py
src/bernstein/cli/commands/status_cmd.py
src/bernstein/cli/commands/supervisor_cmd.py
src/bernstein/tui/status_bar.py

📝 Walkthrough

Introduces a complete supervisor subsystem for detecting and escalating stalled workers. Adds a pure-read aggregator that classifies worker stall state from runtime files, deterministic signed escalation receipts anchored to the audit chain with cross-worktree validation, operator CLI commands for supervisor status and escalate workflows, lifecycle event type, and TUI integration including a supervisor pane and stuck worker status badges. All new code includes defensive error handling to prevent failures during snapshot or status rendering.

Changes

Supervisor Detection, Escalation, and Audit Integration

Layer / File(s)	Summary
Supervisor aggregator: snapshot detection and classification `src/bernstein/core/orchestration/supervisor_aggregator.py`	Loads agents, heartbeats, parked sessions, and recent failures from the runtime tree. Classifies stall state via heartbeat age, parked markers, and role/status signals. Returns sorted `SupervisorSnapshot` with stuck-worker count and deterministic `recommended_action` per worker and provides `snapshot_to_dict` and `format_summary_line`.
Escalation receipt signing and cryptographic verification `src/bernstein/core/orchestration/supervisor_receipt.py`	Defines `EscalationReceipt`, canonical serialization, cross-worktree fence enforcement, deterministic `recommend_action`, `assemble_receipt`, `sign_receipt`, and `verify_receipt` with structured verification results.
Supervisor CLI command group: status and escalate workflows `src/bernstein/cli/commands/supervisor_cmd.py`	`bernstein supervisor status` snapshots and renders stuck workers (Rich table or JSON). `bernstein supervisor escalate <session_id> --reason ...` validates session, resolves/generates Ed25519 signing key (env-overridable), appends an operator escalation entry to the audit slice, assembles and signs a receipt, persists it under supervisor receipts, appends a non-blocking `supervisor.escalated` audit entry, and emits a non-blocking `worker.escalated` lifecycle event.
Main CLI registration and lifecycle event type `src/bernstein/cli/main.py`, `src/bernstein/core/lifecycle/hooks.py`	Registers `supervisor` command group with the root CLI and adds `LifecycleEvent.WORKER_ESCALATED = "worker.escalated"` with documented context fields (`reason`, `receipt_path`, `recommended_action`, `stall_reason`, `worker_id`).
Supervisor summary integration into status and fleet commands `src/bernstein/cli/commands/status_cmd.py`, `src/bernstein/cli/commands/fleet_cmd.py`	`bernstein status` JSON output gains a `supervisor` field and prints a dim one-line supervisor summary in human mode; `bernstein fleet` fallback render prints the dim supervisor summary line. Both use best-effort helpers that return empty results on errors.
TUI supervisor pane and stuck worker status badge `src/bernstein/tui/status_bar.py`, `src/bernstein/tui/worker_badges.py`	Adds `SupervisorPaneRow` dataclass and `render_supervisor_pane` to format stuck workers deterministically (sorted, escaped fields, heartbeat age). Adds `SupervisorPane` widget with `refresh_rows`. Introduces `WorkerStatus.STUCK` with yellow "!" icon and `status_for_supervisor_row` helper.
Supervisor receipt unit tests `tests/unit/test_supervisor_receipt.py`	Unit tests cover `recommend_action` determinism, cross-worktree fence (accept/reject/ignore), receipt assembly and audit-window trimming, signing/verification roundtrips, tamper detection, canonical-byte stability, and unsigned receipt rejection.
Snapshot tests for TUI pane and end-to-end receipt roundtrip tests `tests/snapshot/test_supervisor_pane_snapshot.py`, `tests/snapshot/__snapshots__/test_supervisor_pane_snapshot.ambr`, `tests/integration/test_supervisor_chain_roundtrip.py`	Snapshot tests assert deterministic TUI pane rendering (empty when healthy, sorted stuck rows). Integration tests seed runtime tree artifacts, compute aggregator snapshots, assemble and sign receipts, verify signatures, and assert receipt `recommended_action` matches aggregator recommendation.
README API coverage allowlist update `tests/unit/test_readme_api_coverage.py`	Adds `"supervisor"` to `DOCUMENTED_COMMANDS` so README API coverage marks the new command documented.
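
To illustrate the aggregator row in the table above, here is a hedged sketch of classifying a worker from heartbeat age and a parked marker. The 300-second threshold, the state names, and `classify_stall` are assumptions made for the example, not values taken from the PR; `now` is passed in explicitly so the function stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional

STALL_THRESHOLD_S = 300.0  # assumed threshold, not taken from the PR


@dataclass(frozen=True)
class WorkerSignals:
    """Hypothetical per-worker inputs read from the runtime tree."""

    session_id: str
    last_heartbeat_ts: Optional[float]
    parked: bool = False


def classify_stall(
    signals: WorkerSignals,
    *,
    now: float,
    threshold_s: float = STALL_THRESHOLD_S,
) -> str:
    """Return a stall state from the parked marker and heartbeat age."""
    if signals.parked:
        return "parked"
    if signals.last_heartbeat_ts is None:
        return "no_heartbeat"
    age_s = now - signals.last_heartbeat_ts
    return "stalled" if age_s > threshold_s else "healthy"
```

With the heartbeat seeded by the integration test (600 seconds older than the detection timestamp), this sketch would classify the session as stalled.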

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

sipyourdrink-ltd/bernstein#1683: Introduces parked-session and respawn-budget concepts that the supervisor aggregator and receipt logic consume to classify stall state and compute recommended actions.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main addition: a new supervisor command surface with signed escalation receipts, which aligns with the primary changes across multiple files.
Description check	✅ Passed	The description is comprehensive, covering objectives, files touched, acceptance criteria, and test plan. It follows the template structure with clear sections explaining What, Why, and implementation approach.
Linked Issues check	✅ Passed	The PR fully implements the objectives from issue `#1800`: operator CLI commands, signed escalation receipts, deterministic recommended_action, cross-worktree fence, audit chain integration, TUI surfaces, and comprehensive tests.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#1800`. Core modules (supervisor_receipt, supervisor_aggregator), CLI registration, TUI surfaces, and lifecycle hooks are all necessary for the supervisor surface feature.
Docstring Coverage	✅ Passed	Docstring coverage is 90.70% which is sufficient. The required threshold is 80.00%.


github-actions · 2026-05-21T19:58:15Z

Review-bot acknowledgement summary

Must-address findings: 7 (7 acknowledged, 0 open)
Informational findings: 8

All must-address findings are resolved or acknowledged.

github-actions · 2026-05-21T19:58:24Z

bernstein doctor observe for PR #1805 (feat/1800-supervisor-escalation-receipts): ok=0, warn=2, fail=0, error=0, skipped=2

sonar -- WARN (project bernstein)

metric	value	delta	threshold	status
coverage_pct	13.5%	new	80.0%	fail
code_smells	125	new	50	warn
bugs	11	new	0	fail
vulnerabilities	2	new	0	warn
security_hotspots	87	new	0	fail

code-scanning -- WARN (17 open alert(s))

metric	value	delta	threshold	status
open_alerts	17	new	0	fail
critical_alerts	0	new	0	ok
high_alerts	2	new	0	warn
medium_alerts	0	new	-	ok
low_alerts	0	new	-	ok

Skipped backends (credentials not configured)

glitchtip: BERNSTEIN_GLITCHTIP_TOKEN not set
dt: DTRACK_URL/TOKEN/PROJECT not set

See docs/observability/unified-doctor.md for backend setup notes.

Auto-applied by contract-drift-autofix.yml on PR #1805. Regenerated via scripts/regen_contract_drift.py. Refs #1273. Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26249756348

coderabbitai

Actionable comments posted: 15

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bernstein/cli/commands/fleet_cmd.py`:
- Around line 163-164: The broad except in the fleet renderer that currently does
`except Exception: return ""` hides real errors; update the except block in the
fleet supervisor-summary code to either catch specific expected exceptions or at
minimum log the exception details before returning, e.g. use the module's logger
(or process_logger) and call logger.exception or logger.error(...,
exc_info=True) with a clear message about "fleet supervisor-summary" so failures
are operator-visible while preserving the existing fallback return value.

In `@src/bernstein/cli/commands/status_cmd.py`:
- Around line 227-228: The broad except blocks in the supervisor-summary helpers
in status_cmd.py currently swallow all exceptions and return {} — update those
handlers to either catch specific expected exceptions (e.g., ValueError,
KeyError, RuntimeError) or, if you must keep a generic Exception fallback, log
the error before returning the empty dict; use the module logger (or
processLogger) and logger.exception(...) or logger.error(..., exc_info=True) in
the same except blocks so failures remain diagnosable while preserving the
fallback behavior.

In `@src/bernstein/cli/commands/supervisor_cmd.py`:
- Around line 169-175: The command accepts whitespace-only reasons and proceeds
to mutate state; update supervisor_escalate to validate reason by trimming
whitespace (reason.strip()) at the very start and abort before any
key/receipt/audit side effects if empty; raise a click.BadParameter or
click.ClickException (so the CLI exits cleanly with an error) and mirror the
same validation for the related handler code around lines 182-193 that also
consumes the reason string to ensure no state mutation happens with an
empty-trimmed reason.
- Around line 342-345: The try/except around get_install_rev() silently swallows
all Exceptions and returns "" which hides failures; replace the bare except with
handling that logs the error and preserves helpful behavior—catch the specific
expected exception types if known (or use "except Exception as e:"), call the
module logger (e.g., logger.exception or logger.error with the exception) to
record the failure and context, then either re-raise for unexpected cases or
return "" only after logging; update the block surrounding get_install_rev() in
supervisor_cmd.py to log the exception (including the exception object) instead
of silently returning.
- Around line 54-58: Move the four constants SUPERVISOR_RECEIPT_DIR,
SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV, and DEFAULT_INSTALL_SIGNING_KEY
out of the supervisor_cmd module into the shared defaults module
(core.defaults): create or add them in core.defaults with the same names and
values, export them there, then remove the inline definitions from
supervisor_cmd and import the constants with from core.defaults import
SUPERVISOR_RECEIPT_DIR, SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV,
DEFAULT_INSTALL_SIGNING_KEY so supervisor_cmd uses the shared defaults.
- Around line 380-385: The current filename generation using fname =
f"{int(time.time())}-{receipt.session_id}-{receipt.payload_digest[:12]}.json"
can collide; change it to include a unique high-resolution or random component
(e.g., time.time_ns() or uuid.uuid4().hex) when building fname using
receipt.session_id and payload_digest to keep traceability, and write atomically
by writing to a temp file in dest_dir and then renaming to path to avoid
clobbering; update references to fname/path and keep using
receipt_to_dict(receipt) for the file contents.
- Around line 368-373: Currently the except block swallows all exceptions and
returns "0"*64 which can reset the chain anchor; instead, detect whether the
audit store actually exists and only return the zero-anchor when there is no
audit data (e.g., audit_dir missing/empty); if the audit exists but
AuditLog(audit_dir=audit_dir) raises, propagate or re-raise the exception (or
log and raise) so callers don't silently break chain continuity. Update the
block around AuditLog(audit_dir=audit_dir) to check existence of the audit
storage first, return getattr(log, "_prev_hmac", "0"*64) when present, and avoid
catching Exception broadly—only handle the non-existence case locally and let
other errors surface.

In `@src/bernstein/core/orchestration/supervisor_aggregator.py`:
- Around line 65-67: Move the inline constant AGGREGATOR_SCHEMA_VERSION out of
supervisor_aggregator.py and define it in the core/defaults.py module, then
import it into supervisor_aggregator.py; specifically, add
AGGREGATOR_SCHEMA_VERSION = "1.0.0" to core/defaults.py (alongside other
defaults) and replace the inline declaration in supervisor_aggregator.py with an
import from core.defaults (use the same symbol name AGGREGATOR_SCHEMA_VERSION
where referenced).
- Around line 120-140: load_agents_snapshot and the other exposed routines
return and propagate untyped dict[str, Any] objects; replace those raw dicts
with a frozen TypedDict (e.g., AgentRow: TypedDict with required fields) or a
frozen dataclass model and update the function signature to return
list[AgentRow] (or tuple[...] if immutable), cast only when validating input,
and construct/validate AgentRow instances before returning; update all
referenced functions/sections (the blocks at 143-161, 201-233, 415-442) to parse
JSON into the new TypedDict/dataclass types, add explicit type hints on all
public functions, and remove broad cast usages so the codebase passes strict
Pyright type checks.

In `@src/bernstein/core/orchestration/supervisor_receipt.py`:
- Around line 84-92: RECEIPT_SCHEMA_VERSION and DEFAULT_RECEIPT_AUDIT_WINDOW
(and the other constants introduced at lines ~315-323) should be moved out of
supervisor_receipt.py into the central constants module bernstein.core.defaults;
create or add them in defaults.py with the same names and values, remove their
definitions from supervisor_receipt.py, and update supervisor_receipt.py to
import those names from bernstein.core.defaults (e.g., from
bernstein.core.defaults import RECEIPT_SCHEMA_VERSION,
DEFAULT_RECEIPT_AUDIT_WINDOW) so the module uses the centralized defaults
instead of local constants.
- Around line 213-220: Replace the raw dict-based contract surface by
introducing explicit TypedDict models for receipt-related shapes (e.g.,
AuditEntryTypedDict, ReceiptDetailsTypedDict, SupervisorReceiptTypedDict) and
use them instead of dict[str, Any] on the SupervisorReceipt fields
(audit_entries, details) and on all public functions that currently
accept/return ad-hoc dicts; update the dataclass fields (audit_entries:
tuple[AuditEntryTypedDict, ...], details: ReceiptDetailsTypedDict) and change
function signatures that accept payload/receipt dicts to use these TypedDict
types, ensure exported types are frozen/readonly where appropriate, add the new
imports/aliases, and run/typecheck until Pyright strict mode passes so all
public functions referenced in this file use the new TypedDict types instead of
raw dicts.

In `@src/bernstein/tui/status_bar.py`:
- Around line 500-505: The Rich markup string is built by interpolating
unescaped dynamic values (notably r.worker_id and r.role) which may contain '['
or ']' and will be parsed as markup by Text.from_markup; wrap these
values with rich.markup.escape (import escape from rich.markup) before
interpolating into the f-string used in the status line (the block that builds
the string containing worker_id, role, reason, recommend, hb, budget). Escape
each user-controlled field (at least r.worker_id and r.role; consider
stall_reason and recommended_action) so Rich treats them as literal text rather
than markup.

In `@tests/integration/test_supervisor_chain_roundtrip.py`:
- Around line 42-88: Declare TypedDicts (e.g., RuntimeAgentDoc,
RuntimeHeartbeat, RuntimeFailure) and use them inside _seed_runtime_tree to type
the JSON payloads instead of raw dicts: change the types of agents_doc,
heartbeat, and failure to the new TypedDict types and construct their values to
conform to those schemas (including typing for stalled_manager and task_ids);
ensure imports (from typing import TypedDict, Any) are added and that json.dumps
still receives dict-compatible structures so file writes remain unchanged.

In `@tests/unit/test_supervisor_receipt.py`:
- Around line 40-48: Replace the raw dict return type of _audit_slice with a
TypedDict for type safety: define a TypedDict (e.g., AuditEntryFixture) that
includes event_type: str, session_id: str and optional details: dict[str, Any],
import TypedDict and Any, then change the _audit_slice signature to return
list[AuditEntryFixture] and construct the same dict literals (they will be
type-checked as AuditEntryFixture) so tests keep the same behavior while
following the "never raw dicts" guideline.
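
The filename-collision finding for `supervisor_cmd.py` in the list above suggests a unique name component plus a temp-file-then-rename write. One way that could look, using only the standard library, is sketched below. `write_receipt_atomically` and its parameters are hypothetical names; the real code passes `receipt_to_dict(receipt)` as the file contents.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def write_receipt_atomically(
    dest_dir: Path,
    session_id: str,
    payload_digest: str,
    body: dict[str, object],
) -> Path:
    """Write a receipt under a collision-resistant name via temp file + rename."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamp plus a random suffix keeps names unique while the
    # session id and digest prefix preserve traceability.
    fname = (
        f"{time.time_ns()}-{session_id}-"
        f"{payload_digest[:12]}-{uuid.uuid4().hex[:8]}.json"
    )
    path = dest_dir / fname
    # The temp file lives in dest_dir so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(body, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file, then let the error surface.
        Path(tmp_name).unlink(missing_ok=True)
        raise
    return path
```

Readers of the receipts directory then see either no file or a complete one, never a partially written receipt.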

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 51890e01-741d-4d8e-a3c1-919293721f75

📥 Commits

Reviewing files that changed from the base of the PR and between fe6a831 and 7cfdcaa.

⛔ Files ignored due to path filters (1)

docs/api/supervisor.md is excluded by !docs/api/**

📒 Files selected for processing (14)

src/bernstein/cli/commands/fleet_cmd.py
src/bernstein/cli/commands/status_cmd.py
src/bernstein/cli/commands/supervisor_cmd.py
src/bernstein/cli/main.py
src/bernstein/core/lifecycle/hooks.py
src/bernstein/core/orchestration/supervisor_aggregator.py
src/bernstein/core/orchestration/supervisor_receipt.py
src/bernstein/tui/status_bar.py
src/bernstein/tui/worker_badges.py
tests/integration/test_supervisor_chain_roundtrip.py
tests/snapshot/__snapshots__/test_supervisor_pane_snapshot.ambr
tests/snapshot/test_supervisor_pane_snapshot.py
tests/unit/test_readme_api_coverage.py
tests/unit/test_supervisor_receipt.py

coderabbitai · 2026-05-21T20:08:33Z

+SUPERVISOR_RECEIPT_DIR = ".sdd/runtime/supervisor/receipts"
+SUPERVISOR_AUDIT_DIR = ".sdd/audit"
+INSTALL_SIGNING_KEY_ENV = "BERNSTEIN_SUPERVISOR_SIGNING_KEY"
+DEFAULT_INSTALL_SIGNING_KEY = ".sdd/runtime/supervisor/install.key"
+


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move supervisor constants to core/defaults.py.

Lines 54 through 58 introduce new constants in the command module instead of shared defaults.

As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".

coderabbitai · 2026-05-21T20:08:33Z

+    audit_entries: list[dict[str, Any]] = [
+        {
+            "event_type": str(rec.get("kind", "")),
+            "session_id": session_id,
+            "details": rec,
+        }
+        for rec in failures
+    ]
+    # Append the operator escalation as the trailing entry so the receipt
+    # captures the operator's intent even when no prior diagnostic exists.
+    audit_entries.append(
+        {
+            "event_type": "supervisor.escalate",
+            "session_id": session_id,
+            "details": {"reason": reason},
+        }
+    )


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Use TypedDict for CLI payload structures.

Line 210 and Line 268 build raw dict payloads that are exchanged across module and persistence boundaries. Define explicit TypedDict payload schemas to keep strict typing enforceable.

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".

Also applies to: 268-275
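
As a rough illustration of this suggestion, the audit-entry payload from the diff above could be typed as follows. `AuditEntryPayload` and `build_audit_entries` are hypothetical names, and `details` is left as a loose mapping here because the failure-record schema is not shown in this thread.

```python
from typing import Any, TypedDict


class AuditEntryPayload(TypedDict):
    """Hypothetical schema for one entry in the receipt's audit slice."""

    event_type: str
    session_id: str
    details: dict[str, Any]


def build_audit_entries(
    failures: list[dict[str, Any]],
    *,
    session_id: str,
    reason: str,
) -> list[AuditEntryPayload]:
    """Mirror the diff: one entry per failure, then the operator escalation."""
    entries: list[AuditEntryPayload] = [
        AuditEntryPayload(
            event_type=str(rec.get("kind", "")),
            session_id=session_id,
            details=rec,
        )
        for rec in failures
    ]
    # The trailing entry captures operator intent even with no prior failures.
    entries.append(
        AuditEntryPayload(
            event_type="supervisor.escalate",
            session_id=session_id,
            details={"reason": reason},
        )
    )
    return entries
```

A `TypedDict` is still a plain `dict` at runtime, so JSON serialization and the persisted receipt format would be unchanged by this kind of typing.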

coderabbitai · 2026-05-21T20:08:33Z

+#: Schema version embedded in every receipt. Bumped only on breaking changes.
+RECEIPT_SCHEMA_VERSION: str = "1.0.0"
+
+
+#: Default number of trailing audit entries captured in the receipt. The
+#: window is fixed so two receipts assembled from the same chain prefix
+#: are byte-identical regardless of how much audit history exists.
+DEFAULT_RECEIPT_AUDIT_WINDOW: int = 16
+


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move new constants to core/defaults.py.

Line 85, Line 91, and Line 315 introduce new constants directly in this module. Centralizing these in bernstein.core.defaults keeps configuration ownership consistent and avoids drift.

As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".

Also applies to: 315-323
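
The comment on `DEFAULT_RECEIPT_AUDIT_WINDOW` in the diff above explains why the window is fixed. A small sketch of that idea follows, assuming sorted-key JSON for canonical serialization and SHA-256 for the digest; both are assumptions for illustration, and `trim_audit_window`, `canonical_bytes`, and this `payload_digest` helper are hypothetical names.

```python
import hashlib
import json
from collections.abc import Sequence
from typing import Any

DEFAULT_RECEIPT_AUDIT_WINDOW = 16  # value shown in the diff above


def trim_audit_window(
    entries: Sequence[dict[str, Any]],
    window: int = DEFAULT_RECEIPT_AUDIT_WINDOW,
) -> tuple[dict[str, Any], ...]:
    """Keep only the trailing `window` entries of the audit history."""
    if window <= 0:
        return ()
    return tuple(entries[-window:])


def canonical_bytes(payload: dict[str, Any]) -> bytes:
    """Serialize with sorted keys and fixed separators for stable bytes."""
    text = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return text.encode("utf-8")


def payload_digest(payload: dict[str, Any]) -> str:
    """Hex digest over the canonical bytes (SHA-256 is assumed here)."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()
```

Under these assumptions, two histories of different length that share the same trailing 16 entries produce the same trimmed slice and therefore the same digest.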

coderabbitai · 2026-05-21T20:08:33Z

+    audit_entries: tuple[dict[str, Any], ...]
+    identity: IdentityTokens
+    prev_chain_digest: str
+    payload_digest: str
+    respawn_budget_remaining: int = 0
+    signature_b64: str = ""
+    details: dict[str, Any] = field(default_factory=dict[str, Any])
+


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace raw dict-based receipt contracts with TypedDict models.

Lines 213-220 and the public function signatures at Line 268, Line 340, Line 467, and Line 555 expose ad-hoc dict[str, Any] payloads. This weakens strict typing on a cross-module contract surface.

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".

Also applies to: 268-272, 340-343, 467-468, 555-567

coderabbitai · 2026-05-21T20:08:34Z

+def _seed_runtime_tree(workdir: Path, *, session_id: str, role: str = "manager") -> None:
+    """Populate a minimal ``.sdd/runtime/`` tree that mimics a stalled session."""
+    runtime = workdir / ".sdd" / "runtime"
+    (runtime / "heartbeats").mkdir(parents=True, exist_ok=True)
+    (runtime / "failures").mkdir(parents=True, exist_ok=True)
+    (runtime / "spawn_supervisor").mkdir(parents=True, exist_ok=True)
+
+    # agents.json: one live worker, role=manager, with a stalled diagnostic.
+    agents_doc = {
+        "agents": [
+            {
+                "id": session_id,
+                "role": role,
+                "status": "working",
+                "task_ids": ["t-1"],
+                "worker_id": "abc123def456",
+                "worktree_id": "wt-A",
+                "stalled_manager": {
+                    "kind": "stalled_manager",
+                    "session_id": session_id,
+                    "runtime_s": 120.0,
+                    "hook_event_count": 12,
+                    "detected_at": 1700000000.0,
+                },
+            }
+        ]
+    }
+    (runtime / "agents.json").write_text(json.dumps(agents_doc, sort_keys=True))
+
+    # Heartbeat - aged so the aggregator flags the session as stuck.
+    heartbeat = {
+        "timestamp": 1700000000.0 - 600.0,  # 10 min old
+        "phase": "implementing",
+        "progress_pct": 0,
+    }
+    (runtime / "heartbeats" / f"{session_id}.json").write_text(json.dumps(heartbeat, sort_keys=True))
+
+    # One failure record so the aggregator has something to feed into the
+    # receipt's audit_entries slice.
+    failure = {
+        "kind": "stalled_manager",
+        "session_id": session_id,
+        "runtime_s": 120.0,
+        "hook_event_count": 12,
+        "detected_at": 1700000000.0,
+    }
+    (runtime / "failures" / f"manager-stalled-{session_id}.json").write_text(json.dumps(failure, sort_keys=True))


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider TypedDict for runtime tree JSON documents.

The _seed_runtime_tree helper constructs several raw dict structures (agents_doc, heartbeat, failure) that simulate the JSON documents written by upstream detectors. Defining TypedDict types for these structures would improve type safety and document the expected schema:

class RuntimeAgentDoc(TypedDict):
    id: str
    role: str
    status: str
    task_ids: list[str]
    worker_id: str
    worktree_id: str
    stalled_manager: dict[str, Any]  # or further typed

class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_supervisor_chain_roundtrip.py` around lines 42 - 88, Declare TypedDicts (e.g., RuntimeAgentDoc, RuntimeHeartbeat, RuntimeFailure) and use them inside _seed_runtime_tree to type the JSON payloads instead of raw dicts: change the types of agents_doc, heartbeat, and failure to the new TypedDict types and construct their values to conform to those schemas (including typing for stalled_manager and task_ids); ensure imports (from typing import TypedDict, Any) are added and that json.dumps still receives dict-compatible structures so file writes remain unchanged.
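The concern about `json.dumps` can be checked with a small sketch: a TypedDict value is an ordinary `dict` at runtime, so serialization should behave the same as with the untyped literal. The `RuntimeHeartbeat` name follows the suggestion above; the values mirror the test helper.

```python
import json
from typing import TypedDict


class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int


# The annotation only affects static type checking; the value is a plain dict.
heartbeat: RuntimeHeartbeat = {
    "timestamp": 1700000000.0 - 600.0,  # 10 min old
    "phase": "implementing",
    "progress_pct": 0,
}

# Serializes exactly like the untyped dict literal in the original helper.
encoded = json.dumps(heartbeat, sort_keys=True)
```

Because nothing changes at runtime, the files written by `_seed_runtime_tree` would be byte-for-byte the same; the benefit is limited to what the type checker can verify.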

coderabbitai · 2026-05-21T20:08:34Z

+def _audit_slice(*, with_fatal: bool = False) -> list[dict[str, str]]:
+    base = [
+        {"event_type": "session.start", "session_id": "sess-1"},
+        {"event_type": "task.pre_spawn", "session_id": "sess-1"},
+        {"event_type": "heartbeat.tick", "session_id": "sess-1"},
+    ]
+    if with_fatal:
+        base.append({"event_type": "auth.denied", "session_id": "sess-1"})
+    return base


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider TypedDict for audit entry test fixtures.

The guideline states "never raw dicts" even in test code. While these fixtures simulate JSON input from the runtime system, annotating _audit_slice() return type with a TypedDict would improve type safety without changing the dict literal syntax:

from typing import Any, TypedDict

class AuditEntryFixture(TypedDict, total=False):
    event_type: str
    session_id: str
    details: dict[str, Any]

def _audit_slice(*, with_fatal: bool = False) -> list[AuditEntryFixture]: ...

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_supervisor_receipt.py` around lines 40 - 48, Replace the raw dict return type of _audit_slice with a TypedDict for type safety: define a TypedDict (e.g., AuditEntryFixture) that includes event_type: str, session_id: str and optional details: dict[str, Any], import TypedDict and Any, then change the _audit_slice signature to return list[AuditEntryFixture] and construct the same dict literals (they will be type-checked as AuditEntryFixture) so tests keep the same behavior while following the "never raw dicts" guideline.

- Validate `--reason` strips to non-empty before any state mutation in `bernstein supervisor escalate`. A whitespace-only reason previously passed through into the receipt body.
- Log silent-broad-exception paths in the supervisor summary helpers (status + fleet) and in the install-fingerprint lookup so an empty summary surfaces a cause in the orchestrator log instead of looking identical to a healthy run.
- Refuse to silently reset the audit chain anchor when the audit log directory exists but is unreadable. The previous fallback to the genesis sentinel would have let a fresh receipt skip the chain head and break the tamper-evidence guarantee.
- Switch receipt filenames to nanosecond timestamps and open the file exclusively (``"x"``) so two escalations in the same second cannot silently overwrite each other.
- Escape every dynamic field interpolated into the supervisor TUI pane's Rich markup; an upstream caller passing ``[``/``]`` in a worker id or role no longer corrupts the dashboard layout.

bot-ack: 3284009273
bot-ack: 3284009276
bot-ack: 3284009287
bot-ack: 3284009299
bot-ack: 3284009302
bot-ack: 3284009305
bot-ack: 3284009328
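Two of the fixes in this commit message (rejecting a whitespace-only reason, and exclusive creation with a nanosecond-stamped filename) can be sketched with the standard library alone. The function names, the filename layout, and the error message are hypothetical; the PR's actual code is not shown here.

```python
import time
from pathlib import Path


def normalize_reason(reason: str) -> str:
    """Return the stripped reason, or raise before anything is written."""
    cleaned = reason.strip()
    if not cleaned:
        raise ValueError("--reason must not be empty or whitespace-only")
    return cleaned


def write_receipt(directory: Path, session_id: str, body: str) -> Path:
    """Create a new receipt file and never overwrite an existing one."""
    # Hypothetical naming scheme: nanosecond timestamp plus session id.
    path = directory / f"{time.time_ns()}-{session_id}.json"
    # Mode "x" raises FileExistsError if the path already exists,
    # so a collision fails loudly instead of replacing a receipt.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

The nanosecond timestamp makes collisions unlikely, and the exclusive open turns any remaining collision into an explicit error; the two measures cover different failure modes, which is presumably why the commit applies both.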

+            hb = f"{int(row.last_heartbeat_age_s)}s"
+        stuck_label = "STUCK" if row.is_stuck else "ok"
+        table.add_row(
+            f"{row.worker_id}",


+    if not isinstance(payload_any, dict):
+        return []
+    payload = cast(dict[str, Any], payload_any)
+    raw = payload.get("agents")


+
+        # Audit slice for the deterministic recommended action: failures
+        # already classified for this session.
+        slice_entries: list[dict[str, Any]] = []


+    failures = 0
+    for entry in audit_entries:
+        event_type = str(entry.get("event_type", "")).lower()
+        if event_type.endswith(".failed") or event_type.endswith(".error") or event_type.endswith(".errored"):
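The chained suffix checks in the excerpt above could also be written with a tuple argument to `str.endswith`. This is a sketch only: the excerpt is cut off before the body of the `if`, so incrementing the counter is an assumption, and `count_failures` and `FAILURE_SUFFIXES` are hypothetical names.

```python
from typing import Any, Iterable, Mapping

# Suffixes taken from the excerpt above.
FAILURE_SUFFIXES = (".failed", ".error", ".errored")


def count_failures(audit_entries: Iterable[Mapping[str, Any]]) -> int:
    """Count entries whose event_type ends in a failure suffix."""
    failures = 0
    for entry in audit_entries:
        # Missing or non-string event types are coerced to a lowercase string.
        event_type = str(entry.get("event_type", "")).lower()
        # str.endswith accepts a tuple and matches if any suffix matches.
        if event_type.endswith(FAILURE_SUFFIXES):
            failures += 1  # assumed behaviour; the excerpt omits this line
    return failures
```

Keeping the suffixes in one module-level tuple also gives a single place to extend the list if more failure event types appear.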



chernistry enabled auto-merge (squash) May 21, 2026 19:57

sourcery-ai Bot reviewed May 21, 2026

github-actions Bot added core cli docs tests size/xl labels May 21, 2026

chore(ci): regenerate contract drift allow-lists

7cfdcaa

Auto-applied by contract-drift-autofix.yml on PR #1805. Regenerated via scripts/regen_contract_drift.py. Refs #1273. Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26249756348

coderabbitai Bot reviewed May 21, 2026

chernistry merged commit da8e804 into main May 21, 2026
34 of 36 checks passed

chernistry deleted the feat/1800-supervisor-escalation-receipts branch May 21, 2026 20:21

github-advanced-security AI found potential problems May 21, 2026


Sonar insights (advisory, no merge-block)


sonar -- WARN (project bernstein)

code-scanning -- WARN (17 open alert(s))


2 participants
