feat(supervisor): operator supervisor surface with signed escalation receipts #1805

Merged
chernistry merged 3 commits into main from feat/1800-supervisor-escalation-receipts on May 21, 2026

Conversation

chernistry (Collaborator) commented May 21, 2026

Closes #1800.

Summary

  • Adds bernstein supervisor status [--json] and bernstein supervisor escalate <session_id> --reason "..." as an operator-facing surface over the existing stalled_manager, watchdog, and spawn_supervisor detectors. The detectors remain the source of truth; the new command aggregates and renders.
  • Each escalation appends a signed receipt (Ed25519) to the audit chain with (worker_id, worktree_id, last_n_audit_entries, identity_tokens, stall_reason, recommended_action, prev_chain_digest). Receipts verify offline against the install's public key.
  • recommended_action is a pure function of the chain slice at stall time. Two operators verifying the same receipt land on the byte-identical recommendation. A cross-worktree fence refuses receipt assembly when the stuck session leaked into a sibling worktree's resolution events.
  • bernstein status and bernstein fleet gain a stuck-count + oldest-stall summary line. The TUI dashboard adds a SupervisorPane highlighting stalled / parked / no-progress sessions with the same recommended action.
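For illustration only, here is a minimal sketch of how a receipt's canonical bytes and chained digest might be computed. It assumes sorted-key compact JSON as the canonical encoding and SHA-256 as the hash; `canonical_bytes` and `payload_digest` are hypothetical names, not the functions in supervisor_receipt.py.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(payload: dict[str, Any]) -> bytes:
    """Hypothetical canonical form: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def payload_digest(payload: dict[str, Any], prev_chain_digest: str) -> str:
    """Hash the previous chain digest followed by the canonical payload.

    Mixing in prev_chain_digest ties the receipt to a specific chain prefix,
    so the same payload anchored at a different position hashes differently.
    The digest is fixed-width hex, so plain concatenation is unambiguous.
    """
    h = hashlib.sha256()
    h.update(prev_chain_digest.encode("ascii"))
    h.update(canonical_bytes(payload))
    return h.hexdigest()
```

Because key order is normalized before hashing, two operators who rebuild the same fields in a different order still get the same digest.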

Files touched

  • src/bernstein/core/orchestration/supervisor_receipt.py (new) - receipt assembly, signing, verification, deterministic recommended_action, cross-worktree fence.
  • src/bernstein/core/orchestration/supervisor_aggregator.py (new) - reads .sdd/runtime/agents.json, heartbeats, and failure records into a single snapshot.
  • src/bernstein/cli/commands/supervisor_cmd.py (new) - bernstein supervisor status / escalate Click group.
  • src/bernstein/cli/main.py - register the new group.
  • src/bernstein/cli/commands/status_cmd.py - attach supervisor summary to bernstein status.
  • src/bernstein/cli/commands/fleet_cmd.py - append supervisor summary line to the fallback fleet renderer.
  • src/bernstein/core/lifecycle/hooks.py - add worker.escalated lifecycle event for external notifiers.
  • src/bernstein/tui/status_bar.py - add SupervisorPane widget + render_supervisor_pane helper.
  • src/bernstein/tui/worker_badges.py - add WorkerStatus.STUCK + status_for_supervisor_row helper.
  • docs/api/supervisor.md (new) - JSON schema for aggregator snapshot + receipt envelope, plus the determinism contract and cross-worktree fence rules.
  • tests/unit/test_supervisor_receipt.py (new) - 21 tests covering receipt assembly, signature verification, recommended-action determinism (drives the same chain slice from two temp dirs), fence assertion, dict roundtrip.
  • tests/integration/test_supervisor_chain_roundtrip.py (new) - end-to-end: synthetic .sdd/runtime/ -> aggregator -> signed receipt -> JSON roundtrip -> verification.
  • tests/snapshot/test_supervisor_pane_snapshot.py (new) - TUI render snapshot for the new pane.

Acceptance criteria

  • bernstein supervisor status [--json] lists every live worker with last-heartbeat, current task id, stall reason, recommended action, respawn budget remaining.
  • Signed escalation receipt appended on stall detection (and on explicit operator escalation); reuses install Ed25519 key.
  • recommended_action is deterministic over (stall_reason, audit_entries, respawn_budget_remaining).
  • Cross-worktree fence asserted; regression test in place.
  • bernstein supervisor escalate records an escalation event in the audit chain and fires worker.escalated.
  • Stuck-detection summary appears in bernstein status and bernstein fleet output.
  • TUI dashboard adds a pane highlighting parked / stalled / no-progress sessions.
  • Existing stalled_manager, watchdog, spawn_supervisor, and parked-agent surfaces remain the source of truth; the new command aggregates.
  • JSON schema documented under docs/api/supervisor.md.
  • Tests cover receipt assembly, signature verification, deterministic recommended-action derivation, cross-worktree fence assertion, and TUI render.
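The cross-worktree fence can be sketched as a check over the audit slice before assembly. This assumes audit entries are mappings with optional `session_id` and `worktree_id` keys, and that entries from other sessions, or without a worktree attribution, are ignored; `assert_worktree_fence` and `CrossWorktreeLeak` are hypothetical names.

```python
from collections.abc import Iterable, Mapping
from typing import Any


class CrossWorktreeLeak(ValueError):
    """Hypothetical error: the stuck session's events appear under a sibling worktree."""


def assert_worktree_fence(
    session_id: str,
    worktree_id: str,
    audit_entries: Iterable[Mapping[str, Any]],
) -> None:
    """Refuse receipt assembly if the session leaked into another worktree."""
    for entry in audit_entries:
        # Entries that belong to other sessions are outside the fence's concern.
        if entry.get("session_id") != session_id:
            continue
        entry_wt = entry.get("worktree_id")
        # Entries without a worktree attribution cannot prove a leak.
        if entry_wt is not None and entry_wt != worktree_id:
            raise CrossWorktreeLeak(
                f"session {session_id!r} has events in worktree {entry_wt!r}, "
                f"expected {worktree_id!r}"
            )
```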

Test plan

  • uv run pytest tests/unit/test_supervisor_receipt.py tests/integration/test_supervisor_chain_roundtrip.py tests/snapshot/test_supervisor_pane_snapshot.py - 25 passed.
  • uv run pytest tests/unit/ -k "supervisor or stalled or watchdog" - 127 passed, 45 skipped, no regressions.
  • uv run ruff check . - all checks passed.
  • uv run ruff format --check . - all files formatted.
  • uv run pyright on the three new modules - 0 errors.
  • bernstein supervisor --help and bernstein supervisor status --json smoke-tested.

Summary by CodeRabbit

  • New Features

    • Added top-level "supervisor" CLI with status and escalate commands for inspecting and escalating stuck workers.
    • status now includes a supervisor summary in JSON and a dim one-line supervisor summary in human output.
    • TUI/status bar gains a Supervisor pane showing stalled workers and a new "stuck" worker badge/icon.
  • Tests

    • Added unit and integration tests covering supervisor snapshots, escalation receipts, and signing/verification.

…receipts

Adds `bernstein supervisor status` and `bernstein supervisor escalate`
as an operator-facing surface over the existing stalled_manager,
watchdog, and spawn_supervisor detectors. The detectors remain the
source of truth; the new command aggregates and renders.

Each escalation appends a signed receipt to the audit chain. The
receipt carries the worker id, worktree id, last N audit entries,
identity tokens, stall reason, recommended action, and the previous
chain digest. The receipt verifies offline against the install's
Ed25519 public key.
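Offline verification needs only the public key and the canonical receipt bytes. A sketch using the Ed25519 primitives from the `cryptography` package; whether bernstein uses that package, and the helper names `export_public_key`, `sign_payload`, and `verify_payload`, are assumptions. Base64 mirrors the receipt's `signature_b64` field.

```python
import base64

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


def export_public_key(key: Ed25519PrivateKey) -> bytes:
    """Raw 32-byte public key that an install could hand to verifiers."""
    return key.public_key().public_bytes(encoding=Encoding.Raw, format=PublicFormat.Raw)


def sign_payload(key: Ed25519PrivateKey, canonical: bytes) -> str:
    """Sign canonical receipt bytes; return the signature as base64 text."""
    return base64.b64encode(key.sign(canonical)).decode("ascii")


def verify_payload(public_raw: bytes, canonical: bytes, signature_b64: str) -> bool:
    """Return True iff the signature is valid. No private key or network needed."""
    try:
        pub = Ed25519PublicKey.from_public_bytes(public_raw)
        pub.verify(base64.b64decode(signature_b64), canonical)
    except (InvalidSignature, ValueError):
        # ValueError covers a malformed key and bad base64 (binascii.Error).
        return False
    return True
```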

The recommended_action field is a pure function of the chain slice
at stall time (stall_reason + audit_entries + respawn_budget_remaining).
Two operators verifying the same receipt arrive at the byte-identical
recommendation. A cross-worktree fence assertion refuses receipt
assembly when the stuck session leaked into a sibling worktree's
resolution events.
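To make the determinism contract concrete, here is a sketch of a pure `recommend_action` over (stall_reason, audit_entries, respawn_budget_remaining). The stall-reason strings, action names, and rule order are hypothetical, not bernstein's actual table; only the `supervisor.escalate` event type comes from the diff.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def recommend_action(
    stall_reason: str,
    audit_entries: Sequence[Mapping[str, Any]],
    respawn_budget_remaining: int,
) -> str:
    """Derive an action from the chain slice alone: no clock, no I/O, no randomness."""
    # Repeated operator escalations in the slice suggest a human should step in.
    escalations = sum(
        1 for e in audit_entries if e.get("event_type") == "supervisor.escalate"
    )
    if stall_reason == "parked":
        return "resume_parked"
    if escalations >= 2:
        return "page_operator"
    if respawn_budget_remaining > 0:
        return "respawn"
    return "kill_and_reassign"
```

Since the output depends only on the arguments, any verifier replaying the receipt's slice reproduces the same string.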

The aggregator surfaces a stuck-count + oldest-stall summary line on
`bernstein status` and `bernstein fleet`. The TUI gains a
SupervisorPane that highlights stalled / parked / no-progress sessions
with the same recommended action. Worker badges learn a STUCK status
so the dashboard's existing widgets stay consistent with the pane.
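CodeRabbit's walkthrough attributes a `format_summary_line` helper to the aggregator, but its signature is not shown here. The following is a hypothetical stand-in, `summary_line`, that renders a stuck count plus oldest-stall age; the output wording is an assumption. Passing `now` in keeps it testable.

```python
from collections.abc import Sequence


def summary_line(stall_started_at: Sequence[float], now: float) -> str:
    """Render 'supervisor: N stuck, oldest stall Xm Ys' from stall start times (epoch s)."""
    if not stall_started_at:
        return "supervisor: 0 stuck"
    oldest_age = max(0, int(now - min(stall_started_at)))
    minutes, seconds = divmod(oldest_age, 60)
    return f"supervisor: {len(stall_started_at)} stuck, oldest stall {minutes}m {seconds}s"
```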

Closes #1800.

Files touched:
  - src/bernstein/core/orchestration/supervisor_receipt.py (new)
  - src/bernstein/core/orchestration/supervisor_aggregator.py (new)
  - src/bernstein/cli/commands/supervisor_cmd.py (new)
  - src/bernstein/cli/main.py
  - src/bernstein/cli/commands/status_cmd.py
  - src/bernstein/cli/commands/fleet_cmd.py
  - src/bernstein/core/lifecycle/hooks.py (worker.escalated event)
  - src/bernstein/tui/status_bar.py (SupervisorPane + helpers)
  - src/bernstein/tui/worker_badges.py (STUCK status)
  - docs/api/supervisor.md (new)
  - tests/unit/test_supervisor_receipt.py (new)
  - tests/integration/test_supervisor_chain_roundtrip.py (new)
  - tests/snapshot/test_supervisor_pane_snapshot.py (new)
@chernistry chernistry enabled auto-merge (squash) May 21, 2026 19:57

@sourcery-ai sourcery-ai Bot left a comment

Sorry @chernistry, you have reached your weekly rate limit of 2500000 diff characters.

Please try again later or upgrade to continue using Sourcery

github-actions Bot (Contributor) commented May 21, 2026

Sonar insights (advisory, no merge-block)

Snapshot of bernstein on the configured Sonar instance:

Metric             Value
Coverage (%)       13.5
Code smells        125
Bugs               11
Vulnerabilities    2
Security hotspots  87

Run bernstein doctor sonar locally for the full surface.

This comment is a soft signal. The Sonar scan runs on push to main; the PR check itself never fails on smells.

coderabbitai Bot commented May 21, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6ba61cfa-0e33-457b-b820-365f977a7686

📥 Commits

Reviewing files that changed from the base of the PR and between 7cfdcaa and f1d3a21.

📒 Files selected for processing (4)
  • src/bernstein/cli/commands/fleet_cmd.py
  • src/bernstein/cli/commands/status_cmd.py
  • src/bernstein/cli/commands/supervisor_cmd.py
  • src/bernstein/tui/status_bar.py

📝 Walkthrough

Introduces a complete supervisor subsystem for detecting and escalating stalled workers. Adds a pure-read aggregator that classifies worker stall state from runtime files, deterministic signed escalation receipts anchored to the audit chain with cross-worktree validation, operator CLI commands for supervisor status and escalate workflows, lifecycle event type, and TUI integration including a supervisor pane and stuck worker status badges. All new code includes defensive error handling to prevent failures during snapshot or status rendering.

Changes

Supervisor Detection, Escalation, and Audit Integration

Layer / File(s) Summary
Supervisor aggregator: snapshot detection and classification
src/bernstein/core/orchestration/supervisor_aggregator.py
Loads agents, heartbeats, parked sessions, and recent failures from the runtime tree. Classifies stall state via heartbeat age, parked markers, and role/status signals. Returns sorted SupervisorSnapshot with stuck-worker count and deterministic recommended_action per worker and provides snapshot_to_dict and format_summary_line.
Escalation receipt signing and cryptographic verification
src/bernstein/core/orchestration/supervisor_receipt.py
Defines EscalationReceipt, canonical serialization, cross-worktree fence enforcement, deterministic recommend_action, assemble_receipt, sign_receipt, and verify_receipt with structured verification results.
Supervisor CLI command group: status and escalate workflows
src/bernstein/cli/commands/supervisor_cmd.py
bernstein supervisor status snapshots and renders stuck workers (Rich table or JSON). bernstein supervisor escalate <session_id> --reason ... validates session, resolves/generates Ed25519 signing key (env-overridable), appends an operator escalation entry to the audit slice, assembles and signs a receipt, persists it under supervisor receipts, appends a non-blocking supervisor.escalated audit entry, and emits a non-blocking worker.escalated lifecycle event.
Main CLI registration and lifecycle event type
src/bernstein/cli/main.py, src/bernstein/core/lifecycle/hooks.py
Registers supervisor command group with the root CLI and adds LifecycleEvent.WORKER_ESCALATED = "worker.escalated" with documented context fields (reason, receipt_path, recommended_action, stall_reason, worker_id).
Supervisor summary integration into status and fleet commands
src/bernstein/cli/commands/status_cmd.py, src/bernstein/cli/commands/fleet_cmd.py
bernstein status JSON output gains a supervisor field and prints a dim one-line supervisor summary in human mode; bernstein fleet fallback render prints the dim supervisor summary line. Both use best-effort helpers that return empty results on errors.
TUI supervisor pane and stuck worker status badge
src/bernstein/tui/status_bar.py, src/bernstein/tui/worker_badges.py
Adds SupervisorPaneRow dataclass and render_supervisor_pane to format stuck workers deterministically (sorted, escaped fields, heartbeat age). Adds SupervisorPane widget with refresh_rows. Introduces WorkerStatus.STUCK with yellow "!" icon and status_for_supervisor_row helper.
Supervisor receipt unit tests
tests/unit/test_supervisor_receipt.py
Unit tests cover recommend_action determinism, cross-worktree fence (accept/reject/ignore), receipt assembly and audit-window trimming, signing/verification roundtrips, tamper detection, canonical-byte stability, and unsigned receipt rejection.
Snapshot tests for TUI pane and end-to-end receipt roundtrip tests
tests/snapshot/test_supervisor_pane_snapshot.py, tests/snapshot/__snapshots__/test_supervisor_pane_snapshot.ambr, tests/integration/test_supervisor_chain_roundtrip.py
Snapshot tests assert deterministic TUI pane rendering (empty when healthy, sorted stuck rows). Integration tests seed runtime tree artifacts, compute aggregator snapshots, assemble and sign receipts, verify signatures, and assert receipt recommended_action matches aggregator recommendation.
README API coverage allowlist update
tests/unit/test_readme_api_coverage.py
Adds "supervisor" to DOCUMENTED_COMMANDS so README API coverage marks the new command documented.
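The walkthrough says the aggregator classifies stall state from heartbeat age, parked markers, and role/status signals. A sketch of that kind of classifier, assuming a 90-second staleness threshold and a "working with no task" heuristic for no-progress; `StallState`, `WorkerSignals`, `classify`, and the threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class StallState(str, Enum):
    HEALTHY = "healthy"
    PARKED = "parked"
    STALLED = "stalled"
    NO_PROGRESS = "no_progress"


@dataclass(frozen=True)
class WorkerSignals:
    last_heartbeat_at: float | None  # epoch seconds; None when no heartbeat file exists
    parked: bool
    status: str  # e.g. "working", "idle"
    task_ids: tuple[str, ...]


def classify(signals: WorkerSignals, now: float, stale_after_s: float = 90.0) -> StallState:
    """Classify one worker; parked markers win over heartbeat age."""
    if signals.parked:
        return StallState.PARKED
    hb = signals.last_heartbeat_at
    if hb is None or now - hb > stale_after_s:
        return StallState.STALLED
    # Fresh heartbeat, but the worker claims to be working with nothing assigned.
    if signals.status == "working" and not signals.task_ids:
        return StallState.NO_PROGRESS
    return StallState.HEALTHY
```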

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • sipyourdrink-ltd/bernstein#1683: Introduces parked-session and respawn-budget concepts that the supervisor aggregator and receipt logic consume to classify stall state and compute recommended actions.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main addition: a new supervisor command surface with signed escalation receipts, which aligns with the primary changes across multiple files.
Description check ✅ Passed The description is comprehensive, covering objectives, files touched, acceptance criteria, and test plan. It follows the template structure with clear sections explaining What, Why, and implementation approach.
Linked Issues check ✅ Passed The PR fully implements the objectives from issue #1800: operator CLI commands, signed escalation receipts, deterministic recommended_action, cross-worktree fence, audit chain integration, TUI surfaces, and comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #1800. Core modules (supervisor_receipt, supervisor_aggregator), CLI registration, TUI surfaces, and lifecycle hooks are all necessary for the supervisor surface feature.
Docstring Coverage ✅ Passed Docstring coverage is 90.70% which is sufficient. The required threshold is 80.00%.

github-actions Bot (Contributor) commented May 21, 2026

Review-bot acknowledgement summary

  • Must-address findings: 7 (7 acknowledged, 0 open)
  • Informational findings: 8

All must-address findings are resolved or acknowledged.

github-actions Bot (Contributor) commented May 21, 2026

bernstein doctor observe for PR #1805 (feat/1800-supervisor-escalation-receipts): ok=0, warn=2, fail=0, error=0, skipped=2

sonar -- WARN (project bernstein)

metric             value  delta  threshold  status
coverage_pct       13.5%  new    80.0%      fail
code_smells        125    new    50         warn
bugs               11     new    0          fail
vulnerabilities    2      new    0          warn
security_hotspots  87     new    0          fail

code-scanning -- WARN (17 open alert(s))

metric           value  delta  threshold  status
open_alerts      17     new    0          fail
critical_alerts  0      new    0          ok
high_alerts      2      new    0          warn
medium_alerts    0      new    -          ok
low_alerts       0      new    -          ok
Skipped backends (credentials not configured)
  • glitchtip: BERNSTEIN_GLITCHTIP_TOKEN not set
  • dt: DTRACK_URL/TOKEN/PROJECT not set

See docs/observability/unified-doctor.md for backend setup notes.

Auto-applied by contract-drift-autofix.yml on PR #1805.
Regenerated via scripts/regen_contract_drift.py. Refs #1273.

Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26249756348

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 15

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bernstein/cli/commands/fleet_cmd.py`:
- Around lines 163-164: The bare except in the fleet renderer that currently does
`except Exception: return ""` hides real errors; update the except block in the
fleet supervisor-summary code to either catch specific expected exceptions or at
minimum log the exception details before returning, e.g. use the module's logger
(or process_logger) and call logger.exception or logger.error(...,
exc_info=True) with a clear message about "fleet supervisor-summary" so failures
are operator-visible while preserving the existing fallback return value.

In `@src/bernstein/cli/commands/status_cmd.py`:
- Around lines 227-228: The broad except blocks in the supervisor-summary helpers
in status_cmd.py currently swallow all exceptions and return {}; update those
handlers to either catch specific expected exceptions (e.g., ValueError,
KeyError, RuntimeError) or, if you must keep a generic Exception fallback, log
the error before returning the empty dict; use the module logger (or
processLogger) and logger.exception(...) or logger.error(..., exc_info=True) in
the same except blocks so failures remain diagnosable while preserving the
fallback behavior.

In `@src/bernstein/cli/commands/supervisor_cmd.py`:
- Around lines 169-175: The command accepts whitespace-only reasons and proceeds
to mutate state; update supervisor_escalate to validate reason by trimming
whitespace (reason.strip()) at the very start and abort before any
key/receipt/audit side effects if empty; raise a click.BadParameter or
click.ClickException (so the CLI exits cleanly with an error) and mirror the
same validation for the related handler code around lines 182-193 that also
consumes the reason string to ensure no state mutation happens with an
empty-trimmed reason.
- Around lines 342-345: The try/except around get_install_rev() silently swallows
all Exceptions and returns "" which hides failures; replace the bare except with
handling that logs the error and preserves helpful behavior—catch the specific
expected exception types if known (or use "except Exception as e:"), call the
module logger (e.g., logger.exception or logger.error with the exception) to
record the failure and context, then either re-raise for unexpected cases or
return "" only after logging; update the block surrounding get_install_rev() in
supervisor_cmd.py to log the exception (including the exception object) instead
of silently returning.
- Around lines 54-58: Move the four constants SUPERVISOR_RECEIPT_DIR,
SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV, and DEFAULT_INSTALL_SIGNING_KEY
out of the supervisor_cmd module into the shared defaults module
(core.defaults): create or add them in core.defaults with the same names and
values, export them there, then remove the inline definitions from
supervisor_cmd and import the constants with from core.defaults import
SUPERVISOR_RECEIPT_DIR, SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV,
DEFAULT_INSTALL_SIGNING_KEY so supervisor_cmd uses the shared defaults.
- Around lines 380-385: The current filename generation using fname =
f"{int(time.time())}-{receipt.session_id}-{receipt.payload_digest[:12]}.json"
can collide; change it to include a unique high-resolution or random component
(e.g., time.time_ns() or uuid.uuid4().hex) when building fname using
receipt.session_id and payload_digest to keep traceability, and write atomically
by writing to a temp file in dest_dir and then renaming to path to avoid
clobbering; update references to fname/path and keep using
receipt_to_dict(receipt) for the file contents.
- Around lines 368-373: Currently the except block swallows all exceptions and
returns "0"*64 which can reset the chain anchor; instead, detect whether the
audit store actually exists and only return the zero-anchor when there is no
audit data (e.g., audit_dir missing/empty); if the audit exists but
AuditLog(audit_dir=audit_dir) raises, propagate or re-raise the exception (or
log and raise) so callers don't silently break chain continuity. Update the
block around AuditLog(audit_dir=audit_dir) to check existence of the audit
storage first, return getattr(log, "_prev_hmac", "0"*64) when present, and avoid
catching Exception broadly—only handle the non-existence case locally and let
other errors surface.

In `@src/bernstein/core/orchestration/supervisor_aggregator.py`:
- Around lines 65-67: Move the inline constant AGGREGATOR_SCHEMA_VERSION out of
supervisor_aggregator.py and define it in the core/defaults.py module, then
import it into supervisor_aggregator.py; specifically, add
AGGREGATOR_SCHEMA_VERSION = "1.0.0" to core/defaults.py (alongside other
defaults) and replace the inline declaration in supervisor_aggregator.py with an
import from core.defaults (use the same symbol name AGGREGATOR_SCHEMA_VERSION
where referenced).
- Around lines 120-140: load_agents_snapshot and the other exposed routines
return and propagate untyped dict[str, Any] objects; replace those raw dicts
with a frozen TypedDict (e.g., AgentRow: TypedDict with required fields) or a
frozen dataclass model and update the function signature to return
list[AgentRow] (or tuple[...] if immutable), cast only when validating input,
and construct/validate AgentRow instances before returning; update all
referenced functions/sections (the blocks at 143-161, 201-233, 415-442) to parse
JSON into the new TypedDict/dataclass types, add explicit type hints on all
public functions, and remove broad cast usages so the codebase passes strict
Pyright type checks.

In `@src/bernstein/core/orchestration/supervisor_receipt.py`:
- Around lines 84-92: RECEIPT_SCHEMA_VERSION and DEFAULT_RECEIPT_AUDIT_WINDOW
(and the other constants introduced at lines ~315-323) should be moved out of
supervisor_receipt.py into the central constants module bernstein.core.defaults;
create or add them in defaults.py with the same names and values, remove their
definitions from supervisor_receipt.py, and update supervisor_receipt.py to
import those names from bernstein.core.defaults (e.g., from
bernstein.core.defaults import RECEIPT_SCHEMA_VERSION,
DEFAULT_RECEIPT_AUDIT_WINDOW) so the module uses the centralized defaults
instead of local constants.
- Around lines 213-220: Replace the raw dict-based contract surface by
introducing explicit TypedDict models for receipt-related shapes (e.g.,
AuditEntryTypedDict, ReceiptDetailsTypedDict, SupervisorReceiptTypedDict) and
use them instead of dict[str, Any] on the SupervisorReceipt fields
(audit_entries, details) and on all public functions that currently
accept/return ad-hoc dicts; update the dataclass fields (audit_entries:
tuple[AuditEntryTypedDict, ...], details: ReceiptDetailsTypedDict) and change
function signatures that accept payload/receipt dicts to use these TypedDict
types, ensure exported types are frozen/readonly where appropriate, add the new
imports/aliases, and run/typecheck until Pyright strict mode passes so all
public functions referenced in this file use the new TypedDict types instead of
raw dicts.

In `@src/bernstein/tui/status_bar.py`:
- Around lines 500-505: The Rich markup string is built by interpolating
unescaped dynamic values (notably r.worker_id and r.role), which may contain '['
or ']' and will be parsed as markup by Text.from_markup; wrap these
values with rich.markup.escape (import escape from rich.markup) before
interpolating into the f-string used in the status line (the block that builds
the string containing worker_id, role, reason, recommend, hb, budget). Escape
each user-controlled field (at least r.worker_id and r.role; consider
stall_reason and recommended_action) so Rich treats them as literal text rather
than markup.

In `@tests/integration/test_supervisor_chain_roundtrip.py`:
- Around lines 42-88: Declare TypedDicts (e.g., RuntimeAgentDoc,
RuntimeHeartbeat, RuntimeFailure) and use them inside _seed_runtime_tree to type
the JSON payloads instead of raw dicts: change the types of agents_doc,
heartbeat, and failure to the new TypedDict types and construct their values to
conform to those schemas (including typing for stalled_manager and task_ids);
ensure imports (from typing import TypedDict, Any) are added and that json.dumps
still receives dict-compatible structures so file writes remain unchanged.

In `@tests/unit/test_supervisor_receipt.py`:
- Around lines 40-48: Replace the raw dict return type of _audit_slice with a
TypedDict for type safety: define a TypedDict (e.g., AuditEntryFixture) that
includes event_type: str, session_id: str and optional details: dict[str, Any],
import TypedDict and Any, then change the _audit_slice signature to return
list[AuditEntryFixture] and construct the same dict literals (they will be
type-checked as AuditEntryFixture) so tests keep the same behavior while
following the "never raw dicts" guideline.
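The filename-collision finding for supervisor_cmd.py suggests adding a high-resolution or random component and writing through a temp file plus rename. A sketch under those assumptions; `write_receipt_atomically` is a hypothetical helper, not the fix that was merged.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any


def write_receipt_atomically(
    dest_dir: Path, session_id: str, payload_digest: str, doc: dict[str, Any]
) -> Path:
    """Persist a receipt under a collision-resistant name without partial writes."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # time_ns() keeps files roughly chronological; the uuid suffix breaks ties
    # when two escalations for the same session land on the same clock tick.
    fname = f"{time.time_ns()}-{session_id}-{payload_digest[:12]}-{uuid.uuid4().hex[:8]}.json"
    path = dest_dir / fname
    # Write into the same directory first: os.replace is atomic on POSIX when
    # source and destination live on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".receipt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(doc, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        # Never leave a half-written temp file behind.
        Path(tmp).unlink(missing_ok=True)
        raise
    return path
```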

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 51890e01-741d-4d8e-a3c1-919293721f75

📥 Commits

Reviewing files that changed from the base of the PR and between fe6a831 and 7cfdcaa.

⛔ Files ignored due to path filters (1)
  • docs/api/supervisor.md is excluded by !docs/api/**
📒 Files selected for processing (14)
  • src/bernstein/cli/commands/fleet_cmd.py
  • src/bernstein/cli/commands/status_cmd.py
  • src/bernstein/cli/commands/supervisor_cmd.py
  • src/bernstein/cli/main.py
  • src/bernstein/core/lifecycle/hooks.py
  • src/bernstein/core/orchestration/supervisor_aggregator.py
  • src/bernstein/core/orchestration/supervisor_receipt.py
  • src/bernstein/tui/status_bar.py
  • src/bernstein/tui/worker_badges.py
  • tests/integration/test_supervisor_chain_roundtrip.py
  • tests/snapshot/__snapshots__/test_supervisor_pane_snapshot.ambr
  • tests/snapshot/test_supervisor_pane_snapshot.py
  • tests/unit/test_readme_api_coverage.py
  • tests/unit/test_supervisor_receipt.py

Comment thread src/bernstein/cli/commands/fleet_cmd.py
Comment thread src/bernstein/cli/commands/status_cmd.py
Comment on lines +54 to +58
SUPERVISOR_RECEIPT_DIR = ".sdd/runtime/supervisor/receipts"
SUPERVISOR_AUDIT_DIR = ".sdd/audit"
INSTALL_SIGNING_KEY_ENV = "BERNSTEIN_SUPERVISOR_SIGNING_KEY"
DEFAULT_INSTALL_SIGNING_KEY = ".sdd/runtime/supervisor/install.key"

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move supervisor constants to core/defaults.py.

Line 54 through Line 58 introduces new constants in the command module instead of shared defaults.

As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".

Comment thread src/bernstein/cli/commands/supervisor_cmd.py
Comment on lines +210 to +226
audit_entries: list[dict[str, Any]] = [
    {
        "event_type": str(rec.get("kind", "")),
        "session_id": session_id,
        "details": rec,
    }
    for rec in failures
]
# Append the operator escalation as the trailing entry so the receipt
# captures the operator's intent even when no prior diagnostic exists.
audit_entries.append(
    {
        "event_type": "supervisor.escalate",
        "session_id": session_id,
        "details": {"reason": reason},
    }
)

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Use TypedDict for CLI payload structures.

Line 210 and Line 268 build raw dict payloads that are exchanged across module and persistence boundaries. Define explicit TypedDict payload schemas to keep strict typing enforceable.

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".

Also applies to: 268-275
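Following the TypedDict suggestion, the payloads in the excerpt above could be typed roughly as below. `AuditEntry`, `escalation_entry`, and `failure_entries` are hypothetical names; the whitespace check borrows the `reason.strip()` validation proposed in the review summary, raising ValueError where the CLI would raise a click error.

```python
from typing import Any, TypedDict


class AuditEntry(TypedDict):
    """One audit-slice entry as captured in an escalation receipt (hypothetical schema)."""

    event_type: str
    session_id: str
    details: dict[str, Any]


def escalation_entry(session_id: str, reason: str) -> AuditEntry:
    """Typed version of the trailing "supervisor.escalate" entry."""
    reason = reason.strip()
    if not reason:
        raise ValueError("reason must not be empty or whitespace-only")
    return {
        "event_type": "supervisor.escalate",
        "session_id": session_id,
        "details": {"reason": reason},
    }


def failure_entries(session_id: str, failures: list[dict[str, Any]]) -> list[AuditEntry]:
    """Typed version of the comprehension over failure records."""
    return [
        {"event_type": str(rec.get("kind", "")), "session_id": session_id, "details": rec}
        for rec in failures
    ]
```

A TypedDict keeps the on-disk JSON shape unchanged while letting Pyright check every key at construction sites.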

Comment on lines +84 to +92
#: Schema version embedded in every receipt. Bumped only on breaking changes.
RECEIPT_SCHEMA_VERSION: str = "1.0.0"


#: Default number of trailing audit entries captured in the receipt. The
#: window is fixed so two receipts assembled from the same chain prefix
#: are byte-identical regardless of how much audit history exists.
DEFAULT_RECEIPT_AUDIT_WINDOW: int = 16

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move new constants to core/defaults.py.

Line 85, Line 91, and Line 315 introduce new constants directly in this module. Centralizing these in bernstein.core.defaults keeps configuration ownership consistent and avoids drift.

As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".

Also applies to: 315-323

Comment on lines +213 to +220
audit_entries: tuple[dict[str, Any], ...]
identity: IdentityTokens
prev_chain_digest: str
payload_digest: str
respawn_budget_remaining: int = 0
signature_b64: str = ""
details: dict[str, Any] = field(default_factory=dict[str, Any])

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace raw dict-based receipt contracts with TypedDict models.

Lines 213-220 and the public function signatures at Line 268, Line 340, Line 467, and Line 555 expose ad-hoc dict[str, Any] payloads. This weakens strict typing on a cross-module contract surface.

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".

Also applies to: 268-272, 340-343, 467-468, 555-567

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bernstein/core/orchestration/supervisor_receipt.py` around lines 213 -
220, Replace the raw dict-based contract surface by introducing explicit
TypedDict models for receipt-related shapes (e.g., AuditEntryTypedDict,
ReceiptDetailsTypedDict, SupervisorReceiptTypedDict) and use them instead of
dict[str, Any] on the SupervisorReceipt fields (audit_entries, details) and on
all public functions that currently accept/return ad-hoc dicts; update the
dataclass fields (audit_entries: tuple[AuditEntryTypedDict, ...], details:
ReceiptDetailsTypedDict) and change function signatures that accept
payload/receipt dicts to use these TypedDict types, ensure exported types are
frozen/readonly where appropriate, add the new imports/aliases, and
run/typecheck until Pyright strict mode passes so all public functions
referenced in this file use the new TypedDict types instead of raw dicts.

Comment thread src/bernstein/tui/status_bar.py Outdated
Comment on lines +42 to +88
def _seed_runtime_tree(workdir: Path, *, session_id: str, role: str = "manager") -> None:
    """Populate a minimal ``.sdd/runtime/`` tree that mimics a stalled session."""
    runtime = workdir / ".sdd" / "runtime"
    (runtime / "heartbeats").mkdir(parents=True, exist_ok=True)
    (runtime / "failures").mkdir(parents=True, exist_ok=True)
    (runtime / "spawn_supervisor").mkdir(parents=True, exist_ok=True)

    # agents.json: one live worker, role=manager, with a stalled diagnostic.
    agents_doc = {
        "agents": [
            {
                "id": session_id,
                "role": role,
                "status": "working",
                "task_ids": ["t-1"],
                "worker_id": "abc123def456",
                "worktree_id": "wt-A",
                "stalled_manager": {
                    "kind": "stalled_manager",
                    "session_id": session_id,
                    "runtime_s": 120.0,
                    "hook_event_count": 12,
                    "detected_at": 1700000000.0,
                },
            }
        ]
    }
    (runtime / "agents.json").write_text(json.dumps(agents_doc, sort_keys=True))

    # Heartbeat - aged so the aggregator flags the session as stuck.
    heartbeat = {
        "timestamp": 1700000000.0 - 600.0,  # 10 min old
        "phase": "implementing",
        "progress_pct": 0,
    }
    (runtime / "heartbeats" / f"{session_id}.json").write_text(json.dumps(heartbeat, sort_keys=True))

    # One failure record so the aggregator has something to feed into the
    # receipt's audit_entries slice.
    failure = {
        "kind": "stalled_manager",
        "session_id": session_id,
        "runtime_s": 120.0,
        "hook_event_count": 12,
        "detected_at": 1700000000.0,
    }
    (runtime / "failures" / f"manager-stalled-{session_id}.json").write_text(json.dumps(failure, sort_keys=True))


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider TypedDict for runtime tree JSON documents.

The _seed_runtime_tree helper constructs several raw dict structures (agents_doc, heartbeat, failure) that simulate the JSON documents written by upstream detectors. Defining TypedDict types for these structures would improve type safety and document the expected schema:

from typing import Any, TypedDict

class RuntimeAgentDoc(TypedDict):
    id: str
    role: str
    status: str
    task_ids: list[str]
    worker_id: str
    worktree_id: str
    stalled_manager: dict[str, Any]  # or further typed

class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_supervisor_chain_roundtrip.py` around lines 42 - 88,
Declare TypedDicts (e.g., RuntimeAgentDoc, RuntimeHeartbeat, RuntimeFailure) and
use them inside _seed_runtime_tree to type the JSON payloads instead of raw
dicts: change the types of agents_doc, heartbeat, and failure to the new
TypedDict types and construct their values to conform to those schemas
(including typing for stalled_manager and task_ids); ensure imports (from typing
import TypedDict, Any) are added and that json.dumps still receives
dict-compatible structures so file writes remain unchanged.
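A sketch of the fully typed variant, including the `stalled_manager` diagnostic that the suggestion above leaves as `dict[str, Any]`. It assumes the failure record and the embedded diagnostic share exactly the keys the test seeds; upstream detectors may write more. `StalledManagerDiagnostic`, `RuntimeAgentsFile`, and `make_failure` are hypothetical names.

```python
import json
from typing import TypedDict


class StalledManagerDiagnostic(TypedDict):
    # Keys copied from the stalled_manager block the test seeds.
    kind: str
    session_id: str
    runtime_s: float
    hook_event_count: int
    detected_at: float


class RuntimeAgentDoc(TypedDict):
    id: str
    role: str
    status: str
    task_ids: list[str]
    worker_id: str
    worktree_id: str
    stalled_manager: StalledManagerDiagnostic


class RuntimeAgentsFile(TypedDict):
    # Top-level shape of agents.json in the seeded tree.
    agents: list[RuntimeAgentDoc]


class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int


# In this test the failure record carries the same keys as the diagnostic.
RuntimeFailure = StalledManagerDiagnostic


def make_failure(session_id: str) -> RuntimeFailure:
    return RuntimeFailure(
        kind="stalled_manager",
        session_id=session_id,
        runtime_s=120.0,
        hook_event_count=12,
        detected_at=1700000000.0,
    )
```

TypedDict instances are plain dicts at runtime, so `json.dumps(..., sort_keys=True)` produces the same bytes as before and the seeded files do not change.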

Comment on lines +40 to +48
def _audit_slice(*, with_fatal: bool = False) -> list[dict[str, str]]:
    base = [
        {"event_type": "session.start", "session_id": "sess-1"},
        {"event_type": "task.pre_spawn", "session_id": "sess-1"},
        {"event_type": "heartbeat.tick", "session_id": "sess-1"},
    ]
    if with_fatal:
        base.append({"event_type": "auth.denied", "session_id": "sess-1"})
    return base


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider TypedDict for audit entry test fixtures.

The guideline states "never raw dicts" even in test code. While these fixtures simulate JSON input from the runtime system, annotating _audit_slice() return type with a TypedDict would improve type safety without changing the dict literal syntax:

from typing import Any, TypedDict

class AuditEntryFixture(TypedDict, total=False):
    event_type: str
    session_id: str
    details: dict[str, Any]

def _audit_slice(*, with_fatal: bool = False) -> list[AuditEntryFixture]:
    ...

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_supervisor_receipt.py` around lines 40 - 48, Replace the raw
dict return type of _audit_slice with a TypedDict for type safety: define a
TypedDict (e.g., AuditEntryFixture) that includes event_type: str, session_id:
str and optional details: dict[str, Any], import TypedDict and Any, then change
the _audit_slice signature to return list[AuditEntryFixture] and construct the
same dict literals (they will be type-checked as AuditEntryFixture) so tests
keep the same behavior while following the "never raw dicts" guideline.

- Validate `--reason` strips to non-empty before any state mutation in
  `bernstein supervisor escalate`. A whitespace-only reason previously
  passed through into the receipt body.
- Log silent-broad-exception paths in the supervisor summary helpers
  (status + fleet) and in the install-fingerprint lookup so an empty
  summary surfaces a cause in the orchestrator log instead of looking
  identical to a healthy run.
- Refuse to silently reset the audit chain anchor when the audit log
  directory exists but is unreadable. The previous fallback to the
  genesis sentinel would have let a fresh receipt skip the chain head
  and break the tamper-evidence guarantee.
- Switch receipt filenames to nanosecond timestamps and open the file
  exclusively (``"x"``) so two escalations in the same second cannot
  silently overwrite each other.
- Escape every dynamic field interpolated into the supervisor TUI
  pane's Rich markup; an upstream caller passing ``[``/``]`` in a
  worker id or role no longer corrupts the dashboard layout.

bot-ack: 3284009273
bot-ack: 3284009276
bot-ack: 3284009287
bot-ack: 3284009299
bot-ack: 3284009302
bot-ack: 3284009305
bot-ack: 3284009328
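A stdlib-only sketch of three of the hardening items in the commit above: reason validation, refusing to re-anchor on an unreadable audit directory, and collision-safe receipt writes. It is an illustration, not the PR's code. `GENESIS_DIGEST`, `validate_reason`, `read_chain_head`, `write_receipt`, and the "last line of the newest file is the head" layout are all hypothetical.

```python
import os
import time
from pathlib import Path

# Hypothetical genesis sentinel; the real value lives in the receipt module.
GENESIS_DIGEST = "0" * 64


def validate_reason(reason: str) -> str:
    """Reject empty or whitespace-only reasons before any state is mutated."""
    cleaned = reason.strip()
    if not cleaned:
        raise ValueError("--reason must not be empty or whitespace-only")
    return cleaned


def read_chain_head(audit_dir: Path) -> str:
    """Return the chain head digest that the next receipt should anchor on."""
    if not audit_dir.exists():
        # No audit log yet: starting from genesis is legitimate.
        return GENESIS_DIGEST
    try:
        names = sorted(os.listdir(audit_dir))
        if not names:
            return GENESIS_DIGEST
        # Hypothetical layout: the last line of the newest file is the head digest.
        lines = (audit_dir / names[-1]).read_text(encoding="utf-8").splitlines()
    except OSError as exc:
        # The directory exists but cannot be read. Falling back to genesis here
        # would let a new receipt skip the real chain head, so refuse instead.
        raise RuntimeError(f"audit log at {audit_dir} exists but is unreadable") from exc
    return lines[-1].strip() if lines else GENESIS_DIGEST


def write_receipt(receipts_dir: Path, session_id: str, body: str) -> Path:
    """Write a receipt file without ever overwriting an existing one."""
    receipts_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamps make same-second collisions unlikely; mode "x"
    # turns any remaining collision into FileExistsError, not a silent overwrite.
    path = receipts_dir / f"{session_id}-{time.time_ns()}.json"
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(body)
    return path
```

The ordering matters: calling `validate_reason` first means a bad `--reason` fails before `read_chain_head` or `write_receipt` touches disk.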
@chernistry chernistry merged commit da8e804 into main May 21, 2026
34 of 36 checks passed
@chernistry chernistry deleted the feat/1800-supervisor-escalation-receipts branch May 21, 2026 20:21

Development

Successfully merging this pull request may close these issues.

Operator supervisor surface with signed escalation receipts

2 participants