feat(supervisor): operator supervisor surface with signed escalation receipts #1805
Adds `bernstein supervisor status` and `bernstein supervisor escalate` as an operator-facing surface over the existing stalled_manager, watchdog, and spawn_supervisor detectors. The detectors remain the source of truth; the new command aggregates and renders.

Each escalation appends a signed receipt to the audit chain. The receipt carries the worker id, worktree id, last N audit entries, identity tokens, stall reason, recommended action, and the previous chain digest. The receipt verifies offline against the install's Ed25519 public key.

The recommended_action field is a pure function of the chain slice at stall time (stall_reason + audit_entries + respawn_budget_remaining). Two operators verifying the same receipt arrive at the byte-identical recommendation.

A cross-worktree fence assertion refuses receipt assembly when the stuck session leaked into a sibling worktree's resolution events.

The aggregator surfaces a stuck-count + oldest-stall summary line on `bernstein status` and `bernstein fleet`. The TUI gains a SupervisorPane that highlights stalled / parked / no-progress sessions with the same recommended action. Worker badges gain a STUCK status so the dashboard's existing widgets stay consistent with the pane.

Closes #1800.

Files touched:

- src/bernstein/core/orchestration/supervisor_receipt.py (new)
- src/bernstein/core/orchestration/supervisor_aggregator.py (new)
- src/bernstein/cli/commands/supervisor_cmd.py (new)
- src/bernstein/cli/main.py
- src/bernstein/cli/commands/status_cmd.py
- src/bernstein/cli/commands/fleet_cmd.py
- src/bernstein/core/lifecycle/hooks.py (worker.escalated event)
- src/bernstein/tui/status_bar.py (SupervisorPane + helpers)
- src/bernstein/tui/worker_badges.py (STUCK status)
- docs/api/supervisor.md (new)
- tests/unit/test_supervisor_receipt.py (new)
- tests/integration/test_supervisor_chain_roundtrip.py (new)
- tests/snapshot/test_supervisor_pane_snapshot.py (new)
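A minimal sketch of the two pure checks described in the commit message, assuming the audit slice is a plain sequence of mappings. The rules here (`auth.denied` parks the session; an exhausted respawn budget or three or more `.failed`/`.error`/`.errored` events escalate) and the `resolution.` event prefix are hypothetical stand-ins, since the real classification in `supervisor_receipt.py` is not shown on this page. What matters is the shape: neither function reads a clock, randomness, or disk, so two operators feeding the same slice get the same answer.

```python
from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical rule inputs; the real sets live in supervisor_receipt.py.
FATAL_EVENT_TYPES = frozenset({"auth.denied"})
FAILURE_SUFFIXES = (".failed", ".error", ".errored")
ESCALATE_AFTER_FAILURES = 3  # hypothetical threshold


def recommend_action(
    stall_reason: str,
    audit_entries: Sequence[Mapping[str, Any]],
    respawn_budget_remaining: int,
) -> str:
    """Derive the recommendation from the chain slice alone (no clock, no I/O)."""
    event_types = [str(e.get("event_type", "")).lower() for e in audit_entries]
    if any(t in FATAL_EVENT_TYPES for t in event_types):
        return "park"
    failures = sum(1 for t in event_types if t.endswith(FAILURE_SUFFIXES))
    if respawn_budget_remaining <= 0 or failures >= ESCALATE_AFTER_FAILURES:
        return "escalate"
    if stall_reason == "stalled_manager":
        return "respawn"
    return "observe"


def assert_worktree_fence(worktree_id: str, audit_entries: Sequence[Mapping[str, Any]]) -> None:
    """Refuse receipt assembly if a resolution event belongs to a sibling worktree."""
    leaked = sorted(
        {
            str(e["worktree_id"])
            for e in audit_entries
            if str(e.get("event_type", "")).startswith("resolution.")
            and e.get("worktree_id") not in (None, worktree_id)
        }
    )
    if leaked:
        raise ValueError(f"session leaked into sibling worktree(s): {', '.join(leaked)}")
```

Keeping the fence a separate function (rather than folding it into `recommend_action`) lets assembly fail loudly before any recommendation is computed.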
Sonar insights (advisory, no merge-block): snapshot of … (run …). This comment is a soft signal. The Sonar scan runs on push to …
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (4)
📝 Walkthrough: Introduces a complete supervisor subsystem for detecting and escalating stalled workers. Adds a pure-read aggregator that classifies worker stall state from runtime files, deterministic signed escalation receipts anchored to the audit chain with cross-worktree validation, operator CLI commands for supervisor …

Changes: Supervisor Detection, Escalation, and Audit Integration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Review-bot acknowledgement summary
All must-address findings are resolved or acknowledged.
`bernstein doctor observe` for PR #1805 (…):
- sonar: WARN (project bernstein)
- code-scanning: WARN (17 open alert(s))
- Skipped backends (credentials not configured)

See docs/observability/unified-doctor.md for backend setup notes.
Auto-applied by contract-drift-autofix.yml on PR #1805. Regenerated via scripts/regen_contract_drift.py. Refs #1273. Source CI run: https://github.com/sipyourdrink-ltd/bernstein/actions/runs/26249756348
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bernstein/cli/commands/fleet_cmd.py`:
- Around lines 163-164: The broad `except Exception: return ""` in the fleet
renderer hides real errors; update the except block in the
fleet supervisor-summary code to either catch specific expected exceptions or at
minimum log the exception details before returning, e.g. use the module's logger
(or process_logger) and call logger.exception or logger.error(...,
exc_info=True) with a clear message about "fleet supervisor-summary" so failures
are operator-visible while preserving the existing fallback return value.
In `@src/bernstein/cli/commands/status_cmd.py`:
- Around line 227-228: The broad except blocks in the supervisor-summary helpers
in status_cmd.py currently swallow all exceptions and return {} — update those
handlers to either catch specific expected exceptions (e.g., ValueError,
KeyError, RuntimeError) or, if you must keep a generic Exception fallback, log
the error before returning the empty dict; use the module logger (or
processLogger) and logger.exception(...) or logger.error(..., exc_info=True) in
the same except blocks so failures remain diagnosable while preserving the
fallback behavior.
In `@src/bernstein/cli/commands/supervisor_cmd.py`:
- Around line 169-175: The command accepts whitespace-only reasons and proceeds
to mutate state; update supervisor_escalate to validate reason by trimming
whitespace (reason.strip()) at the very start and abort before any
key/receipt/audit side effects if empty; raise a click.BadParameter or
click.ClickException (so the CLI exits cleanly with an error) and mirror the
same validation for the related handler code around lines 182-193 that also
consumes the reason string to ensure no state mutation happens with an
empty-trimmed reason.
- Around lines 342-345: The try/except around get_install_rev() silently swallows
all exceptions and returns "", which hides failures; replace the broad except with
handling that logs the error and preserves helpful behavior—catch the specific
expected exception types if known (or use "except Exception as e:"), call the
module logger (e.g., logger.exception or logger.error with the exception) to
record the failure and context, then either re-raise for unexpected cases or
return "" only after logging; update the block surrounding get_install_rev() in
supervisor_cmd.py to log the exception (including the exception object) instead
of silently returning.
- Around line 54-58: Move the four constants SUPERVISOR_RECEIPT_DIR,
SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV, and DEFAULT_INSTALL_SIGNING_KEY
out of the supervisor_cmd module into the shared defaults module
(core.defaults): create or add them in core.defaults with the same names and
values, export them there, then remove the inline definitions from
supervisor_cmd and import the constants with from core.defaults import
SUPERVISOR_RECEIPT_DIR, SUPERVISOR_AUDIT_DIR, INSTALL_SIGNING_KEY_ENV,
DEFAULT_INSTALL_SIGNING_KEY so supervisor_cmd uses the shared defaults.
- Around line 380-385: The current filename generation using fname =
f"{int(time.time())}-{receipt.session_id}-{receipt.payload_digest[:12]}.json"
can collide; change it to include a unique high-resolution or random component
(e.g., time.time_ns() or uuid.uuid4().hex) when building fname using
receipt.session_id and payload_digest to keep traceability, and write atomically
by writing to a temp file in dest_dir and then renaming to path to avoid
clobbering; update references to fname/path and keep using
receipt_to_dict(receipt) for the file contents.
- Around line 368-373: Currently the except block swallows all exceptions and
returns "0"*64 which can reset the chain anchor; instead, detect whether the
audit store actually exists and only return the zero-anchor when there is no
audit data (e.g., audit_dir missing/empty); if the audit exists but
AuditLog(audit_dir=audit_dir) raises, propagate or re-raise the exception (or
log and raise) so callers don't silently break chain continuity. Update the
block around AuditLog(audit_dir=audit_dir) to check existence of the audit
storage first, return getattr(log, "_prev_hmac", "0"*64) when present, and avoid
catching Exception broadly—only handle the non-existence case locally and let
other errors surface.
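A sketch of the anchor rule this comment describes, assuming the chain head can be read through an injected `load_head` callable (a hypothetical stand-in for reading `AuditLog(audit_dir=...)._prev_hmac`). Only a missing or empty audit directory yields the genesis sentinel; any other failure propagates, so a fresh receipt cannot silently restart the chain.

```python
from collections.abc import Callable
from pathlib import Path

GENESIS_DIGEST = "0" * 64  # sentinel used only when no chain exists yet


def resolve_chain_anchor(audit_dir: Path, load_head: Callable[[Path], str]) -> str:
    """Return the digest the next receipt must chain from."""
    if not audit_dir.exists():
        return GENESIS_DIGEST  # no audit store at all: first receipt on this install
    if not any(audit_dir.iterdir()):
        return GENESIS_DIGEST  # store created but still empty
    # The store has data: read or parse errors must surface, never reset the anchor.
    return load_head(audit_dir)
```

A permission error on `iterdir()` also propagates, which matches the intent of refusing rather than resetting when the directory exists but is unreadable.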
In `@src/bernstein/core/orchestration/supervisor_aggregator.py`:
- Around line 65-67: Move the inline constant AGGREGATOR_SCHEMA_VERSION out of
supervisor_aggregator.py and define it in the core/defaults.py module, then
import it into supervisor_aggregator.py; specifically, add
AGGREGATOR_SCHEMA_VERSION = "1.0.0" to core/defaults.py (alongside other
defaults) and replace the inline declaration in supervisor_aggregator.py with an
import from core.defaults (use the same symbol name AGGREGATOR_SCHEMA_VERSION
where referenced).
- Around line 120-140: load_agents_snapshot and the other exposed routines
return and propagate untyped dict[str, Any] objects; replace those raw dicts
with a frozen TypedDict (e.g., AgentRow: TypedDict with required fields) or a
frozen dataclass model and update the function signature to return
list[AgentRow] (or tuple[...] if immutable), cast only when validating input,
and construct/validate AgentRow instances before returning; update all
referenced functions/sections (the blocks at 143-161, 201-233, 415-442) to parse
JSON into the new TypedDict/dataclass types, add explicit type hints on all
public functions, and remove broad cast usages so the codebase passes strict
Pyright type checks.
In `@src/bernstein/core/orchestration/supervisor_receipt.py`:
- Around line 84-92: RECEIPT_SCHEMA_VERSION and DEFAULT_RECEIPT_AUDIT_WINDOW
(and the other constants introduced at lines ~315-323) should be moved out of
supervisor_receipt.py into the central constants module bernstein.core.defaults;
create or add them in defaults.py with the same names and values, remove their
definitions from supervisor_receipt.py, and update supervisor_receipt.py to
import those names from bernstein.core.defaults (e.g., from
bernstein.core.defaults import RECEIPT_SCHEMA_VERSION,
DEFAULT_RECEIPT_AUDIT_WINDOW) so the module uses the centralized defaults
instead of local constants.
- Around line 213-220: Replace the raw dict-based contract surface by
introducing explicit TypedDict models for receipt-related shapes (e.g.,
AuditEntryTypedDict, ReceiptDetailsTypedDict, SupervisorReceiptTypedDict) and
use them instead of dict[str, Any] on the SupervisorReceipt fields
(audit_entries, details) and on all public functions that currently
accept/return ad-hoc dicts; update the dataclass fields (audit_entries:
tuple[AuditEntryTypedDict, ...], details: ReceiptDetailsTypedDict) and change
function signatures that accept payload/receipt dicts to use these TypedDict
types, ensure exported types are frozen/readonly where appropriate, add the new
imports/aliases, and run/typecheck until Pyright strict mode passes so all
public functions referenced in this file use the new TypedDict types instead of
raw dicts.
In `@src/bernstein/tui/status_bar.py`:
- Around line 500-505: The Rich markup string is built by interpolating
unescaped dynamic values (notably r.worker_id and r.role), which may contain '['
or ']' that Text.from_markup will parse as markup; wrap these
values with rich.markup.escape (import escape from rich.markup) before
interpolating into the f-string used in the status line (the block that builds
the string containing worker_id, role, reason, recommend, hb, budget). Escape
each user-controlled field (at least r.worker_id and r.role; consider
stall_reason and recommended_action) so Rich treats them as literal text rather
than markup.
In `@tests/integration/test_supervisor_chain_roundtrip.py`:
- Around line 42-88: Declare TypedDicts (e.g., RuntimeAgentDoc,
RuntimeHeartbeat, RuntimeFailure) and use them inside _seed_runtime_tree to type
the JSON payloads instead of raw dicts: change the types of agents_doc,
heartbeat, and failure to the new TypedDict types and construct their values to
conform to those schemas (including typing for stalled_manager and task_ids);
ensure imports (from typing import TypedDict, Any) are added and that json.dumps
still receives dict-compatible structures so file writes remain unchanged.
In `@tests/unit/test_supervisor_receipt.py`:
- Around line 40-48: Replace the raw dict return type of _audit_slice with a
TypedDict for type safety: define a TypedDict (e.g., AuditEntryFixture) that
includes event_type: str, session_id: str and optional details: dict[str, Any],
import TypedDict and Any, then change the _audit_slice signature to return
list[AuditEntryFixture] and construct the same dict literals (they will be
type-checked as AuditEntryFixture) so tests keep the same behavior while
following the "never raw dicts" guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 51890e01-741d-4d8e-a3c1-919293721f75
⛔ Files ignored due to path filters (1)
docs/api/supervisor.md is excluded by !docs/api/**
📒 Files selected for processing (14)
- src/bernstein/cli/commands/fleet_cmd.py
- src/bernstein/cli/commands/status_cmd.py
- src/bernstein/cli/commands/supervisor_cmd.py
- src/bernstein/cli/main.py
- src/bernstein/core/lifecycle/hooks.py
- src/bernstein/core/orchestration/supervisor_aggregator.py
- src/bernstein/core/orchestration/supervisor_receipt.py
- src/bernstein/tui/status_bar.py
- src/bernstein/tui/worker_badges.py
- tests/integration/test_supervisor_chain_roundtrip.py
- tests/snapshot/__snapshots__/test_supervisor_pane_snapshot.ambr
- tests/snapshot/test_supervisor_pane_snapshot.py
- tests/unit/test_readme_api_coverage.py
- tests/unit/test_supervisor_receipt.py
```python
SUPERVISOR_RECEIPT_DIR = ".sdd/runtime/supervisor/receipts"
SUPERVISOR_AUDIT_DIR = ".sdd/audit"
INSTALL_SIGNING_KEY_ENV = "BERNSTEIN_SUPERVISOR_SIGNING_KEY"
DEFAULT_INSTALL_SIGNING_KEY = ".sdd/runtime/supervisor/install.key"
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Move supervisor constants to core/defaults.py.
Lines 54 through 58 introduce new constants in the command module instead of the shared defaults module.
As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/bernstein/cli/commands/supervisor_cmd.py` around lines 54 - 58, Move the
four constants SUPERVISOR_RECEIPT_DIR, SUPERVISOR_AUDIT_DIR,
INSTALL_SIGNING_KEY_ENV, and DEFAULT_INSTALL_SIGNING_KEY out of the
supervisor_cmd module into the shared defaults module (core.defaults): create or
add them in core.defaults with the same names and values, export them there,
then remove the inline definitions from supervisor_cmd and import the constants
with from core.defaults import SUPERVISOR_RECEIPT_DIR, SUPERVISOR_AUDIT_DIR,
INSTALL_SIGNING_KEY_ENV, DEFAULT_INSTALL_SIGNING_KEY so supervisor_cmd uses the
shared defaults.
```python
audit_entries: list[dict[str, Any]] = [
    {
        "event_type": str(rec.get("kind", "")),
        "session_id": session_id,
        "details": rec,
    }
    for rec in failures
]
# Append the operator escalation as the trailing entry so the receipt
# captures the operator's intent even when no prior diagnostic exists.
audit_entries.append(
    {
        "event_type": "supervisor.escalate",
        "session_id": session_id,
        "details": {"reason": reason},
    }
)
```
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Use TypedDict for CLI payload structures.
Line 210 and Line 268 build raw dict payloads that are exchanged across module and persistence boundaries. Define explicit TypedDict payload schemas to keep strict typing enforceable.
As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".
Also applies to: 268-275
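A minimal sketch of the typed payload this comment asks for, mirroring the excerpt above. `AuditEntry` and `build_escalation_entries` are hypothetical names; the function also applies the whitespace-only `--reason` check from the inline comments before building anything. It raises a plain `ValueError`, which a Click command could translate into `click.BadParameter`.

```python
from collections.abc import Mapping, Sequence
from typing import Any, TypedDict


class AuditEntry(TypedDict):
    event_type: str
    session_id: str
    details: dict[str, Any]


def build_escalation_entries(
    session_id: str,
    reason: str,
    failures: Sequence[Mapping[str, Any]],
) -> list[AuditEntry]:
    """Typed version of the audit slice, with the operator entry last."""
    # Reject whitespace-only reasons before anything touches keys, receipts, or the audit log.
    if not reason.strip():
        raise ValueError("--reason must not be empty")
    entries: list[AuditEntry] = [
        {"event_type": str(rec.get("kind", "")), "session_id": session_id, "details": dict(rec)}
        for rec in failures
    ]
    entries.append(
        {"event_type": "supervisor.escalate", "session_id": session_id, "details": {"reason": reason}}
    )
    return entries
```

Copying each failure record with `dict(rec)` keeps the returned entries independent of the caller's mappings.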
```python
#: Schema version embedded in every receipt. Bumped only on breaking changes.
RECEIPT_SCHEMA_VERSION: str = "1.0.0"

#: Default number of trailing audit entries captured in the receipt. The
#: window is fixed so two receipts assembled from the same chain prefix
#: are byte-identical regardless of how much audit history exists.
DEFAULT_RECEIPT_AUDIT_WINDOW: int = 16
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Move new constants to core/defaults.py.
Line 85, Line 91, and Line 315 introduce new constants directly in this module. Centralizing these in bernstein.core.defaults keeps configuration ownership consistent and avoids drift.
As per coding guidelines: "New constants must go in core/defaults.py, not inline in other modules".
Also applies to: 315-323
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/bernstein/core/orchestration/supervisor_receipt.py` around lines 84 - 92,
RECEIPT_SCHEMA_VERSION and DEFAULT_RECEIPT_AUDIT_WINDOW (and the other constants
introduced at lines ~315-323) should be moved out of supervisor_receipt.py into
the central constants module bernstein.core.defaults; create or add them in
defaults.py with the same names and values, remove their definitions from
supervisor_receipt.py, and update supervisor_receipt.py to import those names
from bernstein.core.defaults (e.g., from bernstein.core.defaults import
RECEIPT_SCHEMA_VERSION, DEFAULT_RECEIPT_AUDIT_WINDOW) so the module uses the
centralized defaults instead of local constants.
```python
audit_entries: tuple[dict[str, Any], ...]
identity: IdentityTokens
prev_chain_digest: str
payload_digest: str
respawn_budget_remaining: int = 0
signature_b64: str = ""
details: dict[str, Any] = field(default_factory=dict[str, Any])
```
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Replace raw dict-based receipt contracts with TypedDict models.
Lines 213-220 and the public function signatures at Line 268, Line 340, Line 467, and Line 555 expose ad-hoc dict[str, Any] payloads. This weakens strict typing on a cross-module contract surface.
As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts" and "Type hints must be present on ALL public functions. Pyright strict mode must pass".
Also applies to: 268-272, 340-343, 467-468, 555-567
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/bernstein/core/orchestration/supervisor_receipt.py` around lines 213 -
220, Replace the raw dict-based contract surface by introducing explicit
TypedDict models for receipt-related shapes (e.g., AuditEntryTypedDict,
ReceiptDetailsTypedDict, SupervisorReceiptTypedDict) and use them instead of
dict[str, Any] on the SupervisorReceipt fields (audit_entries, details) and on
all public functions that currently accept/return ad-hoc dicts; update the
dataclass fields (audit_entries: tuple[AuditEntryTypedDict, ...], details:
ReceiptDetailsTypedDict) and change function signatures that accept
payload/receipt dicts to use these TypedDict types, ensure exported types are
frozen/readonly where appropriate, add the new imports/aliases, and
run/typecheck until Pyright strict mode passes so all public functions
referenced in this file use the new TypedDict types instead of raw dicts.
```python
def _seed_runtime_tree(workdir: Path, *, session_id: str, role: str = "manager") -> None:
    """Populate a minimal ``.sdd/runtime/`` tree that mimics a stalled session."""
    runtime = workdir / ".sdd" / "runtime"
    (runtime / "heartbeats").mkdir(parents=True, exist_ok=True)
    (runtime / "failures").mkdir(parents=True, exist_ok=True)
    (runtime / "spawn_supervisor").mkdir(parents=True, exist_ok=True)

    # agents.json: one live worker, role=manager, with a stalled diagnostic.
    agents_doc = {
        "agents": [
            {
                "id": session_id,
                "role": role,
                "status": "working",
                "task_ids": ["t-1"],
                "worker_id": "abc123def456",
                "worktree_id": "wt-A",
                "stalled_manager": {
                    "kind": "stalled_manager",
                    "session_id": session_id,
                    "runtime_s": 120.0,
                    "hook_event_count": 12,
                    "detected_at": 1700000000.0,
                },
            }
        ]
    }
    (runtime / "agents.json").write_text(json.dumps(agents_doc, sort_keys=True))

    # Heartbeat - aged so the aggregator flags the session as stuck.
    heartbeat = {
        "timestamp": 1700000000.0 - 600.0,  # 10 min old
        "phase": "implementing",
        "progress_pct": 0,
    }
    (runtime / "heartbeats" / f"{session_id}.json").write_text(json.dumps(heartbeat, sort_keys=True))

    # One failure record so the aggregator has something to feed into the
    # receipt's audit_entries slice.
    failure = {
        "kind": "stalled_manager",
        "session_id": session_id,
        "runtime_s": 120.0,
        "hook_event_count": 12,
        "detected_at": 1700000000.0,
    }
    (runtime / "failures" / f"manager-stalled-{session_id}.json").write_text(json.dumps(failure, sort_keys=True))
```
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consider TypedDict for runtime tree JSON documents.
The `_seed_runtime_tree` helper constructs several raw dict structures (`agents_doc`, `heartbeat`, `failure`) that simulate the JSON documents written by upstream detectors. Defining TypedDict types for these structures would improve type safety and document the expected schema:

```python
from typing import Any, TypedDict


class RuntimeAgentDoc(TypedDict):
    id: str
    role: str
    status: str
    task_ids: list[str]
    worker_id: str
    worktree_id: str
    stalled_manager: dict[str, Any]  # or further typed


class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int
```

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/test_supervisor_chain_roundtrip.py` around lines 42 - 88,
Declare TypedDicts (e.g., RuntimeAgentDoc, RuntimeHeartbeat, RuntimeFailure) and
use them inside _seed_runtime_tree to type the JSON payloads instead of raw
dicts: change the types of agents_doc, heartbeat, and failure to the new
TypedDict types and construct their values to conform to those schemas
(including typing for stalled_manager and task_ids); ensure imports (from typing
import TypedDict, Any) are added and that json.dumps still receives
dict-compatible structures so file writes remain unchanged.
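A sketch of how a typed heartbeat document could drive the stuck classification the fixture depends on ("aged so the aggregator flags the session as stuck"). The 300-second threshold and the function names are hypothetical and not taken from the PR. `now` is passed in rather than read from the clock, so the check stays deterministic in tests.

```python
from typing import TypedDict


class RuntimeHeartbeat(TypedDict):
    timestamp: float
    phase: str
    progress_pct: int


STUCK_AFTER_S = 300.0  # hypothetical threshold, not taken from the PR


def heartbeat_age_s(hb: RuntimeHeartbeat, now: float) -> float:
    # Clamp at zero so clock skew never yields a negative age.
    return max(0.0, now - hb["timestamp"])


def is_stuck(hb: RuntimeHeartbeat, now: float, threshold_s: float = STUCK_AFTER_S) -> bool:
    return heartbeat_age_s(hb, now) >= threshold_s
```

With the fixture's heartbeat (10 minutes old at `now = 1700000000.0`), `is_stuck` returns `True`.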
```python
def _audit_slice(*, with_fatal: bool = False) -> list[dict[str, str]]:
    base = [
        {"event_type": "session.start", "session_id": "sess-1"},
        {"event_type": "task.pre_spawn", "session_id": "sess-1"},
        {"event_type": "heartbeat.tick", "session_id": "sess-1"},
    ]
    if with_fatal:
        base.append({"event_type": "auth.denied", "session_id": "sess-1"})
    return base
```
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consider TypedDict for audit entry test fixtures.
The guideline states "never raw dicts" even in test code. While these fixtures simulate JSON input from the runtime system, annotating the `_audit_slice()` return type with a TypedDict would improve type safety without changing the dict literal syntax:

```python
from typing import Any, NotRequired, TypedDict  # NotRequired: Python 3.11+ (else typing_extensions)


class AuditEntryFixture(TypedDict):
    event_type: str
    session_id: str
    details: NotRequired[dict[str, Any]]


def _audit_slice(*, with_fatal: bool = False) -> list[AuditEntryFixture]:
    ...
```

As per coding guidelines: "Use frozen dataclasses or TypedDict for all data structures in Python - never raw dicts".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test_supervisor_receipt.py` around lines 40 - 48, Replace the raw
dict return type of _audit_slice with a TypedDict for type safety: define a
TypedDict (e.g., AuditEntryFixture) that includes event_type: str, session_id:
str and optional details: dict[str, Any], import TypedDict and Any, then change
the _audit_slice signature to return list[AuditEntryFixture] and construct the
same dict literals (they will be type-checked as AuditEntryFixture) so tests
keep the same behavior while following the "never raw dicts" guideline.
- Validate `--reason` strips to non-empty before any state mutation in `bernstein supervisor escalate`. A whitespace-only reason previously passed through into the receipt body.
- Log silent-broad-exception paths in the supervisor summary helpers (status + fleet) and in the install-fingerprint lookup so an empty summary surfaces a cause in the orchestrator log instead of looking identical to a healthy run.
- Refuse to silently reset the audit chain anchor when the audit log directory exists but is unreadable. The previous fallback to the genesis sentinel would have let a fresh receipt skip the chain head and break the tamper-evidence guarantee.
- Switch receipt filenames to nanosecond timestamps and open the file exclusively (`"x"`) so two escalations in the same second cannot silently overwrite each other.
- Escape every dynamic field interpolated into the supervisor TUI pane's Rich markup; an upstream caller passing `[`/`]` in a worker id or role no longer corrupts the dashboard layout.

bot-ack: 3284009273
bot-ack: 3284009276
bot-ack: 3284009287
bot-ack: 3284009299
bot-ack: 3284009302
bot-ack: 3284009305
bot-ack: 3284009328
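The filename fix described in the fourth item can be sketched as follows. The helper name `write_receipt_file` is hypothetical; `time.time_ns()` and the `"x"` open mode are standard Python. This trades away the temp-file-and-rename approach from the original review comment. `"x"` guarantees no clobbering, but a crash mid-write can leave a partial file. `os.replace` would make the write atomic but silently overwrites an existing target.

```python
import json
import time
from collections.abc import Mapping
from pathlib import Path
from typing import Any


def write_receipt_file(
    dest_dir: Path, session_id: str, payload_digest: str, body: Mapping[str, Any]
) -> Path:
    """Persist one receipt without ever overwriting an existing file."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamp keeps names distinct for escalations within the same second;
    # the digest prefix keeps the file traceable to its receipt.
    path = dest_dir / f"{time.time_ns()}-{session_id}-{payload_digest[:12]}.json"
    # Mode "x" is exclusive creation: a name collision raises FileExistsError.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(dict(body), fh, sort_keys=True)
    return path
```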
```python
hb = f"{int(row.last_heartbeat_age_s)}s"
stuck_label = "STUCK" if row.is_stuck else "ok"
table.add_row(
    f"{row.worker_id}",
```
```python
if not isinstance(payload_any, dict):
    return []
payload = cast(dict[str, Any], payload_any)
raw = payload.get("agents")
```
```python
# Audit slice for the deterministic recommended action: failures
# already classified for this session.
slice_entries: list[dict[str, Any]] = []
```
```python
failures = 0
for entry in audit_entries:
    event_type = str(entry.get("event_type", "")).lower()
    if event_type.endswith(".failed") or event_type.endswith(".error") or event_type.endswith(".errored"):
```
Closes #1800.
Summary
- Adds `bernstein supervisor status [--json]` and `bernstein supervisor escalate <session_id> --reason "..."` as an operator-facing surface over the existing `stalled_manager`, `watchdog`, and `spawn_supervisor` detectors. The detectors remain the source of truth; the new command aggregates and renders.
- Each escalation appends a signed receipt to the audit chain carrying `(worker_id, worktree_id, last_n_audit_entries, identity_tokens, stall_reason, recommended_action, prev_chain_digest)`. Receipts verify offline against the install's public key.
- `recommended_action` is a pure function of the chain slice at stall time. Two operators verifying the same receipt land on the byte-identical recommendation. A cross-worktree fence refuses receipt assembly when the stuck session leaked into a sibling worktree's resolution events.
- `bernstein status` and `bernstein fleet` gain a stuck-count + oldest-stall summary line. The TUI dashboard adds a `SupervisorPane` highlighting stalled / parked / no-progress sessions with the same recommended action.

Files touched
- `src/bernstein/core/orchestration/supervisor_receipt.py` (new) - receipt assembly, signing, verification, deterministic recommended_action, cross-worktree fence.
- `src/bernstein/core/orchestration/supervisor_aggregator.py` (new) - reads `.sdd/runtime/agents.json`, heartbeats, and failure records into a single snapshot.
- `src/bernstein/cli/commands/supervisor_cmd.py` (new) - `bernstein supervisor status`/`escalate` Click group.
- `src/bernstein/cli/main.py` - register the new group.
- `src/bernstein/cli/commands/status_cmd.py` - attach supervisor summary to `bernstein status`.
- `src/bernstein/cli/commands/fleet_cmd.py` - append supervisor summary line to the fallback fleet renderer.
- `src/bernstein/core/lifecycle/hooks.py` - add `worker.escalated` lifecycle event for external notifiers.
- `src/bernstein/tui/status_bar.py` - add `SupervisorPane` widget + `render_supervisor_pane` helper.
- `src/bernstein/tui/worker_badges.py` - add `WorkerStatus.STUCK` + `status_for_supervisor_row` helper.
- `docs/api/supervisor.md` (new) - JSON schema for aggregator snapshot + receipt envelope, plus the determinism contract and cross-worktree fence rules.
- `tests/unit/test_supervisor_receipt.py` (new) - 21 tests covering receipt assembly, signature verification, recommended-action determinism (drives the same chain slice from two temp dirs), fence assertion, dict roundtrip.
- `tests/integration/test_supervisor_chain_roundtrip.py` (new) - end-to-end: synthetic `.sdd/runtime/` -> aggregator -> signed receipt -> JSON roundtrip -> verification.
- `tests/snapshot/test_supervisor_pane_snapshot.py` (new) - TUI render snapshot for the new pane.

Acceptance criteria
- `bernstein supervisor status [--json]` lists every live worker with last heartbeat, current task id, stall reason, recommended action, and respawn budget remaining.
- `recommended_action` is deterministic over `(stall_reason, audit_entries, respawn_budget_remaining)`.
- `bernstein supervisor escalate` records an escalation event in the audit chain and fires `worker.escalated`.
- … `bernstein status` and `bernstein fleet` output.
- `stalled_manager`, `watchdog`, `spawn_supervisor`, and parked-agent surfaces remain the source of truth; the new command aggregates.
- … `docs/api/supervisor.md`.

Test plan
- `uv run pytest tests/unit/test_supervisor_receipt.py tests/integration/test_supervisor_chain_roundtrip.py tests/snapshot/test_supervisor_pane_snapshot.py` - 25 passed.
- `uv run pytest tests/unit/ -k "supervisor or stalled or watchdog"` - 127 passed, 45 skipped, no regressions.
- `uv run ruff check .` - all checks passed.
- `uv run ruff format --check .` - all files formatted.
- `uv run pyright` on the three new modules - 0 errors.
- `bernstein supervisor --help` and `bernstein supervisor status --json` smoke-tested.

Summary by CodeRabbit
New Features
- … `status` and `escalate` commands for inspecting and escalating stuck workers.
- `status` now includes a supervisor summary in JSON and a dim one-line supervisor summary in human output.

Tests