Supervisor health: crash history, heartbeat restarts, stuck detection, startup probes #324

@alexsiri7

Description

Improve supervisor reliability and observability:

  • Persist crash history to .gc/crash-history.json for post-mortem analysis
  • Add a supervisor heartbeat that automatically restarts dead city controllers
  • Add stuck_timeout to detect agents caught in infinite thinking loops
  • Add a startup probe that checks provider session health
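
A minimal sketch of crash-history persistence, assuming the file holds a JSON array of crash records. The record fields (`agent`, `exit_code`, `reason`, `timestamp`), the `record_crash` helper, and the `MAX_ENTRIES` cap are hypothetical; the issue does not specify a schema. Writing to a temporary file and then calling `os.replace` means a supervisor that dies mid-write leaves the previous history intact instead of a truncated file.

```python
import json
import os
import tempfile
import time
from pathlib import Path

# Hypothetical cap so the history file does not grow without bound.
MAX_ENTRIES = 100


def record_crash(gc_dir: Path, agent: str, exit_code: int, reason: str) -> list:
    """Append one crash record to <gc_dir>/crash-history.json and return the history."""
    path = gc_dir / "crash-history.json"
    try:
        history = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Start fresh if the file is missing or was left corrupt.
        history = []

    history.append({
        "agent": agent,
        "exit_code": exit_code,
        "reason": reason,
        "timestamp": time.time(),
    })
    history = history[-MAX_ENTRIES:]  # keep only the newest entries

    gc_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then atomically replace the target.
    fd, tmp = tempfile.mkstemp(dir=gc_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(history, f, indent=2)
    os.replace(tmp, path)
    return history
```

Keeping the temporary file in the same directory matters: `os.replace` is only atomic when source and target are on the same filesystem.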

Together, these changes let the supervisor recover from failures without manual intervention.
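
The heartbeat restart and stuck detection can be sketched as two small checks a supervisor runs on each tick. All names here (`Controller`, `heartbeat`, `stuck_agents`) are hypothetical, as is the assumption that "dead" means the controller's process has exited and "stuck" means an agent has reported no progress for longer than `stuck_timeout` seconds; the issue does not say how thinking-loop activity is measured.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Controller:
    """Hypothetical handle for one city controller process."""
    name: str
    cmd: List[str]
    proc: Optional[subprocess.Popen] = None
    restarts: int = 0

    def start(self) -> None:
        self.proc = subprocess.Popen(self.cmd)


def heartbeat(controllers: List[Controller]) -> List[str]:
    """One heartbeat tick: restart every started controller whose process has exited."""
    restarted = []
    for c in controllers:
        # poll() returns None while the process runs and its exit code once it is gone.
        if c.proc is not None and c.proc.poll() is not None:
            c.start()
            c.restarts += 1
            restarted.append(c.name)
    return restarted


def stuck_agents(last_progress: Dict[str, float], stuck_timeout: float,
                 now: Optional[float] = None) -> List[str]:
    """Return agents whose last recorded progress is older than stuck_timeout seconds."""
    now = time.monotonic() if now is None else now
    return sorted(a for a, t in last_progress.items() if now - t > stuck_timeout)
```

A real supervisor would call `heartbeat` on a timer and would likely add a backoff, so that a controller which crashes immediately on startup is not restarted in a tight loop.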

Labels: kind/feature (New capability), priority/p2 (Medium: real problem, workaround exists)
