Improve supervisor reliability and observability:
- Persist crash history to .gc/crash-history.json for post-mortem analysis
- Restart dead city controllers automatically via the supervisor heartbeat
- Add stuck_timeout for detecting agents in infinite thinking loops
- Add a startup probe that checks provider session health
Together, these changes let the supervisor recover from failures without manual intervention.
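
As a rough illustration of two of the items above (crash-history persistence and the stuck_timeout check), here is a minimal sketch. It is not the code in this change: the record fields, the function names `record_crash` and `is_stuck`, and the use of seconds for the timeout are all assumptions made for the example.

```python
import json
from pathlib import Path


def record_crash(path: Path, agent: str, reason: str, now: float) -> list:
    """Append one crash record to a JSON history file and return the full history.

    The record shape (agent, reason, time) is hypothetical.
    """
    history = json.loads(path.read_text()) if path.exists() else []
    history.append({"agent": agent, "reason": reason, "time": now})
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file and rename it into place, so a failure
    # mid-write does not leave a truncated history file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(history, indent=2))
    tmp.replace(path)
    return history


def is_stuck(last_activity: float, now: float, stuck_timeout: float) -> bool:
    """Return True when no activity was seen for longer than stuck_timeout.

    All three values are assumed to be in seconds.
    """
    return now - last_activity > stuck_timeout
```

A supervisor loop could call `is_stuck` on each heartbeat and, when it returns True, restart the agent and log the event with `record_crash(Path(".gc/crash-history.json"), ...)`.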