Skip to content

Auto-reconnect crashed bridge processes in the background#103

Open
jancurn wants to merge 19 commits intomainfrom
claude/auto-restart-bridge-processes-oidW9
Open

Auto-reconnect crashed bridge processes in the background#103
jancurn wants to merge 19 commits intomainfrom
claude/auto-restart-bridge-processes-oidW9

Conversation

@jancurn
Copy link
Member

@jancurn jancurn commented Mar 24, 2026

Summary

  • Crashed bridge processes are now automatically restarted in the background whenever sessions are enumerated (mcpc or mcpc grep)
  • A 10-second cooldown (socket connect timeout + 5s buffer) between restart attempts prevents rapid retries, tracked via lastRestartAttemptAt in sessions.json
  • Restarts are fire-and-forget — they don't block the listing/grep command, keeping CLI response times fast

Implementation

  • Added lastRestartAttemptAt field to SessionData for cooldown tracking
  • consolidateSessions() now identifies crashed sessions eligible for restart (inside the file lock to prevent concurrent races) and returns them in sessionsToRestart
  • New autoRestartCrashedSessions() in bridge-manager.ts fires off restartBridge() calls without awaiting them
  • Both listSessionsAndAuthProfiles and grepAllSessions call autoRestartCrashedSessions after consolidation

Test plan

  • Unit tests pass (430/430)
  • Verify crashed session auto-restarts by: creating a session, killing the bridge PID, running mcpc twice (first marks crashed + triggers restart, second should show live)
  • Verify cooldown: kill bridge, run mcpc twice rapidly — second run should not trigger another restart attempt
  • Verify mcpc grep also triggers auto-restart of crashed sessions

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86

jancurn and others added 10 commits March 23, 2026 22:50
Adds a new grep command that searches tools, resources, and prompts
by name and description across all active MCP sessions. Supports
regex matching (-E), case-sensitive mode (-s), and type filters
(--tools, --resources, --prompts). Available as both a top-level
command (all sessions) and session command (single session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add recovery hints for crashed and expired sessions in session list

Show actionable hints under crashed and expired sessions in `mcpc` output,
similar to the existing hint for unauthorized sessions.

https://claude.ai/code/session_01D6ovkixxWohRHXaJP7Hh3g

* Remove extra text from crashed session hint

https://claude.ai/code/session_01D6ovkixxWohRHXaJP7Hh3g

---------

Co-authored-by: Claude <noreply@anthropic.com>
When enumerating sessions (e.g. `mcpc` or `mcpc grep`), crashed bridges
are now automatically restarted in the background without blocking the
command. A 10-second cooldown (connect timeout + 5s buffer) between
restart attempts prevents rapid retries. The lastRestartAttemptAt
timestamp is persisted in sessions.json and checked inside the file lock
to avoid concurrent restart races.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Avoid restarting a bridge that just crashed — wait for the cooldown
period after lastSeenAt to give the old process time to fully clean up
and prevent socket conflicts in shared-home environments.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Introduce two new transient session states that give users visibility
into bridge lifecycle transitions:

- 'connecting': shown during initial `mcpc connect` before bridge is ready
- 'reconnecting': shown when a crashed bridge is being auto-reconnected

Both display as yellow filled dots (●) in human output and as string
values in JSON output. Stale transient states (>10s with no PID)
automatically fall back to 'crashed'.

Also rename lastRestartAttemptAt → lastConnectionAttemptAt and
autoRestartCrashedSessions → reconnectCrashedSessions to better reflect
that this is a bridge reconnection, not a full session restart.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Resolve conflicts with PR #100 (grep command) and PR #105 (help fix):
- Take main's grep.ts (more features: --instructions, -m, capability-aware)
  and re-apply reconnectCrashedSessions call
- Take main's index.ts grep options
- Export DisplayStatus/getBridgeStatus (used by grep.ts) with new states
- Merge changelog entries

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
@jancurn jancurn changed the title Auto-restart crashed bridge processes in the background Auto-reconnect crashed bridge processes in the background Mar 24, 2026
claude and others added 9 commits March 24, 2026 16:53
Sessions with dead bridges may show as 'reconnecting' (auto-reconnect
in progress) instead of 'crashed', depending on timing. Update the
expired and failover tests to accept both states.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Move post-connection status management into the bridge process itself
rather than setting 'active' prematurely in reconnectCrashedSessions():

- Bridge sets status='active' after successful MCP connection
- Bridge detects session ID mismatch (server issued new ID instead of
  resuming) and marks session as 'expired' — prevents silently
  continuing with a different session identity
- reconnectCrashedSessions() no longer sets 'active'; it only reverts
  to 'crashed' on startup failure if bridge didn't set a terminal status

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Include unauthorized sessions (alongside crashed) as candidates for
background auto-reconnection. When multiple sessions share the same
OAuth profile, one session refreshing the tokens allows others to
reconnect — the bridge reads from the OS keychain on startup and
picks up the refreshed tokens automatically.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Add a visual diagram showing all possible transitions between session
states, including the new connecting and reconnecting states. Update
the prose to reflect auto-reconnect behavior for crashed and
unauthorized sessions.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Two fixes to detect expired MCP sessions faster:

1. Session ID mismatch detection now catches all non-resume cases,
   including when the server doesn't return any session ID at all
   (previously only detected when both old and new IDs were present).

2. Bridge sends first keepalive ping 5s after startup instead of
   waiting the full 30s interval, catching stale sessions early.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Show 'reconnecting' as a single box with arrows from expired,
unauthorized, and crashed converging into it.

https://claude.ai/code/session_01FJXX4xys8aMoF4iVZbkC86
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants