Skip to content

Fix SSH control master cleanup on remote teardown#2104

Merged
lawrencecchen merged 12 commits intomainfrom
feat-ssh-session-end-controlmaster-cleanup
Mar 25, 2026
Merged

Fix SSH control master cleanup on remote teardown#2104
lawrencecchen merged 12 commits intomainfrom
feat-ssh-session-end-controlmaster-cleanup

Conversation

@lawrencecchen
Copy link
Copy Markdown
Contributor

@lawrencecchen lawrencecchen commented Mar 25, 2026

Summary

  • add regression tests for SSH control-master cleanup and browser-panel retention
  • close the local SSH control master whenever a remote workspace clears its remote configuration
  • cover the fast-exit case where the SSH surface ends while the workspace is still connecting

Verification

  • xcodebuild -project GhosttyTabs.xcodeproj -scheme cmux-unit -destination 'platform=macOS' -derivedDataPath /tmp/cmux-feat-ssh-session-end-controlmaster-cleanup-unit4 build-for-testing -only-testing:cmuxTests/WorkspaceRemoteConnectionTests/testRemoteTerminalSessionEndRequestsControlMasterCleanupWhenWorkspaceDemotes -only-testing:cmuxTests/WorkspaceRemoteConnectionTests/testTeardownRemoteConnectionRequestsControlMasterCleanupWhileStillConnecting -only-testing:cmuxTests/WorkspaceRemoteConnectionTests/testRemoteTerminalSessionEndSkipsControlMasterCleanupWhenBrowserPanelsKeepWorkspaceRemote\n- ./scripts/reload.sh --tag feat-ssh-session-end-controlmaster-cleanup\n- runtime probe: cmux ssh cmux-macmini --name cleanup-probe -- sh -lc 'exit 0' followed by checking the relay files on cmux-macmini and confirming pgrep -fal /tmp/cmux-ssh-<uid>-<relayPort> returns nothing\n\n## Notes\n- cmux-unit test still hits the existing local AppKit test-host crash here, so I used build-for-testing plus the tagged runtime repro for fresh evidence.

Summary by cubic

Closes the local SSH ControlMaster on remote teardown or demotion, and keeps the workspace by demoting to local when the last SSH panel exits. Handles connecting/ended sessions and skips cleanup during remote detach or when browser panels keep the workspace remote to prevent orphaned sockets and accidental closes.

  • Bug Fixes
    • Run ssh -O exit in background with BatchMode=yes; ignore ControlMaster/ControlPersist; keep -p/-i; include ControlPath if set; discard stdio.
    • Demote on last-SSH child-exit, including when the session already ended or while connecting.
    • Skip ControlMaster cleanup during remote detach and when browser panels keep the workspace remote.
    • Added regression tests and a runSSHControlMasterCommandOverrideForTesting hook; works without explicit ControlPath.

Written for commit a24f90a. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Closing the last remote terminal panel now preserves the workspace, demotes it to local, and routes the close via the remote-surface path when appropriate.
    • Workspace demotion is deferred until remote terminal child exits are acknowledged, avoiding premature collapse.
    • SSH ControlMaster cleanup is now conditional and runs only in appropriate disconnect/teardown cases; skipped when other remote surfaces keep the workspace remote.
    • Improved debug logging for remote-close flows.
  • Tests

    • Added tests covering last-remote-panel close, demotion behavior, and conditional SSH ControlMaster cleanup.

@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
cmux Ready Ready Preview, Comment Mar 25, 2026 6:48am

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 25, 2026

Caution

Review failed

Pull request was closed or merged during review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7dac2982-e3b3-4da3-93c9-0919d553e4ef

📥 Commits

Reviewing files that changed from the base of the PR and between 395f564 and a24f90a.

📒 Files selected for processing (2)
  • Sources/Workspace.swift
  • cmuxTests/WorkspaceRemoteConnectionTests.swift
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmuxTests/WorkspaceRemoteConnectionTests.swift

📝 Walkthrough

Walkthrough

TabManager routes child-exit closes for the SSH "last" panel through closeRuntimeSurface(...) and returns early to preserve/demote the workspace; Workspace tracks pendingRemoteTerminalChildExitSurfaceIds, exposes shouldDemoteWorkspaceAfterChildExit(surfaceId:), and may request a best-effort SSH ControlMaster -O exit cleanup via a testable hook.

Changes

Cohort / File(s) Summary
TabManager remote close logic
Sources/TabManager.swift
Adds keepsRemoteWorkspaceOpen check in closePanelAfterChildExited(tabId:surfaceId:); when true for the SSH-last-surface case logs remote flags, calls closeRuntimeSurface(...), and returns early (bypasses collapse/close-window path).
Workspace SSH & demotion state
Sources/Workspace.swift
Adds pendingRemoteTerminalChildExitSurfaceIds, @MainActor func shouldDemoteWorkspaceAfterChildExit(surfaceId:), integrates pending-ID lifecycle into track/untrack/teardown paths, changes disconnectRemoteConnection(clearConfiguration:) to avoid ControlMaster cleanup in detach-only flows, and adds SSH ControlMaster cleanup helpers plus a nonisolated test override hook.
TabManager tests (child-exit remote)
cmuxTests/TabManagerUnitTests.swift
Adds two tests verifying closing the last remote panel after child-exit keeps the tab/workspace, demotes to local, removes the remote panel, moves focus, and clears active remote session count.
Workspace SSH cleanup tests
cmuxTests/WorkspaceRemoteConnectionTests.swift
Adds five tests asserting SSH ControlMaster cleanup invocation (ssh ... -O exit with proper options) in demotion scenarios and asserting cleanup is skipped when detaching or when other surfaces keep the workspace remote.

Sequence Diagram(s)

sequenceDiagram
  participant TM as TabManager
  participant WS as Workspace
  participant SSH as SSH Process (/usr/bin/ssh)

  TM->>WS: closePanelAfterChildExited(tabId, surfaceId)
  alt keepsRemoteWorkspaceOpen && surface is SSH-last
    WS->>WS: shouldDemoteWorkspaceAfterChildExit(surfaceId) == true
    TM->>WS: closeRuntimeSurface(tabId, surfaceId)
    WS->>WS: demote workspace state (clear remote flags)
    WS->>SSH: requestSSHControlMasterCleanupIfNeeded(args)
    Note right of SSH: run `ssh ... -O exit` (async) or call test override
  else normal child-exit close
    TM->>WS: existing collapse / close-window flow
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through panels, soft and spry,

A last SSH whisper waved goodbye.
I kept the burrow, tucked the key,
Demoted rooms, set processes free.
Hop-hop — tidy code and glee.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description includes a concise summary of changes and testing details, but omits the standard template sections: it lacks a clear 'Why' explanation, no demo video section, no review trigger block, and no completion checklist. Add 'Why' context explaining the motivation, include the review trigger block, and complete the standard checklist to match the repository template.
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix SSH control master cleanup on remote teardown' directly and specifically describes the main change: implementing SSH ControlMaster cleanup when remote workspaces tear down or are demoted to local.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat-ssh-session-end-controlmaster-cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR closes the SSH control-master socket whenever a remote workspace is torn down or demoted by capturing remoteConfiguration before it is cleared in disconnectRemoteConnection and calling requestSSHControlMasterCleanupIfNeeded at the end of that method. Three regression tests are added to cover the demotion, explicit teardown, and browser-panel-retention paths.

Key points:

  • Production logic is correct: configurationForCleanup is snapshotted before remoteConfiguration is set to nil, arguments are assembled safely, and the ssh -O exit call is dispatched on a dedicated utility queue.
  • Test assertion bug: testTeardownRemoteConnectionRequestsControlMasterCleanupWhileStillConnecting includes a StrictHostKeyChecking pair in its expected argument array, but that option is absent from the test config's sshOptions. The XCTAssertEqual will fail as written.
  • Minor style concern: The stderr Pipe() in requestSSHControlMasterCleanupIfNeeded is never read; routing it to FileHandle.nullDevice avoids a theoretical deadlock if ssh produces unexpectedly large output.

Confidence Score: 3/5

  • Safe to merge after fixing the incorrect test assertion — the production logic is sound but one test will fail as written.
  • The core cleanup logic in Workspace.swift is correct and well-structured. However, the test added to verify the teardown-while-connecting path contains a wrong expected-argument array (a stray SSH option that isn't in the config), meaning that test will fail immediately. This is a new regression test introduced by the PR itself, so it directly undermines the stated verification goals.
  • cmuxTests/WorkspaceRemoteConnectionTests.swift — fix the expected-arguments array in testTeardownRemoteConnectionRequestsControlMasterCleanupWhileStillConnecting.

Important Files Changed

Filename Overview
Sources/Workspace.swift Adds SSH control-master cleanup on remote teardown: captures configuration before clearing it, calls requestSSHControlMasterCleanupIfNeeded at the end of disconnectRemoteConnection, and implements the argument-building + async process-launch helpers. Logic is sound; minor concern with unread stderr Pipe.
cmuxTests/WorkspaceRemoteConnectionTests.swift Three regression tests for SSH control-master cleanup. The teardown-while-connecting test has an incorrect expected-argument array containing a StrictHostKeyChecking entry absent from the test config, so that XCTAssertEqual will fail.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Workspace
    participant CleanupQueue as sshControlMasterCleanupQueue
    participant SSH as /usr/bin/ssh

    Caller->>Workspace: disconnectRemoteConnection(clearConfiguration: true)
    Workspace->>Workspace: configurationForCleanup = remoteConfiguration
    Workspace->>Workspace: remoteConfiguration = nil
    Workspace->>Workspace: applyBrowserRemoteWorkspaceStatusToPanels()
    Workspace->>Workspace: recomputeListeningPorts()
    Workspace->>Workspace: requestSSHControlMasterCleanupIfNeeded(configuration)
    alt test override set
        Workspace->>Caller: runSSHControlMasterCommandOverrideForTesting(arguments)
    else production path
        Workspace->>CleanupQueue: async { ssh -O exit ... }
        CleanupQueue->>SSH: launch process
        SSH-->>CleanupQueue: exit (control master closed)
    end
Loading

Reviews (1): Last reviewed commit: "fix: close SSH control master on remote ..." | Re-trigger Greptile

Comment thread Sources/Workspace.swift
process.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
process.arguments = arguments
process.standardInput = FileHandle.nullDevice
process.standardOutput = FileHandle.nullDevice
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unread stderr Pipe may deadlock

process.standardError is assigned a Pipe() but its read end is never consumed. If ssh -O exit writes more data to stderr than the kernel pipe buffer (~64 KB on macOS), the ssh process will block in its write() syscall and waitUntilExit() will never return, hanging the cleanup queue worker.

Since the error output isn't needed for correctness here, redirecting to FileHandle.nullDevice is the safer choice:

Suggested change
process.standardOutput = FileHandle.nullDevice
process.standardError = FileHandle.nullDevice

"-o", "BatchMode=yes",
"-o", "ControlMaster=no",
"-o", "ControlPath=/tmp/cmux-ssh-%C",
"-o", "StrictHostKeyChecking=accept-new",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Expected argument mismatch — StrictHostKeyChecking not in this test's config

This test's sshOptions does not include a StrictHostKeyChecking entry, but the expected-arguments array on this line asserts that one will be present in the captured output. sshControlMasterCleanupArguments only emits options that appear in the config, so this pair will never be produced and the XCTAssertEqual will fail.

The entry was likely carried over by copy-paste from the first test, which does include that option in its sshOptions. Remove this line (and its paired -o flag above it) from the expected array.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 502b677634

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"-o", "BatchMode=yes",
"-o", "ControlMaster=no",
"-o", "ControlPath=/tmp/cmux-ssh-%C",
"-o", "StrictHostKeyChecking=accept-new",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix teardown assertion to match configured SSH options

This test config does not include a StrictHostKeyChecking SSH option, but the expected cleanup argv still requires "-o", "StrictHostKeyChecking=accept-new". In sshControlMasterCleanupArguments, options are built from configuration.sshOptions, so this expectation is impossible for the provided input and the regression test will fail once executed, obscuring whether the teardown cleanup behavior is actually correct.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f241aa00e5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread Sources/Workspace.swift Outdated
Comment on lines +6709 to +6713
process.standardError = Pipe()

do {
try process.run()
process.waitUntilExit()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Redirect or consume ssh stderr before waitUntilExit

This cleanup path sets process.standardError to a Pipe and then blocks on waitUntilExit() without reading from that pipe. If ssh -O exit emits enough stderr (for example from verbose or misconfigured SSH options), the child can block on a full pipe buffer and never exit, which then stalls the serial cleanup queue and prevents later teardown cleanups from running.

Useful? React with 👍 / 👎.

Comment thread Sources/Workspace.swift Outdated
Comment on lines +6722 to +6723
guard sshOptions.contains(where: { sshOptionKeyForControlCleanup($0) == "controlpath" }) else {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don’t skip cleanup when ControlPath comes from ssh config

sshControlMasterCleanupArguments returns nil unless configuration.sshOptions explicitly contains ControlPath, so teardown silently skips cleanup for sessions where the control socket is configured via ~/.ssh/config (or any caller that omits ControlPath from ssh_options). In that case remote teardown still clears workspace state, but the local ControlMaster process/socket is left orphaned.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
Sources/Workspace.swift (1)

6720-6762: Extract SSH option normalization into one shared helper.

This is now the third copy of the trim/key/filter logic in this file. Pulling it into one fileprivate utility would keep cleanup, bootstrap, and relay behavior from drifting apart.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Sources/Workspace.swift` around lines 6720 - 6762, Extract the repeated SSH
option trimming/key/filter logic into a single fileprivate utility (e.g.,
fileprivate func normalizeSSHOptions(_:) and fileprivate func sshOptionKey(_:) )
and replace the existing implementations used by
sshControlMasterCleanupArguments, normalizedSSHControlCleanupOptions, and
sshOptionKeyForControlCleanup to call the new helpers; update the
cleanup/bootstrap/relay callers to use the shared helpers so trimming, key
extraction, and disallowed-key filtering (currently done via disallowedKeys =
["controlmaster","controlpersist"]) are centralized and the duplicated logic is
removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Sources/Workspace.swift`:
- Around line 6703-6717: The sshControlMasterCleanupQueue task can block forever
because Process.standardError is a Pipe with no reader and waitUntilExit() can
hang; change the cleanup to avoid a blocking wait by either redirecting
standardError to FileHandle.nullDevice or attach a
terminationHandler/onBackgroundThread to observe process termination instead of
calling waitUntilExit(), and ensure you still call try process.run() inside the
sshControlMasterCleanupQueue closure (refer to sshControlMasterCleanupQueue,
Process, process.run(), process.waitUntilExit(), and process.standardError) so
the queue never wedges on a single noisy/stuck ssh process.

---

Nitpick comments:
In `@Sources/Workspace.swift`:
- Around line 6720-6762: Extract the repeated SSH option trimming/key/filter
logic into a single fileprivate utility (e.g., fileprivate func
normalizeSSHOptions(_:) and fileprivate func sshOptionKey(_:) ) and replace the
existing implementations used by sshControlMasterCleanupArguments,
normalizedSSHControlCleanupOptions, and sshOptionKeyForControlCleanup to call
the new helpers; update the cleanup/bootstrap/relay callers to use the shared
helpers so trimming, key extraction, and disallowed-key filtering (currently
done via disallowedKeys = ["controlmaster","controlpersist"]) are centralized
and the duplicated logic is removed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2fb0e1f-789d-4634-b13f-fb309da11735

📥 Commits

Reviewing files that changed from the base of the PR and between 960006e and f241aa0.

📒 Files selected for processing (4)
  • Sources/TabManager.swift
  • Sources/Workspace.swift
  • cmuxTests/TabManagerUnitTests.swift
  • cmuxTests/WorkspaceRemoteConnectionTests.swift

Comment thread Sources/Workspace.swift
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
Sources/Workspace.swift (1)

6711-6722: ⚠️ Potential issue | 🟠 Major

Cleanup worker can still wedge the serial queue.

process.standardError = Pipe() without a reader plus blocking waitUntilExit() can stall this queue indefinitely and block all later ControlMaster cleanup requests.

Suggested fix
         sshControlMasterCleanupQueue.async {
             let process = Process()
             process.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
             process.arguments = arguments
             process.standardInput = FileHandle.nullDevice
             process.standardOutput = FileHandle.nullDevice
-            process.standardError = Pipe()
+            process.standardError = FileHandle.nullDevice
+            let exitSemaphore = DispatchSemaphore(value: 0)
+            process.terminationHandler = { _ in
+                exitSemaphore.signal()
+            }

             do {
                 try process.run()
-                process.waitUntilExit()
+                if exitSemaphore.wait(timeout: .now() + 5.0) == .timedOut, process.isRunning {
+                    process.terminate()
+                }
             } catch {
                 return
             }
         }
#!/bin/bash
# Verify the problematic pattern still exists in Sources/Workspace.swift
rg -n -C2 'process\.standardError = Pipe\(\)|process\.waitUntilExit\(\)' Sources/Workspace.swift
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Sources/Workspace.swift` around lines 6711 - 6722, The cleanup task can wedge
sshControlMasterCleanupQueue because process.standardError = Pipe() has no
reader and waitUntilExit() blocks the serial queue; change the code in the
sshControlMasterCleanupQueue.async block so the Process does not create an
unread Pipe and does not synchronously wait: either set process.standardError =
FileHandle.nullDevice or, if you need stderr, create a Pipe and asynchronously
read its fileHandleForReading on a background queue, then replace the blocking
try { process.run(); process.waitUntilExit() } with try process.run() plus a
process.terminationHandler { _ in /* handle exit */ } so the queue is not
blocked waiting for the child to exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@Sources/Workspace.swift`:
- Around line 6711-6722: The cleanup task can wedge sshControlMasterCleanupQueue
because process.standardError = Pipe() has no reader and waitUntilExit() blocks
the serial queue; change the code in the sshControlMasterCleanupQueue.async
block so the Process does not create an unread Pipe and does not synchronously
wait: either set process.standardError = FileHandle.nullDevice or, if you need
stderr, create a Pipe and asynchronously read its fileHandleForReading on a
background queue, then replace the blocking try { process.run();
process.waitUntilExit() } with try process.run() plus a
process.terminationHandler { _ in /* handle exit */ } so the queue is not
blocked waiting for the child to exit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05991187-9047-4b06-83f2-827c5de44712

📥 Commits

Reviewing files that changed from the base of the PR and between f241aa0 and 94458a2.

📒 Files selected for processing (3)
  • Sources/TabManager.swift
  • Sources/Workspace.swift
  • cmuxTests/TabManagerUnitTests.swift
✅ Files skipped from review due to trivial changes (1)
  • cmuxTests/TabManagerUnitTests.swift

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Sources/Workspace.swift (1)

6617-6647: ⚠️ Potential issue | 🟠 Major

Skip ControlMaster cleanup while the last remote panel is only being detached.

When the last remote terminal is detached from a workspace, closePanel() stores it in pendingDetachedSurfaces for later attachment elsewhere, then calls clearRemoteConfigurationIfWorkspaceBecameLocal(). If the workspace becomes empty, this triggers disconnectRemoteConnection(clearConfiguration: true), which schedules ssh -O exit at line 6645. The detached panel's active SSH session is still needed after attachment, so the ControlMaster should not be torn down during a detach transaction.

🛠️ Minimal guard in this path
     func disconnectRemoteConnection(clearConfiguration: Bool = false) {
-        let configurationForCleanup = clearConfiguration ? remoteConfiguration : nil
+        let shouldCleanupControlMaster =
+            clearConfiguration && !isDetachingCloseTransaction
+        let configurationForCleanup = shouldCleanupControlMaster ? remoteConfiguration : nil
         let previousController = remoteSessionController
         activeRemoteSessionControllerID = nil
         remoteSessionController = nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Sources/Workspace.swift` around lines 6617 - 6647, disconnectRemoteConnection
currently always schedules SSH ControlMaster cleanup when clearConfiguration is
true, which tears down an SSH master even if the last remote terminal was only
detached for reattachment; change the final cleanup call so it only runs when
there are no pending detached panels: in disconnectRemoteConnection(...) guard
on pendingDetachedSurfaces.isEmpty before calling
Self.requestSSHControlMasterCleanupIfNeeded(configuration:), referencing the
pendingDetachedSurfaces collection and the
requestSSHControlMasterCleanupIfNeeded(...) helper so a detach/reattach
transaction preserves the active ControlMaster; no other behavior changes.
♻️ Duplicate comments (1)
Sources/Workspace.swift (1)

6711-6725: ⚠️ Potential issue | 🟠 Major

Keep the cleanup worker bounded.

Line 6721 can still block the serial sshControlMasterCleanupQueue indefinitely if one ssh -O exit wedges, so every later cleanup request stalls behind it. This is the same queue-wedge risk called out earlier; the stderr pipe fix is in place now, but the unbounded waitUntilExit() still reintroduces it.

🛠️ Bound the wait instead of blocking forever
         sshControlMasterCleanupQueue.async {
             let process = Process()
             process.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
             process.arguments = arguments
             process.standardInput = FileHandle.nullDevice
             process.standardOutput = FileHandle.nullDevice
             process.standardError = FileHandle.nullDevice
+            let exitSemaphore = DispatchSemaphore(value: 0)
+            process.terminationHandler = { _ in
+                exitSemaphore.signal()
+            }

             do {
                 try process.run()
-                process.waitUntilExit()
+                if exitSemaphore.wait(timeout: .now() + 5) == .timedOut, process.isRunning {
+                    process.terminate()
+                }
             } catch {
                 return
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Sources/Workspace.swift` around lines 6711 - 6725, The cleanup closure
currently calls process.run() then process.waitUntilExit() on the serial
sshControlMasterCleanupQueue which can wedge the queue; replace the blocking
wait with a non‑blocking pattern: after try process.run(), do not call
process.waitUntilExit(); instead set process.terminationHandler to handle
completion and return immediately from the queue closure, and schedule a timeout
(e.g., DispatchQueue.global().asyncAfter) that checks process.isRunning and
calls process.terminate() if it exceeds the bound. Locate the code around
sshControlMasterCleanupQueue, the Process() instance and the lines that call
process.run() and process.waitUntilExit(), and implement the terminationHandler
+ timeout + process.terminate() approach so the serial queue never blocks
indefinitely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@Sources/Workspace.swift`:
- Around line 6617-6647: disconnectRemoteConnection currently always schedules
SSH ControlMaster cleanup when clearConfiguration is true, which tears down an
SSH master even if the last remote terminal was only detached for reattachment;
change the final cleanup call so it only runs when there are no pending detached
panels: in disconnectRemoteConnection(...) guard on
pendingDetachedSurfaces.isEmpty before calling
Self.requestSSHControlMasterCleanupIfNeeded(configuration:), referencing the
pendingDetachedSurfaces collection and the
requestSSHControlMasterCleanupIfNeeded(...) helper so a detach/reattach
transaction preserves the active ControlMaster; no other behavior changes.

---

Duplicate comments:
In `@Sources/Workspace.swift`:
- Around line 6711-6725: The cleanup closure currently calls process.run() then
process.waitUntilExit() on the serial sshControlMasterCleanupQueue which can
wedge the queue; replace the blocking wait with a non‑blocking pattern: after
try process.run(), do not call process.waitUntilExit(); instead set
process.terminationHandler to handle completion and return immediately from the
queue closure, and schedule a timeout (e.g., DispatchQueue.global().asyncAfter)
that checks process.isRunning and calls process.terminate() if it exceeds the
bound. Locate the code around sshControlMasterCleanupQueue, the Process()
instance and the lines that call process.run() and process.waitUntilExit(), and
implement the terminationHandler + timeout + process.terminate() approach so the
serial queue never blocks indefinitely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa9d66f8-76fd-4a4a-9a40-453280006cd3

📥 Commits

Reviewing files that changed from the base of the PR and between 94458a2 and 3a45322.

📒 Files selected for processing (2)
  • Sources/Workspace.swift
  • cmuxTests/WorkspaceRemoteConnectionTests.swift
✅ Files skipped from review due to trivial changes (1)
  • cmuxTests/WorkspaceRemoteConnectionTests.swift

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
Sources/Workspace.swift (1)

6711-6721: ⚠️ Potential issue | 🟠 Major

Bound the serial cleanup worker.

sshControlMasterCleanupQueue is serial, and process.waitUntilExit() has no timeout. One stuck ssh -O exit will block every later ControlMaster cleanup request on that queue.

🛠️ One way to keep the worker bounded
         sshControlMasterCleanupQueue.async {
             let process = Process()
             process.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
             process.arguments = arguments
             process.standardInput = FileHandle.nullDevice
             process.standardOutput = FileHandle.nullDevice
             process.standardError = FileHandle.nullDevice
+            let exitSemaphore = DispatchSemaphore(value: 0)
+            process.terminationHandler = { _ in
+                exitSemaphore.signal()
+            }

             do {
                 try process.run()
-                process.waitUntilExit()
+                if exitSemaphore.wait(timeout: .now() + 5) == .timedOut, process.isRunning {
+                    process.terminate()
+                    if exitSemaphore.wait(timeout: .now() + 1) == .timedOut, process.isRunning {
+                        _ = Darwin.kill(process.processIdentifier, SIGKILL)
+                    }
+                }
             } catch {
                 return
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Sources/Workspace.swift` around lines 6711 - 6721, The serial cleanup worker
uses sshControlMasterCleanupQueue and blocks on process.waitUntilExit(), so a
stuck "ssh -O exit" can stall the entire queue; modify the cleanup task around
Process.run()/process.waitUntilExit() to enforce a bounded timeout: after
starting the Process (in the sshControlMasterCleanupQueue.async block) start a
short timer (or use DispatchSemaphore/DispatchGroup with wait(timeout:)) and if
the wait times out, call process.terminate() (and if needed process.kill via
SIGKILL) and then wait for exit with a short grace period, and always return
from the queue task—ensure the change touches the code that creates Process,
calls try process.run(), and currently uses process.waitUntilExit() so that
waiting is bounded and the queue cannot be indefinitely blocked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@Sources/Workspace.swift`:
- Around line 6711-6721: The serial cleanup worker uses
sshControlMasterCleanupQueue and blocks on process.waitUntilExit(), so a stuck
"ssh -O exit" can stall the entire queue; modify the cleanup task around
Process.run()/process.waitUntilExit() to enforce a bounded timeout: after
starting the Process (in the sshControlMasterCleanupQueue.async block) start a
short timer (or use DispatchSemaphore/DispatchGroup with wait(timeout:)) and if
the wait times out, call process.terminate() (and if needed process.kill via
SIGKILL) and then wait for exit with a short grace period, and always return
from the queue task—ensure the change touches the code that creates Process,
calls try process.run(), and currently uses process.waitUntilExit() so that
waiting is bounded and the queue cannot be indefinitely blocked.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8d952f0-6064-400a-a715-de1ad953f10e

📥 Commits

Reviewing files that changed from the base of the PR and between 3a45322 and 395f564.

📒 Files selected for processing (2)
  • Sources/Workspace.swift
  • cmuxTests/WorkspaceRemoteConnectionTests.swift
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmuxTests/WorkspaceRemoteConnectionTests.swift

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a24f90aec6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread Sources/Workspace.swift
Comment on lines +6618 to +6620
let shouldCleanupControlMaster =
clearConfiguration && !isDetachingCloseTransaction && pendingDetachedSurfaces.isEmpty
let configurationForCleanup = shouldCleanupControlMaster ? remoteConfiguration : nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip cleanup after moving the last remote surface away

When a remote terminal is detached to another workspace/window, cleanupEmptySourceWorkspaceAfterSurfaceMove closes the now-empty source workspace, which calls teardownRemoteConnection(). At this check, shouldCleanupControlMaster becomes true because the detach transaction has already ended and pendingDetachedSurfaces has been drained, so the source workspace still issues ssh -O exit for that host. If SSH multiplexing is in use, that can shut down the ControlMaster still serving the moved terminal and kill the session right after a successful move.

Useful? React with 👍 / 👎.

@lawrencecchen lawrencecchen merged commit a7e5050 into main Mar 25, 2026
13 of 14 checks passed
@lawrencecchen lawrencecchen deleted the feat-ssh-session-end-controlmaster-cleanup branch March 25, 2026 07:01
bn-l pushed a commit to bn-l/cmux that referenced this pull request Apr 3, 2026
* test: add SSH control master cleanup regressions

* fix: close SSH control master on remote teardown

* test: keep SSH workspace after child exit

* fix: keep SSH workspace after child exit

* fix: keep connecting SSH workspaces after child exit

* test: add SSH child-exit demotion regression

* fix: keep SSH workspace after connected shell exit

* fix: address SSH cleanup review feedback

* test: cover SSH cleanup without explicit controlpath

* fix: clean up SSH control masters without explicit controlpath

* test: cover remote detach cleanup edge cases

* fix: preserve SSH sessions during remote detach

---------

Co-authored-by: Lawrence Chen <lawrencecchen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant