Skip to content

fix(ssh-console): test_ssh_console_reconnect hang (#1150)#2615

Merged
rwthompsonii merged 3 commits into
NVIDIA:mainfrom
adnandnv:fix/1150-ssh-console-reconnect-test-hang
Jun 16, 2026
Merged

fix(ssh-console): test_ssh_console_reconnect hang (#1150)#2615
rwthompsonii merged 3 commits into
NVIDIA:mainfrom
adnandnv:fix/1150-ssh-console-reconnect-test-hang

Conversation

@adnandnv

Copy link
Copy Markdown
Contributor

Description

test_ssh_console_reconnect intermittently hangs in CI until the job times out.

The mock API server bound 127.0.0.1:0, dropped the socket, then re-bound the same
port for its gRPC server. Under concurrency another test could take the port in that
gap, so the re-bind failed silently and ssh-console was left retrying a dead port --
hanging the test. Fix: bind the listener once and serve on it directly.

Related issues

#1150

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Ran the ssh_console test 100 times locally. Before: 5 hangs, after: 0 hangs.

Additional Notes

adnandnv added 2 commits June 15, 2026 15:44
…hang (NVIDIA#1150)

The mock API server dropped its picked port before re-binding it for the gRPC
server, so under concurrency another test could take the port in that gap: the
re-bind failed silently and ssh-console hung retrying a dead port. Bind the
listener once and serve on it directly.

Signed-off-by: Adnan Dosani <258082943+adnandnv@users.noreply.github.com>
…connect-test-hang

Signed-off-by: Adnan Dosani <258082943+adnandnv@users.noreply.github.com>
@adnandnv adnandnv requested a review from a team as a code owner June 15, 2026 22:59
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

  • Refactor
    • Restructured mock API server initialization to improve port binding stability during startup.

Walkthrough

The mock API server startup in lib.rs is refactored to bind a TcpListener upfront, derive the server address from listener.local_addr(), and use serve_with_incoming_shutdown(TcpIncoming::from(listener), ...) instead of serve_with_shutdown(addr, ...). The TcpIncoming type is imported accordingly. No public interfaces change.

Changes

Pre-bound TcpListener for mock server startup

Layer / File(s) Summary
Pre-bind TcpListener and switch to serve_with_incoming_shutdown
crates/ssh-console-mock-api-server/src/lib.rs
Imports tonic::transport::server::TcpIncoming, binds a TcpListener before computing the server address via local_addr(), and replaces serve_with_shutdown(addr, ...) with serve_with_incoming_shutdown(TcpIncoming::from(listener), ...) to hold the port continuously from selection through server startup. The shutdown channel wiring via shutdown_rx is unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the primary change: fixing the intermittent hang in test_ssh_console_reconnect by restructuring the mock server's port binding.
Description check ✅ Passed The description clearly explains the root cause of the race condition and the solution implemented, providing concrete context for the changes made.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/ssh-console-mock-api-server/src/lib.rs (1)

103-107: ⚡ Quick win

Unnecessary address conversion chain.

listener.local_addr() already returns SocketAddr. The subsequent .to_socket_addrs()?.next().expect(...) is redundant—SocketAddr::to_socket_addrs() merely wraps itself in a single-element iterator, making this conversion a no-op that obscures intent.

♻️ Simplify address extraction
         let listener = TcpListener::bind("127.0.0.1:0").await?;
-        let addr = listener
-            .local_addr()?
-            .to_socket_addrs()?
-            .next()
-            .expect("No socket available");
+        let addr = listener.local_addr()?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/ssh-console-mock-api-server/src/lib.rs` around lines 103 - 107, The
address extraction in the listener initialization block performs a redundant
conversion chain. The listener.local_addr() method already returns a SocketAddr,
so the subsequent .to_socket_addrs()?.next().expect(...) calls are unnecessary
and obscure intent. Simplify this by removing the redundant
.to_socket_addrs()?.next().expect("No socket available") portion and directly
use listener.local_addr()? to assign the SocketAddr to the addr variable.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/ssh-console-mock-api-server/src/lib.rs`:
- Around line 103-107: The address extraction in the listener initialization
block performs a redundant conversion chain. The listener.local_addr() method
already returns a SocketAddr, so the subsequent
.to_socket_addrs()?.next().expect(...) calls are unnecessary and obscure intent.
Simplify this by removing the redundant .to_socket_addrs()?.next().expect("No
socket available") portion and directly use listener.local_addr()? to assign the
SocketAddr to the addr variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df72820f-d4f0-46c1-aa75-97dd64bdb75d

📥 Commits

Reviewing files that changed from the base of the PR and between 550d15e and c1e3752.

📒 Files selected for processing (1)
  • crates/ssh-console-mock-api-server/src/lib.rs

@rwthompsonii rwthompsonii merged commit 666c292 into NVIDIA:main Jun 16, 2026
54 checks passed
@adnandnv adnandnv deleted the fix/1150-ssh-console-reconnect-test-hang branch June 16, 2026 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants