Description
Observed behavior
Hi, is this a bug or intentional? I'm no Rust expert, but I found this behavior while debugging with the help of AI.
Summary
ConnectOptions::connection_timeout() wraps only TcpStream::connect() (the TCP SYN/SYN-ACK), but the subsequent NATS protocol handshake — reading INFO, writing CONNECT+PING, reading PONG — has no timeout at all. If a host is reachable at the TCP level but the NATS process is frozen or unresponsive, connect() hangs indefinitely.
Where the gap is
In connector.rs, the timeout is applied around TcpStream::connect():
tokio::time::timeout(self.connect_timeout, TcpStream::connect(addr)).await??;
Once TCP connects successfully, the handshake calls (read_op(), easy_write_and_flush(), etc.) are completely unguarded. The ping/keepalive mechanisms only activate after the connection handler spawns, which happens after the handshake completes. The WebSocket paths have the same issue — timeout only wraps the WS/WSS connect, not the subsequent protocol exchange.
How Go's NATS client handles this
The Go client in nats.go solves this cleanly in processConnectInit() by setting a single deadline that covers the entire handshake:
func (nc *Conn) processConnectInit() error {
    nc.conn.SetDeadline(time.Now().Add(nc.Opts.Timeout))
    defer nc.conn.SetDeadline(time.Time{})
    // ... reads INFO, does TLS upgrade, writes CONNECT+PING, reads PONG
    // All covered by the deadline above
}

This means Go's Timeout option covers:
✅ TCP connect
✅ Reading INFO
✅ TLS upgrade
✅ Writing CONNECT+PING
✅ Reading PONG
Whereas async-nats currently covers:
✅ TCP connect
❌ Reading INFO
❌ TLS upgrade
❌ Writing CONNECT+PING
❌ Reading PONG
Suggested fix
One approach (mirroring Go) would be to wrap the entire handshake sequence in a single tokio::time::timeout() using the existing connect_timeout value, rather than only the TCP connect. Something like:
tokio::time::timeout(self.connect_timeout, async {
    let stream = TcpStream::connect(addr).await?;
    // ... read INFO, TLS, write CONNECT+PING, read PONG
    Ok(stream)
}).await??;

Workaround
For now, callers can wrap the entire connect() call in their own tokio::time::timeout(), but it would be much better for connection_timeout to cover what users naturally expect it to cover.
Happy to help with a PR if the approach above sounds reasonable!
Expected behavior
connection_timeout should bound the entire handshake sequence (TCP connect, INFO read, TLS upgrade, CONNECT+PING write, PONG read), as in the Go client.
Server and client version
async_nats 0.46.0
Host environment
No response
Steps to reproduce
Reproduction
A minimal reproducer: create a bare TCP listener that accepts connections but never sends any NATS protocol data. async_nats::ConnectOptions::connect() with a connection_timeout will hang past that timeout because TCP connect succeeds, but the INFO read never completes.
// Inside an async fn returning Result; needs `use std::time::Duration;`.
// Bare TCP listener: accepts connections but never speaks NATS protocol.
let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await?;
let addr = listener.local_addr()?;
tokio::spawn(async move {
    let (_socket, _) = listener.accept().await.unwrap();
    // Hold the connection open, send nothing
    tokio::time::sleep(Duration::from_secs(300)).await;
});

let connect_timeout = Duration::from_secs(2);
let opts = async_nats::ConnectOptions::new()
    .connection_timeout(connect_timeout);

// This hangs well past 2 seconds; the timeout doesn't cover the handshake.
let result = tokio::time::timeout(
    connect_timeout + Duration::from_secs(3),
    opts.connect(format!("nats://127.0.0.1:{}", addr.port())),
).await;

// The outer tokio::time::timeout fires, not connection_timeout.
assert!(result.is_err(), "connect() hung past connection_timeout");