Skip to content

connection_timeout only covers TCP connect, not the full NATS handshake #1526

@kfollesdal

Description

@kfollesdal

Observed behavior

Hi, is this a bug or intentional? I am no rust expert, but found this observation while doing some debugging with help of ai.

Summary

ConnectOptions::connection_timeout() wraps only TcpStream::connect() (the TCP SYN/SYN-ACK), but the subsequent NATS protocol handshake — reading INFO, writing CONNECT+PING, reading PONG — has no timeout at all. If a host is reachable at the TCP level but the NATS process is frozen or unresponsive, connect() hangs indefinitely.

Where the gap is

In connector.rs, the timeout is applied around TcpStream::connect():

tokio::time::timeout(self.connect_timeout, TcpStream::connect(addr)).await??;

Once TCP connects successfully, the handshake calls (read_op(), easy_write_and_flush(), etc.) are completely unguarded. The ping/keepalive mechanisms only activate after the connection handler spawns, which happens after the handshake completes. The WebSocket paths have the same issue — timeout only wraps the WS/WSS connect, not the subsequent protocol exchange.

How Go's NATS client handles this

The Go client in nats.go solves this cleanly in processConnectInit() by setting a single deadline that covers the entire handshake:

func (nc *Conn) processConnectInit() error {
    nc.conn.SetDeadline(time.Now().Add(nc.Opts.Timeout))
    defer nc.conn.SetDeadline(time.Time{})
    // ... reads INFO, does TLS upgrade, writes CONNECT+PING, reads PONG
    // All covered by the deadline above
}

This means Go's Timeout option covers:

✅ TCP connect
✅ Reading INFO
✅ TLS upgrade
✅ Writing CONNECT+PING
✅ Reading PONG

Whereas async-nats currently covers:

✅ TCP connect
❌ Reading INFO
❌ TLS upgrade
❌ Writing CONNECT+PING
❌ Reading PONG

Suggested fix

One approach (mirroring Go) would be to wrap the entire handshake sequence in a single tokio::time::timeout() using the existing connect_timeout value, rather than only the TCP connect. Something like:

tokio::time::timeout(self.connect_timeout, async {
    let stream = TcpStream::connect(addr).await?;
    // ... read INFO, TLS, write CONNECT+PING, read PONG
    Ok(stream)
}).await??;

Workaround

For now, callers can wrap the entire connect() call in their own tokio::time::timeout(), but it would be much better for connection_timeout to cover what users naturally expect it to cover.

Happy to help with a PR if the approach above sounds reasonable!

Expected behavior

Expect timeout for the entire handshake sequence as in the Go lib.

Server and client version

async_nats 0.46.0

Host environment

No response

Steps to reproduce

Reproduction

A minimal reproducer: create a bare TCP listener that accepts connections but never sends any NATS protocol data. async_nats::ConnectOptions::connect() with a connection_timeout will hang past that timeout because TCP connect succeeds, but the INFO read never completes.

// Bare TCP listener — accepts but never speaks NATS protocol
let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await?;
let addr = listener.local_addr()?;

tokio::spawn(async move {
    let (_socket, _) = listener.accept().await.unwrap();
    // Hold the connection open, send nothing
    tokio::time::sleep(Duration::from_secs(300)).await;
});

let connect_timeout = Duration::from_secs(2);
let opts = async_nats::ConnectOptions::new()
    .connection_timeout(connect_timeout);

// This hangs well past 2 seconds — the timeout doesn't cover the handshake
let result = tokio::time::timeout(
    connect_timeout + Duration::from_secs(3),
    opts.connect(format!("nats://127.0.0.1:{}", addr.port())),
).await;

// The outer tokio::time::timeout fires, not connection_timeout
assert!(result.is_err(), "connect() hung past connection_timeout");

Metadata

Metadata

Assignees

No one assigned

    Labels

    defectSuspected defect such as a bug or regression

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions