ATTACH to SQL Server can take 30s on Windows with multiple network adapters #122

@lucap-irion


Summary

On Windows, TdsSocket::Connect() tries each address returned by getaddrinfo() in turn and gives each one the full connection timeout (default 30s). When a hostname resolves to multiple addresses (common on Windows machines with several NICs) and the first address(es) do not respond (the SYN is silently dropped, with no RST), the connection hangs for the full timeout before it tries the next address.

Environment

  • Windows 11/10 with multiple network adapters (Ethernet, Docker vEthernet, WSL, VPN)
  • SQL Server listening on one specific adapter
  • Hostname resolves via DNS to multiple addresses: several IPv4 addresses plus IPv6 link-local addresses

Steps to Reproduce

  1. Have a Windows machine with multiple NICs (for example with Docker Desktop, WSL2, or a VPN installed)
  2. Run SQL Server locally, listening on one adapter (e.g., 192.168.1.x)
  3. Run ATTACH 'Server=HOSTNAME,...' AS db (TYPE mssql), using the hostname rather than an IP address

Observed Behavior

getaddrinfo() for the hostname and port, with ai_family = AF_UNSPEC, returns 4-9 addresses (a mix of IPv6 link-local addresses and several IPv4 addresses from different adapters). The current code in TdsSocket::Connect() iterates over them sequentially, giving each address the full timeout_seconds * 1000 ms via WaitForReady().
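To see exactly what the connect loop iterates over, the following minimal sketch prints the addresses in the order getaddrinfo() returns them. ResolveAll is a hypothetical helper, not part of the extension; the hints mirror the AF_UNSPEC lookup described above, with SOCK_STREAM assumed. It uses POSIX headers; on Windows the same getaddrinfo()/getnameinfo() calls live in <ws2tcpip.h> and require WSAStartup() first.

```cpp
#include <cassert>
#include <netdb.h>
#include <sys/socket.h>
#include <string>
#include <vector>

// Hypothetical helper: returns every address getaddrinfo() yields for
// host:port, in the order a sequential connect loop would try them.
std::vector<std::string> ResolveAll(const std::string &host, const std::string &port) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 and IPv6, as in the issue
    hints.ai_socktype = SOCK_STREAM;  // TCP entries only (no UDP/raw duplicates)
    addrinfo *res = nullptr;
    std::vector<std::string> out;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0) {
        return out;                   // resolution failed: empty list
    }
    for (addrinfo *p = res; p != nullptr; p = p->ai_next) {
        char buf[NI_MAXHOST];
        // NI_NUMERICHOST yields text such as "192.168.1.10" or "fe80::1%eth0"
        if (getnameinfo(p->ai_addr, p->ai_addrlen, buf, sizeof(buf),
                        nullptr, 0, NI_NUMERICHOST) == 0) {
            out.emplace_back(buf);
        }
    }
    freeaddrinfo(res);
    return out;
}
```

On an affected machine, ResolveAll("HOSTNAME", "1433") (1433 being SQL Server's default TCP port) should list the same 4-9 entries, in the order they are tried.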

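If the first address is unreachable (the TCP SYN is silently dropped, which is common for link-local IPv6 addresses or addresses on inactive adapters), the connection waits the full 30s for it before moving on. With N such addresses ahead of the working one, the connection takes 30s × N before it even tries the working address.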
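The 30s × N arithmetic can be made concrete with a small timing model. This is a hypothetical sketch, not code from the extension: each address is described only by how long its TCP handshake would take, std::nullopt stands for an address whose SYNs are silently dropped, and addresses that refuse with an RST (and therefore fail quickly) are not modeled.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Connect latency of one address in ms; std::nullopt = SYNs silently dropped.
using Attempt = std::optional<long>;

// Hypothetical timing model of the current loop: addresses are tried one
// after another and each gets the full per_address_timeout_ms.
// Returns the elapsed ms until a connection, or std::nullopt if none connects.
std::optional<long> SequentialConnectMs(const std::vector<Attempt> &addrs,
                                        long per_address_timeout_ms) {
    long elapsed = 0;
    for (const Attempt &a : addrs) {
        if (a && *a <= per_address_timeout_ms) {
            return elapsed + *a;            // this address answers in time
        }
        elapsed += per_address_timeout_ms;  // waited the whole timeout for nothing
    }
    return std::nullopt;
}
```

For two dropped link-local addresses followed by a working one that answers in 5ms, SequentialConnectMs({std::nullopt, std::nullopt, 5L}, 30000) returns 60005: a full minute spent on addresses that never answer.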

Additionally, the issue is intermittent: the same address can succeed on one attempt and time out on the next (likely due to TCP TIME_WAIT state or Windows Firewall throttling rapid reconnects).

Expected Behavior

Connection should succeed within 1-2 seconds regardless of how many addresses DNS returns, as long as at least one of them is reachable, matching the behavior of Microsoft.Data.SqlClient.

Root Cause

In src/tds/tds_socket.cpp, Connect():

for (p = res; p != nullptr; p = p->ai_next) {
    // ... socket + connect ...
    WaitForReady(timeout_seconds * 1000, true);  // full timeout per address!
}

Proposed Fix: Happy Eyeballs (RFC 8305)

Microsoft's own SqlClient (.NET) has used a staggered parallel connect approach since .NET 8:

  1. Start a non-blocking connect() to the first address
  2. After 250ms, if it has not connected yet, start connecting to the next address in parallel
  3. Poll all active sockets simultaneously
  4. The first socket to connect wins; close all the others
  5. Cap the whole procedure at connection_timeout
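The five steps above could look roughly like the following. This is a minimal POSIX sketch under stated assumptions, not the extension's code: StaggeredConnect is a hypothetical name, it tries addresses in getaddrinfo() order (RFC 8305 additionally recommends interleaving IPv6 and IPv4), and it starts the next attempt immediately when one fails fast. A Windows port would use WSAPoll(), ioctlsocket() with FIONBIO, WSAEWOULDBLOCK and closesocket() in place of poll(), fcntl(), EINPROGRESS and close().

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Hypothetical staggered connect over a getaddrinfo() list (POSIX sketch).
// Returns a connected socket (left non-blocking) or -1 on failure/timeout.
int StaggeredConnect(const addrinfo *res, int stagger_ms, int timeout_ms) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::milliseconds;
    const auto deadline = Clock::now() + milliseconds(timeout_ms);
    auto next_start = Clock::now();       // when the next attempt may begin
    const addrinfo *next = res;           // next address not yet tried
    std::vector<pollfd> active;           // connects still in progress
    int winner = -1;

    while (winner < 0 && Clock::now() < deadline) {
        // Steps 1-2: start the next address when its stagger slot arrives,
        // or immediately if nothing else is in flight.
        if (next && (Clock::now() >= next_start || active.empty())) {
            bool pending = false;
            int fd = socket(next->ai_family, next->ai_socktype, next->ai_protocol);
            if (fd >= 0) {
                fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
                if (connect(fd, next->ai_addr, next->ai_addrlen) == 0) {
                    winner = fd;          // connected at once (possible on loopback)
                } else if (errno == EINPROGRESS) {
                    active.push_back({fd, POLLOUT, 0});
                    pending = true;
                } else {
                    close(fd);            // failed right away: no need to wait
                }
            }
            next = next->ai_next;
            // Only an attempt still in progress earns the stagger head start.
            next_start = Clock::now() + milliseconds(pending ? stagger_ms : 0);
            continue;
        }
        if (active.empty()) break;        // every address has failed

        // Step 3: sleep until a socket is ready, the next slot, or the deadline.
        auto wake = next ? std::min(next_start, deadline) : deadline;
        auto wait = std::chrono::ceil<milliseconds>(wake - Clock::now()).count();
        if (poll(active.data(), active.size(),
                 static_cast<int>(std::max<long long>(wait, 0))) < 0) break;

        // Step 4: the first socket whose connect succeeded wins.
        for (std::size_t i = 0; i < active.size();) {
            if (active[i].revents == 0) { ++i; continue; }
            int err = 0;
            socklen_t len = sizeof(err);
            getsockopt(active[i].fd, SOL_SOCKET, SO_ERROR, &err, &len);
            if (err == 0 && winner < 0) winner = active[i].fd;
            else close(active[i].fd);     // failed, or lost the race
            active.erase(active.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
    // Steps 4-5: close the losers, and everything if the deadline passed.
    for (const pollfd &p : active) close(p.fd);
    return winner;
}
```

A caller would pass the list from getaddrinfo() with a 250ms stagger and the connection timeout in milliseconds, e.g. StaggeredConnect(res, 250, timeout_seconds * 1000), and switch the returned socket back to blocking mode if the surrounding code expects that.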

This ensures:

  • Fast connection when the first address works (if it connects within 250ms, no other attempt is started, so there is no penalty)
  • Quick fallthrough when addresses are unreachable (each one delays the next attempt by only 250ms)
  • All addresses get a fair chance without blocking one another
  • The total timeout is still respected as an upper bound
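The benefits listed above can be checked with a small timing model of the staggered policy. This is a hypothetical sketch: StaggeredConnectMs and the Attempt alias are illustrative names, the 250ms stagger comes from the proposed steps, and it assumes every unreachable address drops SYNs silently rather than answering with an RST (an RST would let the next attempt start earlier).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Connect latency of one address in ms; std::nullopt = SYNs silently dropped.
using Attempt = std::optional<long>;

// Hypothetical timing model of the staggered policy: attempt i starts at
// i * stagger_ms (assumes no attempt fails fast with an RST), all attempts run
// in parallel, the first to complete wins, and timeout_ms caps the total.
std::optional<long> StaggeredConnectMs(const std::vector<Attempt> &addrs,
                                       long stagger_ms, long timeout_ms) {
    std::optional<long> best;
    for (std::size_t i = 0; i < addrs.size(); ++i) {
        if (!addrs[i]) continue;                  // never answers
        long done = static_cast<long>(i) * stagger_ms + *addrs[i];
        // An attempt scheduled after an earlier success could never beat it,
        // so the minimum over all scheduled attempts is the winning time.
        if (!best || done < *best) best = done;
    }
    if (best && *best > timeout_ms) return std::nullopt;  // deadline hit first
    return best;
}
```

With two silently dropped addresses ahead of one that answers in 5ms, StaggeredConnectMs({std::nullopt, std::nullopt, 5L}, 250, 30000) returns 505 (two 250ms staggers plus the handshake) instead of the 60s the sequential loop would take. If the first address answers in 100ms, the result is 100, so a healthy first address pays no penalty.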

Could the same approach be applied here as well?

Metadata

  • Labels: enhancement (New feature or request)
