wsl.exe relay loops in src/windows/common/relay.cpp miss bytesRead == 0 EOF check on synchronous-success path; cause wsl.exe to spin at 100% on AFD sockets #40651

@baskamic

Description

Windows Version

Microsoft Windows [Version 10.0.26100.8390]

WSL Version

2.7.3.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

6.6.114.1-1

Distro Version

Ubuntu 24.04

Other Software

PowerShell 7.6.3

Repro Steps

Start PowerShell.
Run wsl in that window.
Keep working in that window for multiple hours.

Expected Behavior

CPU usage of wsl.exe on Windows correlates with CPU usage inside WSL.

Actual Behavior

CPU usage in WSL ~0%, Windows wsl.exe CPU usage ~10-15% (2 threads at 100%)

Diagnostic Logs

A long-lived wsl.exe relay process (the one bridging stdio / interop between
a Windows console and the WSL VM) eventually enters a state where two of its
worker threads spin in a tight ReadFile loop on \Device\Afd socket
handles. From that point on the process consumes ~70% CPU per spinning thread
(~140% combined, ~6–10% of total system CPU on a 16-thread box) indefinitely,
even when WSL itself is idle.

Source review of src/windows/common/relay.cpp at release tag 2.7.3 shows
that four distinct ReadFile call sites all share the same defect: on
the synchronous-success branch (i.e. ReadFile returns TRUE instead of
failing with ERROR_IO_PENDING), none of them check lpNumberOfBytesRead
for zero. Per Microsoft's own Winsock documentation, a successful zero-byte
read on a stream socket is the canonical peer-FIN / EOF signal — it
does not surface as an error code. The result is that once the WSL VM side
of an hvsock connection closes, the affected relay loop re-issues the same
read against the EOF'd handle forever.
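
The distinction described above can be sketched as a small classification step. This is a portable model, not code from relay.cpp: `ReadOutcome`, `ReadState`, and `Classify` are hypothetical names, and the struct only stands in for the values that ReadFile and GetLastError would report. The point it illustrates is that EOF can arrive on two routes, and the success route needs its own zero-byte test.

```cpp
#include <cstdint>

// Hypothetical model of what one overlapped ReadFile call reports.
struct ReadOutcome {
    bool succeeded;           // ReadFile returned TRUE (synchronous completion)
    std::uint32_t lastError;  // GetLastError() value, meaningful when !succeeded
    std::uint32_t bytesRead;  // *lpNumberOfBytesRead, meaningful when succeeded
};

enum class ReadState { Data, Eof, Pending, Failed };

// Win32 error codes as defined in winerror.h.
constexpr std::uint32_t kErrorHandleEof = 38;    // ERROR_HANDLE_EOF
constexpr std::uint32_t kErrorBrokenPipe = 109;  // ERROR_BROKEN_PIPE
constexpr std::uint32_t kErrorIoPending = 997;   // ERROR_IO_PENDING

ReadState Classify(const ReadOutcome& r) {
    if (r.succeeded) {
        // Success with zero bytes is the graceful-close signal on a stream
        // socket; this is the test the issue says is missing.
        return r.bytesRead == 0 ? ReadState::Eof : ReadState::Data;
    }
    switch (r.lastError) {
    case kErrorIoPending:
        return ReadState::Pending;  // completion arrives later via the event
    case kErrorHandleEof:
    case kErrorBrokenPipe:
        return ReadState::Eof;      // EOF reported through the failure branch
    default:
        return ReadState::Failed;
    }
}
```

With a step like this, a relay loop would leave the read loop on `ReadState::Eof` regardless of which branch produced it.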

Environment

| Component | Version |
|---|---|
| Windows | 11 Enterprise, 10.0.26100.8390 |
| WSL | 2.7.3.0 (also observed on 2.6.2.0) |
| WSLg | 1.0.66 |
| WSL kernel | 6.6.114.1-1 (Microsoft) |
| Distro | Ubuntu (default WSL2 install) |

Symptom

  • One specific wsl.exe process (the per-console relay, not wslservice or
    vmmemWSL) shows constant 6–10% CPU on the Windows host.
  • Inside the WSL VM, all processes are idle (top shows ~0% CPU).
  • The process is not stuck — interactive use through it still works — it is
    simply burning CPU in the background continuously.
  • Behavior persists for the entire lifetime of the affected wsl.exe (observed
    in one case for over 4 hours of accumulated CPU time across two threads).
  • The condition does not clear on its own and only ends when the affected
    wsl.exe is killed (which terminates the corresponding Windows console
    session).

Root cause (from live process inspection + source review)

What we observe

Both spinning threads share these characteristics, sampled repeatedly seconds
apart:

  • Win32 thread start address at the same offset inside wsl.exe (the entry
    point shared by both Interop and IO Relay worker threads).
  • Instruction pointer pinned at ntdll!ZwReadFile+0x14 — the syscall
    instruction inside the system service stub.
  • The handle being read is of type \Device\Afd (Winsock kernel object —
    the user-mode endpoint of a Hyper-V socket bridging to the WSL VM).
  • Disassembly around the call site shows a ReadFile call (via IAT thunk)
    whose BOOL return value is checked, but whose lpNumberOfBytesRead is
    not, on the synchronous-success path. The error-path branches do test
    ERROR_BROKEN_PIPE (109) and ERROR_HANDLE_EOF (38), but the
    success-with-zero-bytes case appears to flow back into the read loop /
    wait state without an EOF break.

Source review of the four ReadFile call sites:

| # | Function | Sync-TRUE EOF handling |
|---|---|---|
| 1 | wsl::windows::common::relay::InterruptableRead | Returns bytesRead to caller; depends on the caller to interpret 0 as EOF. |
| 2 | BidirectionalRelay (left side) | Sets leftReadPending = true and relies on the later GetOverlappedResult branch (which does check leftBytesRead == 0) to detect EOF. |
| 3 | BidirectionalRelay (right side) | Same pattern as the left side. |
| 4 | ScopedMultiRelay::Run | Calls Write(gsl::make_span(data, BytesRead)) (a no-op when BytesRead == 0), sets state to Standby, and the outer loop re-issues ReadFile on the same handle. EOF is only recognized via ERROR_HANDLE_EOF / ERROR_BROKEN_PIPE on the failure branch. |
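
The pattern in row 4 can be reproduced in a portable simulation. Everything here is an illustrative assumption: `FakeStream` is a hypothetical stand-in for an EOF'd socket handle (it keeps reporting success with zero bytes), `RunRelay` is not the real ScopedMultiRelay::Run, and `maxReads` exists only so the unchecked variant terminates in a demo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stream: yields the queued chunks, then reports a successful
// zero-byte read on every later call, as the issue describes for an EOF'd handle.
class FakeStream {
public:
    explicit FakeStream(std::vector<std::string> chunks) : chunks_(std::move(chunks)) {}

    // Always returns true ("synchronous success"); bytesRead == 0 means EOF.
    bool Read(std::string& out, std::size_t& bytesRead) {
        ++calls_;
        if (next_ < chunks_.size()) {
            out = chunks_[next_++];
        } else {
            out.clear();
        }
        bytesRead = out.size();
        return true;
    }

    std::size_t calls() const { return calls_; }

private:
    std::vector<std::string> chunks_;
    std::size_t next_ = 0;
    std::size_t calls_ = 0;
};

// Relay loop sketch. Returns how many reads were issued before the loop ended.
std::size_t RunRelay(FakeStream& in, std::string& sink, bool checkEof, std::size_t maxReads) {
    assert(maxReads > 0);
    std::string buffer;
    std::size_t bytesRead = 0;
    while (in.calls() < maxReads) {
        if (!in.Read(buffer, bytesRead)) {
            break;  // failure branch (never taken by FakeStream)
        }
        if (checkEof && bytesRead == 0) {
            break;  // zero-byte success treated as EOF
        }
        sink.append(buffer, 0, bytesRead);  // a no-op when bytesRead == 0
    }
    return in.calls();
}
```

With `checkEof` set, two queued chunks cost three reads (two with data, one EOF). Without it, the loop keeps reading until it reaches `maxReads`, which mirrors the CPU-bound spin described in the issue.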

The synchronous-success-with-zero-bytes case is exactly what a Winsock
graceful FIN produces on an AFD socket handle (cited above). Under that
condition:

  • For ScopedMultiRelay::Run the loop immediately re-enters ReadFile on
    the EOF'd handle, which returns TRUE + zero bytes again, producing a
    pure CPU-bound spin.
  • For BidirectionalRelay the path is more subtle: the sync-success branch
    goes to WaitForMultipleObjects. If the overlapped event is set
    appropriately on sync completion, GetOverlappedResult will pick up the
    zero-byte read and tear the handle down. If the event is not set on
    sync completion, or if there is any state where the wait keeps returning
    immediately without GetOverlappedResult advancing, the same EOF is
    consumed repeatedly and the result is a spin.
  • For InterruptableRead the spin only manifests if its caller fails to
    treat a returned bytesRead == 0 as EOF (some callers in the codebase
    do; if any do not, that path spins).
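
One way to avoid depending on the event state after a synchronous completion, sketched here as an assumption rather than as the fix the maintainers would choose, is to route both completion paths through a single handler that owns the EOF check. `Side`, `OnReadComplete`, and `OnReadIssued` are hypothetical names and do not reflect how BidirectionalRelay is actually structured.

```cpp
#include <cstddef>

// Hypothetical per-direction relay state.
struct Side {
    bool readPending = false;   // an overlapped read is still in flight
    bool closed = false;        // EOF seen; no further reads should be issued
    std::size_t forwarded = 0;  // bytes passed on to the other side
};

// Single completion handler, used for both synchronous and asynchronous
// completions, so the zero-byte EOF check cannot be skipped on either path.
void OnReadComplete(Side& side, std::size_t bytesRead) {
    side.readPending = false;
    if (bytesRead == 0) {
        side.closed = true;
        return;
    }
    side.forwarded += bytesRead;
}

// Models the moment right after ReadFile returns.
// completedSynchronously == true corresponds to ReadFile returning TRUE;
// false corresponds to failing with ERROR_IO_PENDING.
void OnReadIssued(Side& side, bool completedSynchronously, std::size_t bytesRead) {
    if (completedSynchronously) {
        OnReadComplete(side, bytesRead);  // handle now, no wait required
    } else {
        side.readPending = true;          // handled later, after the wait
    }
}
```

In this shape a synchronous zero-byte read marks the side closed immediately, and the asynchronous path reaches the same check once the wait reports completion.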

The two-threads-per-process spinning pattern we observe is consistent with
any of these — most directly with BidirectionalRelay (one per direction)
or ScopedMultiRelay::Run running two relays in parallel. We cannot pin
the spin to one specific function without the wsl.exe

Workaround

Open a new PowerShell window and run wsl there, then close the original PowerShell window whose wsl.exe shows the CPU problem.
