autotuning: window self-blocks at ~2x initial size due to fixed 4*RTT threshold #136

@KaroUniform

Summary

The receive window autotuning introduced in #54 stops growing the window after ~2x the initial size when there is real network latency. The `4*RTT` threshold in `sendWindowUpdate()` doesn't scale with window size, causing the autotuning condition to fail as the window grows.

Reproduction

With `InitialStreamWindowSize=256KB`, `MaxStreamWindowSize=4MB`, and 50ms RTT:

```
Window grows: 256KB → 512KB (pass: elapsed=312ms < 4*RTT=327ms)
Then blocks:  512KB stuck (skip: elapsed=571ms >= 4*RTT=370ms)
              512KB stuck (skip: elapsed=569ms >= 4*RTT=397ms)
              512KB stuck (skip: elapsed=569ms >= 4*RTT=401ms)
... repeats forever
```

Expected: window should grow to `MaxStreamWindowSize` (4MB) under sustained throughput.

Root Cause

In `stream.go:sendWindowUpdate()`:
```go
if rtt := s.session.getRTT(); flags == 0 && rtt > 0 && now.Sub(s.epochStart) < rtt*4 {
	// double the window
}
```

The time between two consecutive `sendWindowUpdate()` calls includes:

  1. Sender waits for the window update to arrive (~RTT)
  2. Sender transmits half the window worth of data
  3. Receiver reads the data, triggering the next `sendWindowUpdate()`

As the window grows, step 2 takes proportionally longer (more data to send), but the `4*RTT` threshold is fixed. Once `elapsed >= 4*RTT` (which happens at around 2x the initial window size), autotuning permanently stops.

This is the self-blocking cycle:

  • Larger window → more data to transfer for 50% fill → longer elapsed
  • Longer elapsed → exceeds `4*RTT` → window doesn't grow
  • Window doesn't grow → throughput stays limited

Impact

| RTT | Stuck Window | Max Throughput | Expected (4MB) |
|---|---|---|---|
| 50ms | ~512KB | ~10 MB/s (~80 Mbps) | 640 Mbps |
| 100ms | ~512KB | ~5 MB/s (~40 Mbps) | 320 Mbps |
| 200ms | ~256KB | ~1.25 MB/s (~10 Mbps) | 160 Mbps |

This affects any application using go-yamux with `InitialStreamWindowSize < MaxStreamWindowSize` and non-trivial network latency.

Proposed Fix

Scale the threshold proportionally to the window size:

```go
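// rtt and elapsed are as in the current code:
// rtt = s.session.getRTT(), elapsed = now.Sub(s.epochStart).
// Give larger windows proportionally more time to fill.
scaleFactor := time.Duration(s.recvWindow / s.session.config.InitialStreamWindowSize)
if scaleFactor < 1 {
	scaleFactor = 1
}
threshold := rtt * 4 * scaleFactor

if elapsed < threshold {
	// double the window
}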
```

This maintains the original behavior at the initial window size (`scaleFactor=1`) while allowing larger windows to have proportionally more time to fill. Verified locally:

```
With fix (50ms RTT, 256KB initial, 4MB max):
256KB → 512KB (pass: elapsed=312ms < threshold=327ms, scale=1)
512KB → 1024KB (pass: elapsed=571ms < threshold=765ms, scale=2)
1024KB → 2048KB (pass: elapsed=1091ms < threshold=1484ms, scale=4)
2048KB → 4096KB (pass: elapsed=2233ms < threshold=3149ms, scale=8)
→ Window reaches MaxStreamWindowSize ✓
```

Happy to submit a PR with the fix and tests if this approach looks right.

Context

Found while investigating VPN throughput issues. Autotuning was added in #54 with the note "the code is untestable"; we wrote tests that reproduce the problem with simulated latency.
