autotuning: window self-blocks at ~2x initial size due to fixed 4*RTT threshold #136
Description
Summary
The receive window autotuning introduced in #54 stops growing the window after ~2x the initial size when there is real network latency. The `4*RTT` threshold in `sendWindowUpdate()` doesn't scale with window size, causing the autotuning condition to fail as the window grows.
Reproduction
With `InitialStreamWindowSize=256KB`, `MaxStreamWindowSize=4MB`, and 50ms RTT:
```
Window grows: 256KB → 512KB (pass: elapsed=312ms < 4RTT=327ms)
Then blocks: 512KB stuck (skip: elapsed=571ms >= 4RTT=370ms)
512KB stuck (skip: elapsed=569ms >= 4RTT=397ms)
512KB stuck (skip: elapsed=569ms >= 4RTT=401ms)
... repeats forever
```
Expected: window should grow to `MaxStreamWindowSize` (4MB) under sustained throughput.
Root Cause
In `stream.go:sendWindowUpdate()`:
```go
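// Autotune only on plain window updates (flags == 0) once an RTT sample exists.
if rtt := s.session.getRTT(); flags == 0 && rtt > 0 && now.Sub(s.epochStart) < rtt*4 {
	// the last epoch finished in under a fixed 4*RTT: double the window
}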
```
The time between two consecutive `sendWindowUpdate()` calls includes:
1. Sender waits for the window update to arrive (~RTT)
2. Sender transmits half the window worth of data
3. Receiver reads the data, triggering the next `sendWindowUpdate()`
As the window grows, step 2 takes proportionally longer (there is more data to send), but the `4*RTT` threshold stays fixed. Once `elapsed >= 4*RTT` (around 2x the initial window size in the reproduction above), autotuning stops permanently.
This is the self-blocking cycle:
- Larger window → more data to transfer before half the window is consumed → longer elapsed time
- Longer elapsed → exceeds `4*RTT` → window doesn't grow
- Window doesn't grow → throughput stays limited
Impact
| RTT | Stuck Window | Max Throughput (window / RTT) | Expected at 4MB |
|---|---|---|---|
| 50ms | ~512KB | ~10 MB/s (~80 Mbps) | ~80 MB/s (640 Mbps) |
| 100ms | ~512KB | ~5 MB/s (~40 Mbps) | ~40 MB/s (320 Mbps) |
| 200ms | ~256KB | ~1.25 MB/s (~10 Mbps) | ~20 MB/s (160 Mbps) |
This affects any application using go-yamux with `InitialStreamWindowSize < MaxStreamWindowSize` and non-trivial network latency.
Proposed Fix
Scale the threshold proportionally to the window size:
```go
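rtt := s.session.getRTT()
elapsed := now.Sub(s.epochStart)

// How many times larger the current window is than the initial one
// (integer division); never below 1, so the initial window keeps 4*RTT.
scaleFactor := time.Duration(s.recvWindow / s.session.config.InitialStreamWindowSize)
if scaleFactor < 1 {
	scaleFactor = 1
}
threshold := rtt * 4 * scaleFactor
if flags == 0 && rtt > 0 && elapsed < threshold {
	// double the window
}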
```
This maintains the original behavior at the initial window size (`scaleFactor=1`) while allowing larger windows to have proportionally more time to fill. Verified locally:
```
With fix (50ms RTT, 256KB initial, 4MB max):
256KB → 512KB (pass: elapsed=312ms < threshold=327ms, scale=1)
512KB → 1024KB (pass: elapsed=571ms < threshold=765ms, scale=2)
1024KB → 2048KB (pass: elapsed=1091ms < threshold=1484ms, scale=4)
2048KB → 4096KB (pass: elapsed=2233ms < threshold=3149ms, scale=8)
→ Window reaches MaxStreamWindowSize ✓
```
Happy to submit a PR with the fix and tests if this approach looks right.
Context
Found while investigating VPN throughput issues. Autotuning was added in #54 with the note "the code is untestable" — we wrote tests that reproduce the problem with simulated latency.