Congestion Control Overhaul by JPDye · Pull Request #1149 · smoltcp-rs/smoltcp

JPDye · 2026-05-05T16:58:25Z

Edited. Again.

TL;DR: refactors Controller trait to distinguish RTO / 3-dup-ACK / per-dup-ACK events, fixes some Reno bugs and fixes some CUBIC bugs, with more fixes and tests to come.

This has turned into a much bigger thing than I thought.

In short, the congestion control implementations have many bugs and I'm now trying to fix them. This comes in three parts: changes to the Controller, changes to Reno and changes to CUBIC.

The Controller

The controller doesn't understand congestion events and this is a source of multiple bugs.

Controller::retransmit is used to notify of both an RTO and the fast retransmit timer, making it hard for the congestion control implementations to decide between entering slow start or entering fast recovery.

Controller::on_duplicate_ack is seemingly treated as notification of a single duplicated ACK (as the name suggests) in socket/tcp.rs but as a notification of congestion in the congestion control implementations.

In CUBIC, after 3 consecutive duplicate ACKs this means w_max could end up less than half of what it should be (0.3 * w_max vs 0.7 * w_max). In Reno, after 3 consecutive duplicate ACKs this means ssthresh could end up three times smaller than it should be (cwnd / 6 vs cwnd / 2).
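To make the CUBIC arithmetic concrete, here is a minimal sketch of how a multiplicative decrease compounds when it is applied once per duplicate ACK instead of once per congestion event. The function name `reduce` is hypothetical and not part of smoltcp; the factor 0.7 is the CUBIC decrease factor already quoted above.

```rust
/// Applies the multiplicative decrease `beta` to `window` once per reported event.
/// `reduce` is a hypothetical helper for illustration, not a smoltcp function.
fn reduce(window: f64, beta: f64, events: i32) -> f64 {
    window * beta.powi(events)
}

/// Returns (intended, buggy) values of w_max for a starting window of 100 segments.
fn compare_reductions() -> (f64, f64) {
    let beta = 0.7; // CUBIC multiplicative decrease factor
    let intended = reduce(100.0, beta, 1); // one reduction per congestion event
    let buggy = reduce(100.0, beta, 3); // one reduction per duplicate ACK, three in a row
    (intended, buggy)
}
```

Three reductions leave roughly 0.34 of the original window rather than 0.7, which is approximately the 0.3 vs 0.7 gap described above.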

My fix here has been to distinguish between congestion events (RTO and repeated duplicate ACKs).

I've added Controller::on_rto and Controller::on_loss. These can be used by the congestion algorithms to decide between entering slow start and fast recovery.

I've also added a bytes_in_flight parameter to these methods to give the congestion controllers more information (NewReno would like it, for example), and added len to Controller::on_dup_ack ready for when SACK and D-SACK arrive (within a month, given the time I've been allocated for all this).
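As a rough illustration of the event split, the sketch below shows one way the interface and the socket-side dispatch could look. The method names and the `bytes_in_flight` / `len` parameters come from the description above; the exact signatures, the `EventLog` type and the `report_dup_ack` helper are assumptions made for this example and are not taken from the PR's code.

```rust
/// Assumed shape of an event-based controller interface (signatures are guesses).
pub trait Controller {
    /// Retransmission timeout fired: the controller should fall back to slow start.
    fn on_rto(&mut self, bytes_in_flight: usize);
    /// Loss inferred from repeated duplicate ACKs: the controller may enter fast recovery.
    fn on_loss(&mut self, bytes_in_flight: usize);
    /// A single duplicate ACK arrived; `len` is reserved for future SACK information.
    fn on_dup_ack(&mut self, len: usize);
}

/// Hypothetical controller that only counts the events it receives.
#[derive(Debug, Default)]
pub struct EventLog {
    pub rtos: u32,
    pub losses: u32,
    pub dup_acks: u32,
}

impl Controller for EventLog {
    fn on_rto(&mut self, _bytes_in_flight: usize) {
        self.rtos += 1;
    }
    fn on_loss(&mut self, _bytes_in_flight: usize) {
        self.losses += 1;
    }
    fn on_dup_ack(&mut self, _len: usize) {
        self.dup_acks += 1;
    }
}

/// Hypothetical socket-side dispatch: every duplicate ACK is reported, but the
/// congestion event is signalled exactly once, when the third duplicate arrives.
pub fn report_dup_ack<C: Controller>(
    cc: &mut C,
    dup_count: &mut u32,
    len: usize,
    bytes_in_flight: usize,
) {
    *dup_count += 1;
    cc.on_dup_ack(len);
    if *dup_count == 3 {
        cc.on_loss(bytes_in_flight);
    }
}
```

With this split, five duplicate ACKs in a row produce five `on_dup_ack` calls but only one `on_loss` call, so the window is reduced once per congestion event.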

Reno

Beyond the bugs that came from the inability to distinguish between loss events, there were a number of other bugs in the Reno implementation. Here are some:

Exiting fast recovery (receiving a non-duplicated ACK) should deflate the cwnd back to the ssthresh (as the cwnd is artificially inflated by all the duplicate ACKs we advanced it for). This implementation, however, had it the wrong way round and was setting ssthresh equal to the cwnd. This would have increased the chance of running into more packet loss.
Doing fast recovery involves incrementing the cwnd for each duplicate ACK received. This implementation wasn't doing anything with duplicated ACKs.
The cwnd was not being set to the correct value after an RTO and entering slow start.
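The behaviour these fixes aim for can be sketched as a small state machine following RFC 5681. This is a simplified illustration, not the PR's code: the `Reno` struct, the `Phase` enum, the MSS value and the initial window are all assumptions chosen for the example.

```rust
const MSS: usize = 1460; // assumed segment size for illustration

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    SlowStart,
    CongestionAvoidance,
    FastRecovery,
}

/// Hypothetical, simplified Reno controller.
struct Reno {
    cwnd: usize,
    ssthresh: usize,
    phase: Phase,
}

impl Reno {
    fn new() -> Self {
        // The initial window of 2 * MSS is an arbitrary choice for this sketch.
        Reno { cwnd: 2 * MSS, ssthresh: usize::MAX, phase: Phase::SlowStart }
    }

    /// RTO: halve ssthresh from the data in flight, collapse cwnd, restart slow start.
    fn on_rto(&mut self, bytes_in_flight: usize) {
        self.ssthresh = (bytes_in_flight / 2).max(2 * MSS);
        self.cwnd = MSS;
        self.phase = Phase::SlowStart;
    }

    /// Third duplicate ACK: enter fast recovery once; repeat calls are ignored.
    fn on_loss(&mut self, bytes_in_flight: usize) {
        if self.phase == Phase::FastRecovery {
            return;
        }
        self.ssthresh = (bytes_in_flight / 2).max(2 * MSS);
        self.cwnd = self.ssthresh + 3 * MSS;
        self.phase = Phase::FastRecovery;
    }

    /// Each further duplicate ACK during fast recovery inflates cwnd by one MSS.
    fn on_dup_ack(&mut self) {
        if self.phase == Phase::FastRecovery {
            self.cwnd += MSS;
        }
    }

    /// New data acknowledged.
    fn on_ack(&mut self, acked: usize) {
        match self.phase {
            // Leaving fast recovery deflates cwnd back to ssthresh.
            Phase::FastRecovery => {
                self.cwnd = self.ssthresh;
                self.phase = Phase::CongestionAvoidance;
            }
            // Slow start grows by at most one MSS per ACK.
            Phase::SlowStart => {
                self.cwnd += acked.min(MSS);
                if self.cwnd >= self.ssthresh {
                    self.phase = Phase::CongestionAvoidance;
                }
            }
            // Congestion avoidance grows by roughly one MSS per round trip.
            Phase::CongestionAvoidance => {
                self.cwnd += (MSS * MSS / self.cwnd).max(1);
            }
        }
    }
}
```

The key points are that `on_ack` sets cwnd from ssthresh when leaving fast recovery (not the other way round), that `on_dup_ack` actually inflates the window, and that `on_rto` resets cwnd before slow start resumes.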

CUBIC

Beyond the bugs that came from the inability to distinguish between events, CUBIC had other bugs too. Here are some:

Entering fast recovery on startup without any packet loss occurring, significantly reducing cwnd and growth rate for no reason.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.51%. Comparing base (ffeaf62) to head (c54a7f3).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1149      +/-   ##
==========================================
+ Coverage   81.48%   81.51%   +0.02%     
==========================================
  Files          81       81              
  Lines       25007    25040      +33     
==========================================
+ Hits        20378    20412      +34     
+ Misses       4629     4628       -1


JPDye · 2026-05-07T11:13:16Z

I have overhauled the Reno (RFC 5681) implementation and now believe it works as intended. The next push will be a bunch of tests for the flow from startup to steady-state congestion avoidance:

slow start -> rto -> slow start -> congestion avoidance
slow start -> trip dup ack -> fast recovery -> congestion avoidance

- operates in terms of congestion events
- takes unACKed byte count
- fast-retransmit operates correctly

…FC compliant:
- enter slow start (and exit fast recovery) on RTO
- prevent multiple `on_loss()` calls triggering window reductions
- deflate `cwnd` when leaving fast recovery
- cap slow start `cwnd` increment to 1MSS per ACK
- use new `in_flight` to calculate `ssthresh`

JPDye · 2026-05-21T17:27:06Z

Reno tests added. Pretty happy with the Reno implementation.

However, there's now a big performance degradation in the netsim test. This is due to some changes to fast-retransmit handling. Previously, on loss, all data would be resent rather than just the first segment (as per the RFC).

On top of this netsim degradation for no CC, when using Reno in the netsim the results are even worse. So... I've created a multi-flow netsim that better shows the benefits of congestion control.

These are the initial Reno results as a percentage change from the new no-CC baseline.

╭───┬───────┬──────────┬──────────┬──────────┬──────────┬────────╮
│ # │ flows │ agg_thru │ min_thru │ max_thru │ fairness │ drops  │
├───┼───────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 0 │     1 │ -6.6%    │ -6.6%    │ -6.6%    │ 0%       │ 0%     │
│ 1 │     2 │ -6.2%    │ -6.4%    │ -6%      │ 0%       │ 0%     │
│ 2 │     4 │ -3.4%    │ -5.2%    │ -2.2%    │ -0.4%    │ 0%     │
│ 3 │    16 │ +13.2%   │ +70.3%   │ -0.2%    │ +4.5%    │ -81.3% │
│ 4 │    32 │ +16.1%   │ +93%     │ -15.2%   │ +8.1%    │ -78.7% │
│ 5 │    64 │ +16.8%   │ +149.3%  │ -19.6%   │ +8.1%    │ -67.2% │
╰───┴───────┴──────────┴──────────┴──────────┴──────────┴────────╯

The new test simulates "realistic" router packet loss (rather than straight randomization) and has per-flow RTTs and traffic. Once many parallel flows enter the picture (16 or more in the table above), we see better fairness, higher aggregate throughput and far fewer drops at the router.
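The comment does not say how the fairness column is computed. One common way to score fairness across flows is Jain's fairness index, sketched below purely as an illustration; the function name `jain_index` is hypothetical and this may not be the metric the netsim uses.

```rust
/// Jain's fairness index over per-flow throughputs: (sum x)^2 / (n * sum x^2).
/// It is 1.0 when every flow gets the same throughput and 1/n when a single
/// flow takes everything. Returns None when the index is undefined
/// (no flows, or all throughputs are zero).
fn jain_index(throughputs: &[f64]) -> Option<f64> {
    let n = throughputs.len() as f64;
    let sum: f64 = throughputs.iter().sum();
    let sum_sq: f64 = throughputs.iter().map(|x| x * x).sum();
    if throughputs.is_empty() || sum_sq == 0.0 {
        return None;
    }
    Some(sum * sum / (n * sum_sq))
}
```

For example, four flows with equal throughput score 1.0, while one flow starving the other three scores 0.25.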

Next commit (and then probably immediately as a fresh PR) will be the multi-flow netsim stuff.

Dirbaio · 2026-06-14T22:01:11Z

this can be closed right? the other PRs contain all the changes here

JPDye changed the title ~~test for dup-ack cwnd reduction (+ discovered early recovery bug)~~ Fix CUBIC congestion window bugs May 5, 2026

JPDye force-pushed the cubic-reduction-fix branch 5 times, most recently from 1031a6b to 7021ab6 May 7, 2026 11:12

JPDye changed the title ~~Fix CUBIC congestion window bugs~~ Congestion Control Overhaul May 7, 2026

JPDye force-pushed the cubic-reduction-fix branch from 7021ab6 to 5937fe3 May 20, 2026 14:40

Dirbaio and others added 4 commits May 20, 2026 15:46

Release v0.13.1

96fe770

congestion Controller overhaul:

93d64a8

- operates in terms of congestion events
- takes unACKed byte count
- fast-retransmit operates correctly

reno tests for slow start and fast recovery

c54a7f3

JPDye force-pushed the cubic-reduction-fix branch from 5937fe3 to c54a7f3 May 20, 2026 14:46

JPDye mentioned this pull request May 26, 2026

multiflow netsim test #1153

Merged

Congestion Control Overhaul #1149: JPDye wants to merge 4 commits into smoltcp-rs:main from JPDye:cubic-reduction-fix

2 participants
