chore: Avoid Box<dyn CongestionControl>#3451
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3451 +/- ##
==========================================
- Coverage 95.03% 94.88% -0.15%
==========================================
Files 109 115 +6
Lines 37633 38049 +416
Branches 37633 38049 +416
==========================================
+ Hits 35764 36103 +339
- Misses 1178 1243 +65
- Partials 691 703 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Pull request overview
Refactors neqo-transport congestion control dispatch to avoid Box<dyn CongestionControl> in the packet sending hot path by using a concrete enum-based dispatcher.
Changes:
- Replace
PacketSender’sBox<dyn CongestionControl>with a concreteCongestionControllerenum and update construction accordingly. - Introduce
enum_dispatch-based forwarding for theCongestionControltrait and add aCongestionControllerenum forNewReno/Cubic. - Update congestion-control tests to work with the new enum dispatch approach (including relocating
cwnd_initial()test-only access).
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| neqo-transport/src/sender.rs | Switch sender congestion control from boxed trait object to CongestionController enum. |
| neqo-transport/src/cc/mod.rs | Add enum_dispatch usage and define CongestionController enum implementing CongestionControl. |
| neqo-transport/src/cc/classic_cc.rs | Adjust test helpers/coverage for enum dispatch; move cwnd_initial() to a test-only inherent method. |
| neqo-transport/Cargo.toml | Add enum_dispatch as a crate dependency. |
| Cargo.toml | Add enum_dispatch to workspace dependencies. |
| Cargo.lock | Lockfile update for the new dependency. |
You can also share your feedback on Copilot code review. Take the survey.
|
@mxinden worth it? |
6279114 to
d04e438
Compare
|
I assume the goal is an improvement in performance? Unless we have some indicator that this patch actually does improve performance, I don't think this patch is worth the added complexity. |
|
Well, the transfer benches seem a bit faster. Let's see if the run after the rebase also shows +3%. |
Client/server transfer resultsPerformance differences relative to 2fbd329. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
martinthomson
left a comment
There was a problem hiding this comment.
@larseggert do you want to push on this? I've taken a look at enum_dispatch and it's pretty complicated, but I see no warning flags. As it is only run at compile time, I think we're pretty safe, provided that someone spends a little more time than I did checking that the crate isn't poisoned somehow.
|
@martinthomson should we just define our own dispatch macro and not depend on |
martinthomson
left a comment
There was a problem hiding this comment.
Yeah, this is pretty reasonable. A simpler proc macro than enum_dispatch would have made this easier, but we don't need to go to those lengths for just one of these.
For discussion - is this worth it? (The extra dep could be avoided if we implement a dispatch macro ourselves.)
1ad8dc3 to
79835c1
Compare
There was a problem hiding this comment.
Clean approach. Replacing Box<dyn CongestionController> with a concrete enum + hand-rolled dispatch macro eliminates heap allocation and vtable indirection on the per-packet hot path, with no new dependencies (the earlier enum_dispatch crate has been removed in favor of the neqo_common::dispatch! macro, addressing the earlier review concern).
The trade-off is ~80 lines of manual trait forwarding that must be kept in sync with CongestionController. This is compiler-enforced (missing methods won't compile), so it's safe, just verbose. For a single enum this is reasonable — a proc macro would only pay off if more enums need the same treatment.
The strum::Display derive with #[strum(to_string = "{0}")] delegates formatting to the inner ClassicCongestionController, which already implements Display — consistent with the existing strum usage in this module.
Overall LGTM.
There was a problem hiding this comment.
Clean refactoring. Replacing Box<dyn CongestionController> with a concrete enum eliminates heap allocation and vtable indirection on the per-packet hot path, and the hand-rolled neqo_common::dispatch! macro keeps the forwarding boilerplate manageable without adding a new dependency.
Moving cwnd_initial from the trait to a #[cfg(test)] inherent method on ClassicCongestionController is the right call — it avoids polluting the dispatch enum with test-only plumbing while keeping the tests unchanged (they already construct the concrete type directly).
One minor optional suggestion inline on the strum::Display derive.
Benchmark resultsNo significant performance differences relative to 143849d. All resultstransfer/1-conn/1-100mb-resp (aka. Download): No change in performance detected. time: [189.14 ms 189.56 ms 190.04 ms]
thrpt: [526.21 MiB/s 527.54 MiB/s 528.71 MiB/s]
change:
time: [-0.5748% -0.2850% +0.0299] (p = 0.06 > 0.05)
thrpt: [-0.0299% +0.2858% +0.5781]
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS): No change in performance detected. time: [286.93 ms 288.95 ms 291.00 ms]
thrpt: [34.365 Kelem/s 34.608 Kelem/s 34.851 Kelem/s]
change:
time: [-1.1392% -0.1968% +0.7458] (p = 0.69 > 0.05)
thrpt: [-0.7402% +0.1972% +1.1523]
No change in performance detected.transfer/1-conn/1-1b-resp (aka. HPS): No change in performance detected. time: [38.688 ms 38.862 ms 39.056 ms]
thrpt: [25.604 B/s 25.732 B/s 25.848 B/s]
change:
time: [-0.5096% +0.1255% +0.8180] (p = 0.70 > 0.05)
thrpt: [-0.8113% -0.1254% +0.5122]
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload): No change in performance detected. time: [193.67 ms 193.97 ms 194.29 ms]
thrpt: [514.70 MiB/s 515.55 MiB/s 516.33 MiB/s]
change:
time: [-0.4244% +0.0114% +0.3456] (p = 0.96 > 0.05)
thrpt: [-0.3444% -0.0114% +0.4262]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams/walltime/1-streams/each-1000-bytes: Change within noise threshold. time: [584.43 µs 587.24 µs 590.44 µs]
change: [-1.6781% -0.9874% -0.2193] (p = 0.01 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high severestreams/walltime/1000-streams/each-1-bytes: Change within noise threshold. time: [12.086 ms 12.118 ms 12.166 ms]
change: [-0.9397% -0.6312% -0.1855] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams/walltime/1000-streams/each-1000-bytes: No change in performance detected. time: [43.509 ms 43.554 ms 43.601 ms]
change: [-0.4112% -0.1374% +0.0755] (p = 0.30 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildstreams-flow-controlled/walltime/1-streams/each-4194304-bytes: Change within noise threshold. time: [33.995 ms 34.041 ms 34.089 ms]
change: [-1.3474% -1.1457% -0.9412] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mildstreams-flow-controlled/walltime/10-streams/each-1048576-bytes: No change in performance detected. time: [97.472 ms 98.724 ms 100.01 ms]
change: [-1.2945% +0.4917% +2.3795] (p = 0.60 > 0.05)
No change in performance detected.transfer/walltime/pacing-false/varying-seeds: Change within noise threshold. time: [23.103 ms 23.131 ms 23.174 ms]
change: [+2.6349% +2.9043% +3.1614] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: Change within noise threshold. time: [23.283 ms 23.312 ms 23.348 ms]
change: [+2.5868% +2.7841% +2.9683] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/same-seed: Change within noise threshold. time: [22.838 ms 22.859 ms 22.882 ms]
change: [+0.2106% +0.3534% +0.4836] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severetransfer/walltime/pacing-true/same-seed: Change within noise threshold. time: [23.198 ms 23.228 ms 23.274 ms]
change: [+0.2832% +0.4904% +0.7255] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severeDownload data for |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
For discussion - is this worth it?
(The extra dep could be avoided if we implement a dispatch macro ourselves.)