feat(cc): SEARCH#3518
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3518 +/- ##
==========================================
- Coverage 95.03% 94.96% -0.07%
==========================================
Files 110 116 +6
Lines 37704 38276 +572
Branches 37704 38276 +572
==========================================
+ Hits 35831 36349 +518
- Misses 1180 1222 +42
- Partials 693 705 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Review: SEARCH slow-start implementation (draft-chung-ccwg-search-09)
Clean, well-structured implementation. The algorithm logic — bin-based delivery-rate tracking, interpolated lookback, and normalized-difference exit criterion — reads correctly against the draft. The SlowStart trait extension is minimal and the integration into ClassicCongestionController is straightforward.
Key observations:
-
Floating-point elimination: The earlier feedback from @larseggert about avoiding
f64has been addressed — all arithmetic now uses integer fixed-point withSCALE = 100. Nice. -
Division-by-zero risk: The
initializeguard only rejectsDuration::ZERORTTs, but the computedbin_durationcan still be zero for very small non-zero RTTs (see inline comment). This creates panic paths inupdate_binsandon_packets_acked. -
Drain-phase omission: The decision to skip the draft §3.2 drain-phase is well-reasoned given CUBIC's buffer-filling behavior. The
TODOfor telemetry capture of the drain-target is a good follow-up. -
Tests: The PR description notes tests will be added separately. The SEARCH algorithm has several interesting edge cases worth covering: initialization on first ACK, bin skipping/reset for stale data, the
prev_idx <= Wearly-return guard,compute_sentinterpolation boundaries, and the exit threshold itself. Looking forward to seeing those. -
Spec divergences are well-documented: The
NOTEcomments for divergences from draft-09 (reset mechanism, bit-shifting optimization, interpolation direction fix, drain-phase) are clear and cite the relevant draft sections — this is exactly the right approach for an implementation tracking an evolving draft.
No blocking issues found. The bin_duration == 0 panic path (inline) is the most important item to address.
There was a problem hiding this comment.
Pull request overview
Adds the SEARCH (Slow start Exit At Right CHokepoint) slow-start exit algorithm to neqo-transport congestion control, enabling heuristic slow-start exit based on delivery-rate flattening per the SEARCH draft.
Changes:
- Introduces
cc::Searchand wires it intoPacketSenderfor NewReno and Cubic. - Extends the
SlowStarttrait to accept per-ACK delivered bytes andnow, and per-send packet bytes; updates HyStart/ClassicSlowStart and related tests accordingly. - Updates clippy doc identifier allowlist for “CHokepoint”.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| neqo-transport/src/sender.rs | Wires SlowStart::Search into CC selection for NewReno/Cubic. |
| neqo-transport/src/cc/search.rs | New SEARCH slow-start exit algorithm implementation. |
| neqo-transport/src/cc/mod.rs | Registers and exports Search; adds SlowStart::Search enum variant. |
| neqo-transport/src/cc/classic_cc.rs | Extends SlowStart trait API; plumbs new args through CC core. |
| neqo-transport/src/cc/hystart.rs | Updates HyStart to new SlowStart trait signature. |
| neqo-transport/src/cc/classic_slow_start.rs | Updates ClassicSlowStart to new SlowStart trait signature. |
| neqo-transport/src/cc/tests/hystart.rs | Updates HyStart test suite to new SlowStart trait signature. |
| .clippy.toml | Allows “CHokepoint” in rustdoc identifiers. |
There was a problem hiding this comment.
Clean implementation of SEARCH draft-09. The algorithm logic — bin-based delivery-rate tracking, interpolated lookback via compute_sent, and normalized-difference exit criterion — reads correctly against the spec. Constants (W=10, THRESH=0.26, WINDOW_SIZE_FACTOR=3.50) all match.
The intentional divergences are well-documented with rationale and spec citations:
- Stale-data reset (§ missed bins): good addition for app-limited and flow-control-limited scenarios.
- Drain-phase omission (§3.2): well-reasoned given CUBIC's buffer-filling behavior; the TODO for telemetry capture is a good follow-up.
- Interpolation direction fix: correcting the draft's forward interpolation to backward — noted as discussed with the SEARCH team.
The SlowStart trait extension is minimal. Making on_packet_sent a default no-op is a good design choice — it keeps ClassicSlowStart simple while letting SEARCH and HyStart override only what they need. The new new_acked_bytes and now parameters are well-motivated by SEARCH's delivery-rate tracking needs.
Two inline findings (neither blocking): a clippy::cast_possible_truncation lint issue with as usize casts, and a minor 32-bit arithmetic overflow concern in compute_sent.
The previous bot-review items (typos, test parameter confusion) appear to have been addressed in the latest commit.
There was a problem hiding this comment.
Good implementation of SEARCH draft-09. The integer-math refactor from floating point is clean, the circular buffer indexing is correct (the prev_idx <= W guard ensures compute_sent never underflows on old - 1), and the spec divergences (no drain phase, optional missed-bin reset) are well-documented with rationale.
Architecture: The SlowStart trait extension (new_acked_bytes, now) is a reasonable approach — the new parameters are plumbed through cleanly and existing impls (ClassicSlowStart, HyStart) just ignore them. The on_packet_sent default body {} is a nice simplification vs. requiring every impl to provide a no-op.
Key items from this review:
- 32-bit overflow in
norm_diffcomputation (same class as the interpolation issue already discussed, but different location) u128 as usizecasts should useusize::try_fromfor consistency withupdate_binsdebug_assert!(false)gives no visibility in release builds — consider adding aqwarn!
Tests are still pending per the PR description — looking forward to seeing those cover the exit condition, missed-bin reset, and edge cases (zero/tiny RTT, bins_last_rtt = 0).
There was a problem hiding this comment.
Summary
Clean implementation of SEARCH 4.0 (draft-09) that integrates well via the existing SlowStart trait. The trait extension is nicely done — adding a default no-op for on_packet_sent avoids boilerplate in implementors that don't need it. Divergences from the spec (no drain phase, interpolation direction fix, optional reset on missed bins) are well-documented with rationale and spec links.
The circular buffer arithmetic is correct — I verified that NUM_ACKED_BINS (11) and NUM_SENT_BINS (26) are exactly sized to avoid aliasing given the access patterns in compute_delv and compute_sent, including the EXTRA_BINS guard at line 266.
The new tests cover initialization, bin advancement, skipped-bin propagation, reset/reinit, and the end-to-end exit condition. A few targeted edge-case tests would round out coverage (noted inline).
Most arithmetic concerns (32-bit overflow, bare as usize casts) were raised in prior review rounds — I've added concrete suggestions for the remaining instances. The debug_assert! + #[should_panic] interaction is a new finding that should be addressed before merge.
|
@omansfeld there are a lot of QNS timeouts that other recent PRs such as #3586 (comment) do not have. That is suspicious. |
There was a problem hiding this comment.
Summary
The SEARCH implementation is solid. The algorithm is faithfully mapped from draft-chung-ccwg-search-09, divergences are clearly documented with reasoning (drain-phase skip, reset on stale data, backward interpolation fix), and the integer arithmetic is careful — u64 widening for 32-bit safety, saturating_* ops, try_from with proper fallbacks. The SlowStart trait extension is clean: new parameters have defaults, existing implementors adapt mechanically.
Test coverage for the core algorithm is good: initialization, bin advancement, skipped-bin propagation, reset, warming-up boundary, RTT inflation guard, steady delivery rate, and exit detection are all exercised. A few bot-suggested additions remain open (e.g., prev_sent == 0 path, asserting is_none() during warm-up iterations) and would further strengthen the suite.
Orthogonal changes
The PR bundles several changes unrelated to SEARCH that make review and bisection harder:
send_stream.rs: Reverts theeffective_prioritycaching from the just-merged #3592, replacing it with on-the-fly computation. The two tests verifying retransmission priority routing are deleted without replacement — see inline comment.pace.rs: Migrates pacer arithmetic fromu64tou128to avoid overflow on high-BDP paths. Correct and worthwhile, but an independent improvement.benches/common.rs: Refactors the benchmark harness and removes the flow-controlled benchmark configuration entirely — see inline comment.
Consider splitting these into their own PRs.
Open items from other reviewers
- @larseggert requested changing the slow-start-exit log from
qinfo!back toqdebug!— no response yet. - @larseggert suggested guarding the
#[should_panic]test with#[cfg(debug_assertions)]or expecting the assertion text — no response yet.
There was a problem hiding this comment.
Review Summary
Clean implementation of SEARCH draft-09 as a SlowStart trait implementation. The bin-based delivery-rate tracking, integer-only arithmetic, and deliberate divergences from the draft (no drain phase, optional reset on stale bins) are well-reasoned and clearly documented.
What looks good
- Correctness: The core algorithm — cumulative byte counters snapshotted into time-binned circular buffers, with linear interpolation for sub-bin lookback — is faithful to the draft. The
u64widening for intermediate computations addresses the 32-bit overflow concerns raised earlier. - Guard chain: The
WarmingUp → RttInflated → ZeroSent → threshold checkcascade inevaluate()correctly handles all degenerate states before reaching the hot path. - Test coverage: Good coverage of initialization, bin advancement, skipped-bin propagation, reset/re-init, steady-rate continuation, and exit detection. The boundary tests (
warming_up,inflated_rtt_is_guarded) exercise the exact guard thresholds. - Trait changes: The
SlowStarttrait extensions (sent_bytes,new_acked_bytes,now) are backward-compatible via default impls, and all existing implementors are updated.
Items to consider
- Spurious recovery interaction: See inline comment —
on_spurious_congestion_eventcan restore slow start without callingslow_start.reset(), leaving SEARCH with stale bin data. Low risk but worth documenting or addressing. EXTRA_BINSinvariant: See inline comment — reinforcing the earlier suggestion for a compile-time assertion to guard the relationship between buffer sizes.compute_sentprecondition: See inline comment —debug_assert!(old >= 1 && new >= 1)would make the underflow precondition explicit.- First RTT quality (existing thread): The
bin_durationis fixed from the first ACK's RTT, which tends to be inflated. Themin(first, second)suggestion from @mxinden seems reasonable to raise with the draft authors, though the current behavior is spec-compliant.
No blocking issues found. The implementation is solid and the omission of the drain phase is a pragmatic choice well-suited to the Cubic CA behavior.
Client/server transfer resultsPerformance differences relative to 143849d. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
There was a problem hiding this comment.
Clean implementation of SEARCH 4.0 with well-documented spec divergences (no drain phase, mandatory reset on stale bins, no bit-shifting optimization). The SlowStart trait extension is backward-compatible — new parameters get default no-op implementations where appropriate, and existing implementors (ClassicSlowStart, HyStart) are updated mechanically.
Algorithm correctness: The circular buffer indices never alias thanks to tight-but-valid sizing (NUM_SENT_BINS = W + EXTRA_BINS + 1 = 26, max lookback = 25). The 32-bit overflow fix using u64 widening in compute_sent/compute_delv is sound. The integer math at SCALE = 100 preserves sufficient precision for the threshold comparison, and the two-step division in initialize (/ SCALE / W) is equivalent to a single / (SCALE * W) since Duration tracks nanoseconds internally.
Test coverage: Good coverage of core algorithm paths — warmup boundary, steady-state continuation, delivery-rate-driven exit, RTT inflation guard, zero-sent guard, reset/reinit cycle, and the bin_duration == 0 edge case. One gap noted in my inline comment: no integration test exercises SEARCH through the full ClassicCongestionController pipeline.
One architectural observation: SEARCH's cumulative byte counters (acked_bytes / sent_bytes) have a subtle interaction with the is_app_limited early return in ClassicCongestionController::on_packets_acked. During app-limited periods in slow start, on_packet_sent still increments sent_bytes (gated only by in_slow_start()), but on_packets_acked is skipped entirely — so acked_bytes falls behind. This is unlikely to matter in practice (app-limited during slow start is rare, and the update_bins reset mechanism guards against large gaps), but it's worth being aware of as the algorithm matures.
Great find by Claude here actually (funnily enough only now even though the logic hasn't changed since the initial commit here). 572bbac that I just pushed fixes this, and coincidentally also gets rid of an extra trait function parameter to For the previous few reviews Claude got a bit confused though, recycling comments that we're already resolved/dismissed and making up some changes that weren't even there (because of conflicts to |
There was a problem hiding this comment.
Solid implementation of SEARCH draft-09. The integer-only arithmetic, circular buffer design with separate acked_bins/sent_bins sizing, and the EXTRA_BINS headroom for RTT inflation are well thought out. The widening to u64 for intermediate computations avoids 32-bit overflow in the critical path. Spec divergences (no drain phase, optional reset on stale bins) are clearly documented with rationale — good practice for a draft that's still evolving.
Key architectural observations:
- Omitting the drain phase is a pragmatic call given CUBIC's buffer-filling behavior. Capturing the drain target in telemetry (as noted in the TODO) will be important for validating this decision.
record_acked_bytesplacement before the app-limited early return is correct for its stated purpose. The counter asymmetry withon_packet_sent(slow-start-only) is safe for the common paths but worth documenting more explicitly (see inline comment).- Test coverage is thorough — all
Outcomevariants are exercised, bin propagation/reset logic is verified at boundaries, and the end-to-end exit test with delivery rate degradation is convincing. The#[cfg(debug_assertions)]zero-RTT panic test is a reasonable compromise given thedebug_assert!approach.
No blocking issues found. The inline comments are minor documentation suggestions.
There was a problem hiding this comment.
Solid implementation of SEARCH draft-09. The integer-arithmetic approach is clean, the circular buffer sizing is correct (with tight but sufficient margins), and all spec divergences are documented with reasoning and links.
What looks good:
- The cumulative byte counter design (separate
record_acked_bytes+on_packets_acked) correctly handles the app-limited case where only the counter needs updating. - Edge cases are well-guarded: zero
bin_duration, RTT inflation beyondEXTRA_BINS, zero sent bytes, and warm-up period. - The
reset()on too-many-skipped-bins is a sensible optional extension of the spec, especially for app-limited senders. - Test suite covers all
Outcomevariants and key state transitions (initialization, bin advancement, skipped bins, reset/reinit, delivery rate detection).
Architectural observation:
The SlowStart trait now has two SEARCH-specific methods with default no-op impls (record_acked_bytes, the sent_bytes parameter on on_packet_sent). This is a reasonable trade-off for now with three implementations, but if more slow-start algorithms are added, consider whether a richer event interface (e.g., a packet-event struct) would be cleaner than growing the trait signature.
No blocking issues found. The inline comments are minor suggestions for maintainability.
Benchmark resultsSignificant performance differences relative to 3287939. transfer/walltime/pacing-true/same-seed: 💔 Performance has regressed by +3.2680%. time: [23.954 ms 23.986 ms 24.031 ms]
change: [+3.0892% +3.2680% +3.4679] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severeAll resultstransfer/1-conn/1-100mb-resp (aka. Download): No change in performance detected. time: [190.09 ms 190.55 ms 191.03 ms]
thrpt: [523.48 MiB/s 524.81 MiB/s 526.08 MiB/s]
change:
time: [-0.4385% -0.0715% +0.2774] (p = 0.70 > 0.05)
thrpt: [-0.2766% +0.0716% +0.4404]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS): Change within noise threshold. time: [281.46 ms 283.35 ms 285.25 ms]
thrpt: [35.057 Kelem/s 35.292 Kelem/s 35.529 Kelem/s]
change:
time: [-2.4783% -1.4891% -0.5047] (p = 0.00 < 0.05)
thrpt: [+0.5073% +1.5116% +2.5413]
Change within noise threshold.transfer/1-conn/1-1b-resp (aka. HPS): No change in performance detected. time: [38.817 ms 38.980 ms 39.163 ms]
thrpt: [25.534 B/s 25.654 B/s 25.762 B/s]
change:
time: [-0.7418% -0.1100% +0.5190] (p = 0.74 > 0.05)
thrpt: [-0.5164% +0.1101% +0.7474]
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload): Change within noise threshold. time: [192.29 ms 192.66 ms 193.09 ms]
thrpt: [517.90 MiB/s 519.06 MiB/s 520.05 MiB/s]
change:
time: [-0.9577% -0.6590% -0.3295] (p = 0.00 < 0.05)
thrpt: [+0.3306% +0.6633% +0.9670]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severestreams/walltime/1-streams/each-1000-bytes: No change in performance detected. time: [586.62 µs 588.24 µs 590.20 µs]
change: [-0.0993% +0.3303% +0.7722] (p = 0.13 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severestreams/walltime/1000-streams/each-1-bytes: No change in performance detected. time: [12.124 ms 12.154 ms 12.193 ms]
change: [-0.2654% +0.2026% +0.6670] (p = 0.41 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severestreams/walltime/1000-streams/each-1000-bytes: Change within noise threshold. time: [43.676 ms 43.757 ms 43.879 ms]
change: [+0.2443% +0.4680% +0.7310] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severestreams-flow-controlled/walltime/1-streams/each-4194304-bytes: Change within noise threshold. time: [33.852 ms 33.905 ms 33.962 ms]
change: [+0.9424% +1.1646% +1.3884] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams-flow-controlled/walltime/10-streams/each-1048576-bytes: No change in performance detected. time: [96.932 ms 98.260 ms 99.614 ms]
change: [-0.3191% +1.5827% +3.4709] (p = 0.11 > 0.05)
No change in performance detected.transfer/walltime/pacing-false/varying-seeds: Change within noise threshold. time: [23.347 ms 23.374 ms 23.411 ms]
change: [+1.4457% +1.5959% +1.7724] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: Change within noise threshold. time: [23.464 ms 23.495 ms 23.538 ms]
change: [+0.2059% +0.3711% +0.5974] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severetransfer/walltime/pacing-false/same-seed: Change within noise threshold. time: [23.259 ms 23.274 ms 23.290 ms]
change: [+0.4989% +0.6293% +0.7447] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severetransfer/walltime/pacing-true/same-seed: 💔 Performance has regressed by +3.2680%. time: [23.954 ms 23.986 ms 24.031 ms]
change: [+3.0892% +3.2680% +3.4679] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severeDownload data for |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
SEARCH 4.0 implementation.
This is still missing tests, which I will add later to this PR, but I thought it makes sense to get the implementation into review already while I work on that.Update 2026-05-04: Have added the tests.
SEARCH specific metrics will be added in a follow-up PR.
I've marked divergences from the most recent draft-09 including reasoning for them in code comments, prefaced by
NOTE.In my limited manual testing during development SEARCH seems to exit slow start before loss very reliably -- more so than HyStart++. Exemplary output and qvis graph from neqo-client cli to cloudflare:
cc: @maryam-ataei (from SEARCH team) FYI the current implementation going into review.