perf(crypto): eliminate bounds checks in header protection mask computation#3049
Conversation
…tation Change `hp::Key::mask()` and `CryptoDxState::compute_mask()` to accept `&[u8; 16]` instead of `&[u8]`, making the correct sample size a compile-time guarantee rather than a runtime check. - Modified `hp::Key::mask()` signature to accept `&[u8; 16]` - Removed runtime bounds checks in both AES and ChaCha20 code paths - Updated `CryptoDxState::compute_mask()` to match new signature - Updated call sites in `packet/mod.rs` to use `try_into()` for type-safe conversion - Removed panic tests that are no longer needed (compiler enforces correct size) - **Eliminates runtime bounds checks** on hot path (every QUIC packet) - **Type safety** - API enforces correct sample size at compile time - **Cleaner implementation** - no defensive slicing needed
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3049 +/- ##
==========================================
- Coverage 93.42% 93.36% -0.07%
==========================================
Files 124 124
Lines 35970 35972 +2
Branches 35970 35972 +2
==========================================
- Hits 33604 33584 -20
- Misses 1522 1543 +21
- Partials 844 845 +1
|
|
| Branch | fix-compute_mask |
| Testbed | On-prem |
🚨 1 Alert
| Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|---|---|---|---|---|
| 1000-streams/each-1000-bytes/simulated-time | Latency seconds (s) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 19.20 s(+1.45%)Baseline: 18.93 s | 19.20 s (100.03%) |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %) |
|---|---|---|---|
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 205,330,000.00 ns(-1.43%)Baseline: 208,312,462.31 ns | 218,059,101.35 ns (94.16%) |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 199,100,000.00 ns(-1.59%)Baseline: 202,326,030.15 ns | 212,876,681.52 ns (93.53%) |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 28,474,000.00 ns(+0.19%)Baseline: 28,420,532.66 ns | 28,854,554.86 ns (98.68%) |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 289,860,000.00 ns(-1.33%)Baseline: 293,772,864.32 ns | 306,417,293.87 ns (94.60%) |
| 1-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold | 119,360,000.00 ns(+0.81%)Baseline: 118,398,844.22 ns | 120,890,048.30 ns (98.73%) |
| 1-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 591,970.00 ns(-0.86%)Baseline: 597,121.36 ns | 621,993.46 ns (95.17%) |
| 1000-streams/each-1-bytes/simulated-time | 📈 view plot 🚷 view threshold | 14,988,000,000.00 ns(-0.03%)Baseline: 14,992,160,804.02 ns | 15,010,529,528.28 ns (99.85%) |
| 1000-streams/each-1-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 13,584,000.00 ns(-4.13%)Baseline: 14,168,572.86 ns | 14,967,622.07 ns (90.76%) |
| 1000-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 19,202,000,000.00 ns(+1.45%)Baseline: 18,927,040,201.01 ns | 19,196,649,057.86 ns (100.03%) |
| 1000-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 48,654,000.00 ns(-6.32%)Baseline: 51,937,849.25 ns | 58,472,130.57 ns (83.21%) |
| RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 107,880,000.00 ns(-1.69%)Baseline: 109,736,683.42 ns | 111,936,008.63 ns (96.38%) |
| coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.63 ns(+0.02%)Baseline: 88.62 ns | 89.28 ns (99.27%) |
| coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 106.10 ns(+0.02%)Baseline: 106.08 ns | 107.07 ns (99.10%) |
| coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 88.88 ns(-1.04%)Baseline: 89.81 ns | 94.34 ns (94.21%) |
| coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 106.44 ns(-0.15%)Baseline: 106.60 ns | 107.56 ns (98.96%) |
| decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,595,200.00 ns(+0.17%)Baseline: 1,592,533.17 ns | 1,599,506.49 ns (99.73%) |
| decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,055,900.00 ns(-0.03%)Baseline: 5,057,449.25 ns | 5,076,799.73 ns (99.59%) |
| decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold | 3,034,200.00 ns(+0.08%)Baseline: 3,031,894.97 ns | 3,044,025.77 ns (99.68%) |
| decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 8,297.50 ns(+0.03%)Baseline: 8,295.40 ns | 8,342.12 ns (99.47%) |
| decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 19,977.00 ns(-0.15%)Baseline: 20,007.62 ns | 20,087.11 ns (99.45%) |
| decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,636.00 ns(-0.73%)Baseline: 11,721.67 ns | 11,962.94 ns (97.27%) |
| sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 4,623.40 ns(-2.58%)Baseline: 4,745.86 ns | 4,986.08 ns (92.73%) |
| transfer/pacing-false/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,710,000,000.00 ns(+1.62%)Baseline: 25,299,289,340.10 ns | 25,876,179,914.57 ns (99.36%) |
| transfer/pacing-false/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,552,000.00 ns(-1.62%)Baseline: 25,973,411.17 ns | 27,051,159.32 ns (94.46%) |
| transfer/pacing-false/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 25,180,000,000.00 ns(+0.04%)Baseline: 25,169,050,761.42 ns | 25,216,570,884.93 ns (99.85%) |
| transfer/pacing-false/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,395,000.00 ns(-2.91%)Baseline: 26,154,796.95 ns | 27,632,073.17 ns (91.90%) |
| transfer/pacing-true/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,675,000,000.00 ns(+0.25%)Baseline: 25,610,964,467.01 ns | 25,700,909,771.63 ns (99.90%) |
| transfer/pacing-true/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 26,568,000.00 ns(-2.88%)Baseline: 27,356,370.56 ns | 28,789,266.68 ns (92.28%) |
| transfer/pacing-true/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 25,024,000,000.00 ns(+0.12%)Baseline: 24,994,380,710.66 ns | 25,043,805,214.89 ns (99.92%) |
| transfer/pacing-true/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 26,010,000.00 ns(-2.50%)Baseline: 26,676,329.95 ns | 28,205,478.35 ns (92.22%) |
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes header protection mask computation by eliminating runtime bounds checks, changing the API to accept fixed-size arrays instead of slices for better type safety and performance.
- Modified
hp::Key::mask()to accept&[u8; 16]instead of&[u8]for compile-time size guarantees - Updated all call sites to use
try_into()for type-safe conversion from slices to arrays - Removed panic tests that are no longer needed since the compiler enforces correct array sizes
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| neqo-crypto/src/hp.rs | Updated mask() method signature and removed runtime bounds checking |
| neqo-transport/src/crypto.rs | Updated compute_mask() to match new signature |
| neqo-transport/src/packet/mod.rs | Updated call sites to use try_into() for array conversion |
| test-fixture/src/header_protection.rs | Updated test helper functions to use new API |
| neqo-crypto/tests/hp.rs | Updated test assertions and removed obsolete panic tests |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
| Branch | fix-compute_mask |
| Testbed | On-prem |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| google vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 278.35 ms(+0.24%)Baseline: 277.69 ms | 280.29 ms (99.31%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| msquic vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 203.27 ms(+4.02%)Baseline: 195.41 ms | 230.38 ms (88.23%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. google (cubic, paced) | 📈 view plot 🚷 view threshold | 757.30 ms(-0.04%)Baseline: 757.59 ms | 764.75 ms (99.03%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. msquic (cubic, paced) | 📈 view plot 🚷 view threshold | 158.44 ms(+0.66%)Baseline: 157.40 ms | 160.12 ms (98.95%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. neqo (cubic) | 📈 view plot 🚷 view threshold | 89.29 ms(-1.44%)Baseline: 90.59 ms | 94.34 ms (94.65%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 90.80 ms(-1.08%)Baseline: 91.79 ms | 95.47 ms (95.11%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. neqo (reno) | 📈 view plot 🚷 view threshold | 87.88 ms(-2.92%)Baseline: 90.52 ms | 93.92 ms (93.57%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. neqo (reno, paced) | 📈 view plot 🚷 view threshold | 90.27 ms(-1.66%)Baseline: 91.80 ms | 95.33 ms (94.70%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. quiche (cubic, paced) | 📈 view plot 🚷 view threshold | 193.69 ms(-0.03%)Baseline: 193.75 ms | 197.16 ms (98.24%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| neqo vs. s2n (cubic, paced) | 📈 view plot 🚷 view threshold | 220.50 ms(-0.13%)Baseline: 220.79 ms | 223.56 ms (98.63%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| quiche vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 157.09 ms(+2.84%)Baseline: 152.75 ms | 158.46 ms (99.13%) |
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| s2n vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 175.61 ms(+0.92%)Baseline: 174.01 ms | 178.16 ms (98.57%) |
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]> Signed-off-by: Lars Eggert <[email protected]>
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to 7334a18. neqo-latest as client
neqo-latest as server
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
|
Client/server transfer resultsPerformance differences relative to 7334a18. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
Benchmark resultsPerformance differences relative to 7334a18. 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected. time: [198.77 ms 199.10 ms 199.46 ms]
thrpt: [501.36 MiB/s 502.25 MiB/s 503.10 MiB/s]
change:
time: [−0.3694% −0.0371% +0.2495%] (p = 0.83 > 0.05)
thrpt: [−0.2488% +0.0371% +0.3708%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [288.23 ms 289.86 ms 291.52 ms]
thrpt: [34.303 Kelem/s 34.499 Kelem/s 34.695 Kelem/s]
change:
time: [−0.0597% +0.7929% +1.6595%] (p = 0.06 > 0.05)
thrpt: [−1.6324% −0.7867% +0.0597%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.435 ms 28.474 ms 28.515 ms]
thrpt: [35.070 B/s 35.120 B/s 35.168 B/s]
change:
time: [−0.6594% −0.2693% +0.0686%] (p = 0.16 > 0.05)
thrpt: [−0.0685% +0.2700% +0.6637%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed. time: [205.00 ms 205.33 ms 205.70 ms]
thrpt: [486.13 MiB/s 487.03 MiB/s 487.81 MiB/s]
change:
time: [+1.0615% +1.2973% +1.5524%] (p = 0.00 < 0.05)
thrpt: [−1.5286% −1.2806% −1.0504%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.599 µs 11.636 µs 11.679 µs]
change: [−1.0891% −0.3108% +0.3048%] (p = 0.43 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0244 ms 3.0342 ms 3.0456 ms]
change: [−0.3853% +0.1244% +0.6275%] (p = 0.65 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.934 µs 19.977 µs 20.027 µs]
change: [−0.8245% −0.2168% +0.4128%] (p = 0.52 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0439 ms 5.0559 ms 5.0690 ms]
change: [−0.4977% −0.1339% +0.2380%] (p = 0.49 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2647 µs 8.2975 µs 8.3379 µs]
change: [−1.3999% −0.2436% +0.6095%] (p = 0.72 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5869 ms 1.5952 ms 1.6062 ms]
change: [−0.8665% −0.0585% +0.9144%] (p = 0.90 > 0.05)
1-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [589.44 µs 591.97 µs 594.97 µs]
change: [−0.7188% −0.1125% +0.5416%] (p = 0.72 > 0.05)
1000-streams/each-1-bytes/wallclock-time: 💚 Performance has improved. time: [13.557 ms 13.584 ms 13.611 ms]
change: [−2.1697% −1.8906% −1.6322%] (p = 0.00 < 0.05)
1000-streams/each-1-bytes/simulated-time: No change in performance detected. time: [14.975 s 14.988 s 15.000 s]
thrpt: [66.666 B/s 66.722 B/s 66.778 B/s]
change:
time: [−0.1614% −0.0384% +0.0911%] (p = 0.54 > 0.05)
thrpt: [−0.0910% +0.0384% +0.1617%]
1000-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [48.393 ms 48.654 ms 48.996 ms]
change: [−0.5380% +0.1004% +0.8550%] (p = 0.80 > 0.05)
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [88.276 ns 88.630 ns 88.983 ns]
change: [−0.2526% +0.3581% +1.0533%] (p = 0.31 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [106.06 ns 106.44 ns 106.83 ns]
change: [−0.3423% +0.2018% +0.6924%] (p = 0.46 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.50 ns 106.10 ns 106.84 ns]
change: [−2.0206% −0.4870% +0.5365%] (p = 0.56 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.796 ns 88.878 ns 88.978 ns]
change: [−1.5005% −0.5040% +0.4477%] (p = 0.33 > 0.05)
RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [107.73 ms 107.88 ms 108.08 ms]
change: [−0.5010% −0.2855% −0.0618%] (p = 0.01 < 0.05)
sent::Packets::take_ranges: No change in performance detected. time: [4.5191 µs 4.6234 µs 4.7174 µs]
change: [−5.0540% −1.9145% +1.4235%] (p = 0.26 > 0.05)
transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.357 ms 25.395 ms 25.433 ms]
change: [−2.1319% −1.8819% −1.6426%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds/simulated-time/run: No change in performance detected. time: [25.144 s 25.180 s 25.216 s]
thrpt: [162.44 KiB/s 162.67 KiB/s 162.90 KiB/s]
change:
time: [−0.0920% +0.0898% +0.2862%] (p = 0.35 > 0.05)
thrpt: [−0.2854% −0.0898% +0.0921%]
transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.929 ms 26.010 ms 26.098 ms]
change: [−1.1820% −0.7496% −0.3454%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds/simulated-time/run: Change within noise threshold. time: [24.987 s 25.024 s 25.063 s]
thrpt: [163.43 KiB/s 163.68 KiB/s 163.93 KiB/s]
change:
time: [+0.0143% +0.2253% +0.4356%] (p = 0.04 < 0.05)
thrpt: [−0.4337% −0.2248% −0.0143%]
transfer/pacing-false/same-seed/wallclock-time/run: 💚 Performance has improved. time: [25.516 ms 25.552 ms 25.608 ms]
change: [−3.7105% −3.4814% −3.2279%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed/simulated-time/run: No change in performance detected. time: [25.710 s 25.710 s 25.710 s]
thrpt: [159.31 KiB/s 159.31 KiB/s 159.31 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
transfer/pacing-true/same-seed/wallclock-time/run: 💚 Performance has improved. time: [26.551 ms 26.568 ms 26.586 ms]
change: [−3.4185% −3.3244% −3.2334%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed/simulated-time/run: No change in performance detected. time: [25.675 s 25.675 s 25.675 s]
thrpt: [159.53 KiB/s 159.53 KiB/s 159.53 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
Download data for |
Change
hp::Key::mask()andCryptoDxState::compute_mask()to accept&[u8; 16]instead of&[u8], making the correct sample size a compile-time guarantee rather than a runtime check.Key changes
hp::Key::mask()signature to accept&[u8; 16]CryptoDxState::compute_mask()to match new signaturepacket/mod.rsto usetry_into()for type-safe conversionBenefits