perf(macOS): avoid copying large protocol bodies #1719

Merged
Legend-Master merged 10 commits into tauri-apps:dev from Tunglies:perf/macos-buffer
May 13, 2026

Conversation

@Tunglies
Contributor

On macOS, Wry now avoids an extra copy for owned custom protocol
response bodies of 128KB or larger by transferring the Vec directly into NSData.
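As a rough illustration of the size-based dispatch described above, here is a minimal pure-Rust sketch. It is not Wry's code: `BodyTransfer` and `transfer_body` are hypothetical names, and the constant only mirrors the 128KB figure from the description (the name `NO_COPY_DATA_THRESHOLD` appears in the review discussion).

```rust
// Hypothetical model of the dispatch; the real change hands the Vec to NSData.
// 128 KiB mirrors the figure in the PR description.
const NO_COPY_DATA_THRESHOLD: usize = 128 * 1024;

/// How a response body would reach the platform layer in this model.
#[derive(Debug, PartialEq)]
enum BodyTransfer {
    /// The bytes were copied into a fresh buffer.
    Copied(Vec<u8>),
    /// The original allocation was handed over as-is.
    Moved(Vec<u8>),
}

/// Moves bodies at or above the threshold and copies smaller ones.
fn transfer_body(body: Vec<u8>) -> BodyTransfer {
    if body.len() >= NO_COPY_DATA_THRESHOLD {
        BodyTransfer::Moved(body)
    } else {
        BodyTransfer::Copied(body.as_slice().to_vec())
    }
}
```

In this model, `transfer_body(vec![0u8; 128 * 1024])` yields the `Moved` variant, while a 16-byte body yields `Copied`.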

@Tunglies Tunglies marked this pull request as ready for review April 30, 2026 08:58
@Tunglies Tunglies requested a review from a team as a code owner April 30, 2026 08:58
@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

Package Changes Through 6a3aa79

There is 1 change, which includes wry with a minor bump

Planned Package Versions

The following package releases are planned based on the context of changes in this pull request.

| package | current | next |
| --- | --- | --- |
| wry | 0.55.1 | 0.56.0 |

Add another change file through the GitHub UI by following this link.


Read about change files or the docs at github.com/jbolda/covector

Contributor

@Legend-Master Legend-Master left a comment

I don't think I understand this. NSData::from_vec uses initWithBytes_length internally as well, so how is it avoiding the copy here? And why are we only doing this for the bigger chunks?

That being said, I don't mind if we migrate to NSData::with_bytes here for readability though

@Tunglies
Contributor Author

Tunglies commented May 8, 2026

NSData::from_vec does not use initWithBytes_length; it calls with_vec internally, and with_vec uses initWithBytesNoCopy:length:deallocator: to hand the Vec buffer to NSData without copying.

The chunk size is a workaround based on local benchmarks: eliminating the buffer copy yields benefits starting from 88KB, while sizes below 88KB cause regressions. I chose 128KB because it aligns better with common engineering conventions and is more human-readable.
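To make the no-copy handover concrete, here is a pure-Rust model of the idea. It is only a sketch under assumptions: `ForeignBytes` is a hypothetical stand-in for a foreign-owned buffer, and its `Drop` impl plays the role that the deallocator block plays in initWithBytesNoCopy:length:deallocator:. No objc2 calls are used.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for a foreign buffer that adopts a Rust allocation
/// and frees it through a deallocator when released.
struct ForeignBytes {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

impl ForeignBytes {
    /// Takes over the Vec's allocation without copying any bytes.
    fn from_vec(vec: Vec<u8>) -> Self {
        // ManuallyDrop suppresses the Vec's destructor so the memory stays alive.
        let mut vec = ManuallyDrop::new(vec);
        ForeignBytes {
            ptr: vec.as_mut_ptr(),
            len: vec.len(),
            cap: vec.capacity(),
        }
    }

    /// Read-only view of the adopted bytes.
    fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr and len describe the Vec allocation this value now owns.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for ForeignBytes {
    fn drop(&mut self) {
        // SAFETY: ptr, len and cap are exactly the parts taken in from_vec.
        // Rebuilding the Vec lets Rust's allocator free the memory, which is
        // the job of the deallocator in the Objective-C API.
        unsafe { drop(Vec::from_raw_parts(self.ptr, self.len, self.cap)) }
    }
}
```

The point of the sketch is that the data pointer seen through `as_slice` is the same one the original Vec held, so the handover cost does not grow with the payload size.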

@Legend-Master
Contributor

Sorry my bad, looked at the wrong line 🤦‍♂️, thanks for checking

Legend-Master
Legend-Master previously approved these changes May 9, 2026
Contributor

@Legend-Master Legend-Master left a comment

Looks good to me, just some nitpicks

Comment thread src/wkwebview/class/url_scheme_handler.rs Outdated
Comment thread src/wkwebview/class/url_scheme_handler.rs Outdated
@Tunglies
Contributor Author

Some custom benchmarks, covering only the IPC performance part.

| Payload | dev ms/call | perf ms/call | Delta | dev throughput | perf throughput |
| --- | --- | --- | --- | --- | --- |
| 128KB | 0.458 ms | 0.437 ms | -4.56% | 272.8 MiB/s | 285.8 MiB/s |
| 256KB | 0.568 ms | 0.515 ms | -9.30% | 440.1 MiB/s | 485.3 MiB/s |
| 512KB | 0.768 ms | 0.671 ms | -12.68% | 652.0 MiB/s | 745.9 MiB/s |
| 1024KB | 1.124 ms | 0.918 ms | -18.35% | 892.1 MiB/s | 1091.1 MiB/s |
| 2048KB | 1.729 ms | 1.400 ms | -19.04% | 1158.6 MiB/s | 1432.3 MiB/s |
| 3072KB | 2.371 ms | 1.800 ms | -24.10% | 1273.3 MiB/s | 1672.7 MiB/s |
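For readers checking the columns, the sketch below shows one plausible way the Delta and throughput values relate to the ms/call timings. This is an assumption about how the table was derived, not something stated in the thread. The function names are hypothetical, and the rounded timings reproduce the table values only approximately.

```rust
/// Relative change in percent; a negative value means `perf` is faster.
fn delta_percent(dev_ms: f64, perf_ms: f64) -> f64 {
    (perf_ms - dev_ms) / dev_ms * 100.0
}

/// Throughput in MiB/s, assuming the payload is in KiB and the time in ms.
fn throughput_mib_s(payload_kib: f64, ms_per_call: f64) -> f64 {
    let mib = payload_kib / 1024.0;
    let seconds = ms_per_call / 1000.0;
    mib / seconds
}
```

For the 128KB row, `delta_percent(0.458, 0.437)` comes out close to the listed -4.56%, and `throughput_mib_s(128.0, 0.458)` close to the listed 272.8 MiB/s.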

@Tunglies
Contributor Author

> Sorry my bad, looked at the wrong line 🤦‍♂️, thanks for checking

No, you're good. I mix things up at times too. :)

@Legend-Master
Contributor

> Some custom benchmark, for only IPC performance part.

I'm not entirely sure I get the table headers: is dev before the change and perf after it? Do we have a data point below NO_COPY_DATA_THRESHOLD, like the regression you mentioned? Also, was this run in a release build? I'm asking because simply copying a 128KB blob took way longer than I thought it would.

@Tunglies
Copy link
Copy Markdown
Contributor Author

Tunglies commented May 12, 2026

I adjusted my benchmark to be closer to real IPC usage and to avoid zero-data optimizations skewing the results. It now shows that 64KB gets around a 3.33%-4% improvement, which conflicts with my earlier claim that sizes below 88KB lead to regressions. In the latest runs, the no-copy regressions (noise) only show up below 64KB, and 64KB itself improves very slightly. Let's keep the threshold at 128KB, in line with engineering conventions.


dev: wry 44e26ef
perf: current PR, THRESHOLD=1bit
Ran in release build.

Payloads larger than 64KB do show a significant improvement.

| Payload | Copy median (dev) | No-copy median (perf) | Median delta | Verdict |
| --- | --- | --- | --- | --- |
| 1 KiB | 0.353 ms | 0.349 ms | -0.97% | no-copy faster |
| 2 KiB | 0.352 ms | 0.353 ms | 0.42% | no-copy slower |
| 4 KiB | 0.364 ms | 0.363 ms | -0.20% | neutral |
| 8 KiB | 0.364 ms | 0.367 ms | 0.80% | no-copy slower |
| 16 KiB | 0.362 ms | 0.363 ms | 0.20% | neutral |
| 32 KiB | 0.366 ms | 0.362 ms | -1.13% | no-copy faster |
| 64 KiB | 0.397 ms | 0.382 ms | -3.93% | no-copy faster |
| 128 KiB | 0.451 ms | 0.431 ms | -4.55% | no-copy faster |
| 256 KiB | 0.553 ms | 0.520 ms | -6.01% | no-copy faster |
| 512 KiB | 0.715 ms | 0.656 ms | -8.20% | no-copy faster |
| 1024 KiB | 1.047 ms | 0.906 ms | -13.43% | no-copy faster |
| 2048 KiB | 1.563 ms | 1.375 ms | -12.00% | no-copy faster |
| 3072 KiB | 2.214 ms | 1.833 ms | -17.20% | no-copy faster |

Below 64KB, most results are noise, with occasional regressions; they are not stable enough to test.
At 64KB the difference is very small, on the order of 0.01 ms, and hard to tell apart from noise.

| Payload | Copy median (dev) | No-copy median (perf) | Median delta | Verdict |
| --- | --- | --- | --- | --- |
| 28 KiB | 0.434 ms | 0.447 ms | 3.04% | no-copy slower |
| 30 KiB | 0.414 ms | 0.426 ms | 2.95% | noise |
| 32 KiB | 0.379 ms | 0.376 ms | -0.84% | noise |
| 34 KiB | 0.383 ms | 0.376 ms | -1.83% | noise |
| 36 KiB | 0.383 ms | 0.380 ms | -0.72% | noise |
| 38 KiB | 0.384 ms | 0.380 ms | -0.98% | noise |
| 40 KiB | 0.380 ms | 0.376 ms | -1.20% | noise |
| 42 KiB | 0.382 ms | 0.374 ms | -2.10% | noise |
| 44 KiB | 0.387 ms | 0.386 ms | -0.26% | noise |
| 46 KiB | 0.389 ms | 0.384 ms | -1.26% | noise |
| 48 KiB | 0.392 ms | 0.385 ms | -1.68% | noise |
| 52 KiB | 0.393 ms | 0.388 ms | -1.41% | noise |
| 56 KiB | 0.394 ms | 0.386 ms | -1.85% | noise |
| 60 KiB | 0.400 ms | 0.392 ms | -1.95% | noise |
| 64 KiB | 0.409 ms | 0.393 ms | -3.82% | no-copy faster |
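The benchmark harness itself is not shown in this thread. As a hedged sketch of the measurement ideas mentioned above (median timings, non-zero payload data, keeping the optimizer from discarding the work), something like the following could be used. All names are hypothetical, and it times only a plain buffer copy, not the IPC round trip measured in the tables.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Median of the samples (mean of the two middle values for even counts).
fn median(samples: &mut [f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(|a, b| a.total_cmp(b));
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    }
}

/// Builds a payload with no zero bytes, so the benchmark does not
/// accidentally measure an all-zero fast path.
fn make_payload(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 251) as u8 + 1).collect()
}

/// Times one buffer copy per run and returns the median in milliseconds.
fn median_copy_ms(payload: &[u8], runs: usize) -> f64 {
    let mut samples: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the compiler from eliding the copy.
            let copy = black_box(payload).to_vec();
            black_box(copy);
            start.elapsed().as_secs_f64() * 1000.0
        })
        .collect();
    median(&mut samples)
}
```

Medians are less sensitive than means to occasional slow runs, which matters when the differences being compared are on the order of 0.01 ms.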

@Tunglies Tunglies force-pushed the perf/macos-buffer branch from 37cf53f to 226a006 Compare May 12, 2026 16:42
@Legend-Master
Contributor

If the regression is mostly noise, what about removing the threshold?

@Tunglies
Contributor Author

> If the regression is mostly noise, what about removing the threshold?

Sort of, but we really can't tell what the real-world performance is. For stability, let's keep the 128KB threshold, which covers the performance improvements we can actually reproduce.

@Legend-Master
Contributor

It's more about the readability here, if the performance isn't worse

Comment thread src/wkwebview/class/url_scheme_handler.rs Outdated
Co-authored-by: Tony <68118705+Legend-Master@users.noreply.github.com>
Comment thread .changes/macos-protocol-body-nocopy.md Outdated
@Legend-Master
Contributor

Thanks! And also for the patience!

@Legend-Master Legend-Master merged commit 5bdda32 into tauri-apps:dev May 13, 2026
21 checks passed

2 participants