Skip to content

Improve encode perf #9826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Improve encode perf #9826

wants to merge 6 commits into from

Conversation

jprochazk
Copy link
Member

@jprochazk jprochazk commented Apr 28, 2025

This brings us roughly back to where we were with 0.22.1. The issue was excessive allocations.

Prost supports zero-copy decoding for byte/string fields, but not for encoding. I had to resort to some unsafe code to remove an extra allocation during serialization to protobuf wire format, which I believe is justified due to the massive difference in maximum memory usage.

Benchmarks

These are very crude, and the variance is quite high. I chose the worst results out of 5 runs.

0.22.1
$ git checkout 0.22.1 && git cherry-pick --no-commit 3b6c163cf2764f6dfd78b8e5976fceaed7f7e13f
$ cargo build -p minimal --release && /usr/bin/time -v ./target/release/minimal > /dev/null


User time (seconds): 8.40
System time (seconds): 0.42
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.63
Maximum resident set size (kbytes): 964012
Minor (reclaiming a frame) page faults: 251991
0.23.0
$ git checkout 0.23.0 && git cherry-pick --no-commit 3b6c163cf2764f6dfd78b8e5976fceaed7f7e13f
$ cargo build -p minimal --release && /usr/bin/time -v ./target/release/minimal > /dev/null

User time (seconds): 8.38
System time (seconds): 1.16
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.22
Maximum resident set size (kbytes): 1505304
Minor (reclaiming a frame) page faults: 936506
This branch
$ git checkout jan/encode-perf-benchmark
$ cargo build -p minimal --release && /usr/bin/time -v ./target/release/minimal > /dev/null

User time (seconds): 8.38
System time (seconds): 0.47
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.73
Maximum resident set size (kbytes): 1170748
Minor (reclaiming a frame) page faults: 296538

The most important metrics are wall clock time (how long it took for the program to exit), and max rss (the maximum amount of physical RAM the program used at any single point in time). I'm also including system time and page faults, both of which together provide a hint about how much the program is asking the OS to manage memory for it.

There may be a similar regression in decode perf, but I have not looked into that. It's unlikely to be as severe though. There are definitely improvements to be made there if we feel the need for them, such as decoding from Bytes which is specialized to zero-copy in prost.

Copy link

github-actions bot commented Apr 28, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
0d5ccd6 https://rerun.io/viewer/pr/9826 +nightly +main

Note: This comment is updated whenever you push a commit.

@jprochazk jprochazk added 📉 performance Optimization, memory use, etc include in changelog labels Apr 28, 2025
@jprochazk jprochazk added this to the 0.23.2 milestone Apr 28, 2025
@jprochazk jprochazk added 📉 performance Optimization, memory use, etc and removed 📉 performance Optimization, memory use, etc labels Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include in changelog 📉 performance Optimization, memory use, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential regression of slower speed v0.22.1->v0.23.0
1 participant