
Fix appendOrMergeRPC inefficiency in message size recalculation #582


Open

wants to merge 4 commits into master

Conversation

algorandskiy
Contributor

@algorandskiy algorandskiy commented Oct 1, 2024

Summary

As discussed in #581, there is an inefficiency in appendOrMergeRPC: it calls Size() more times than needed.

Fix

Instead of calling lastRPC.Size(), which iterates over all of RPC.Publish, save the last known size and add the current message size plus the protobuf upper-bound overhead.
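
A minimal sketch of the idea, with simplified stand-in types (names like `varintLen`, `rpc`, and `appendOrMerge` are illustrative, not the actual patch):

```go
package main

import "fmt"

// msg stands in for a pubsub message; Size() here is O(1), like the
// generated protobuf Size() of a single message.
type msg struct{ payload []byte }

func (m msg) Size() int { return len(m.payload) }

// varintLen returns the number of bytes protobuf uses to varint-encode n.
func varintLen(n int) int {
	l := 1
	for n >= 0x80 {
		n >>= 7
		l++
	}
	return l
}

// rpc carries its messages plus a cached running size, so deciding whether
// one more message fits never requires re-walking every message.
type rpc struct {
	msgs []msg
	size int // cached encoded size of this RPC so far
}

// appendOrMerge adds m to the last rpc if it still fits under limit,
// otherwise starts a new rpc. The cost added to the cache is an upper
// bound per entry: 1 byte for the Publish field key, plus the varint
// length prefix, plus the message body itself.
func appendOrMerge(rpcs []rpc, m msg, limit int) []rpc {
	entry := 1 + varintLen(m.Size()) + m.Size()
	if n := len(rpcs); n > 0 && rpcs[n-1].size+entry <= limit {
		last := &rpcs[n-1]
		last.msgs = append(last.msgs, m)
		last.size += entry
		return rpcs
	}
	return append(rpcs, rpc{msgs: []msg{m}, size: entry})
}

func main() {
	var out []rpc
	for i := 0; i < 5; i++ {
		out = appendOrMerge(out, msg{payload: make([]byte, 400)}, 1024)
	}
	fmt.Println(len(out), "RPCs") // 3 RPCs (2 + 2 + 1 messages)
}
```

The real code operates on the library's RPC type rather than these stand-ins; the point of the sketch is only the size-caching idea.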

Status and Evaluation

Benchmark results

[master●●] % go test ./ -run ^$ -bench BenchmarkAppendOrMergeRPC -benchmem -count=10 > old.txt
[pavel/append-or-merge●] % go test ./ -run ^$ -bench BenchmarkAppendOrMergeRPC -benchmem -count=10 > new.txt
[pavel/append-or-merge●] % benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/libp2p/go-libp2p-pubsub
cpu: Apple M1 Pro
                          │   old.txt   │               new.txt               │
                          │   sec/op    │   sec/op     vs base                │
AppendOrMergeRPC/small-10   741.8n ± 0%   280.8n ± 0%  -62.15% (p=0.000 n=10)
AppendOrMergeRPC/large-10   36.88µ ± 0%   10.67µ ± 0%  -71.06% (p=0.000 n=10)
geomean                     5.231µ        1.731µ       -66.90%

                          │   old.txt    │                new.txt                │
                          │     B/op     │     B/op      vs base                 │
AppendOrMergeRPC/small-10     368.0 ± 0%     368.0 ± 0%       ~ (p=1.000 n=10) ¹
AppendOrMergeRPC/large-10   18.54Ki ± 0%   18.54Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean                     2.581Ki        2.581Ki       +0.00%
¹ all samples are equal

                          │  old.txt   │               new.txt               │
                          │ allocs/op  │ allocs/op   vs base                 │
AppendOrMergeRPC/small-10   7.000 ± 0%   7.000 ± 0%       ~ (p=1.000 n=10) ¹
AppendOrMergeRPC/large-10   216.0 ± 0%   216.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean                     38.88        38.88       +0.00%
¹ all samples are equal

@MarcoPolo
Contributor

Generally looks good. Make sure to run the Fuzz tests as well. A benchmark might also be helpful here.

@algorandskiy
Contributor Author

The fuzzer seems to work a little too well and finds a failing input even on the master branch (rerunning makes it pass, though):

go test -fuzz=FuzzAppendOrMergeRPC -run ^$ -v
=== RUN   FuzzAppendOrMergeRPC
fuzz: elapsed: 0s, gathering baseline coverage: 0/209 completed
fuzz: elapsed: 3s, gathering baseline coverage: 205/209 completed
fuzz: elapsed: 5s, gathering baseline coverage: 209/209 completed, now fuzzing with 10 workers
fuzz: elapsed: 6s, execs: 3044 (946/sec), new interesting: 0 (total: 209)
fuzz: elapsed: 9s, execs: 16393 (4450/sec), new interesting: 0 (total: 209)
fuzz: elapsed: 12s, execs: 24930 (2845/sec), new interesting: 0 (total: 209)
fuzz: elapsed: 15s, execs: 26633 (568/sec), new interesting: 0 (total: 209)
fuzz: elapsed: 17s, execs: 26897 (108/sec), new interesting: 0 (total: 209)
--- FAIL: FuzzAppendOrMergeRPC (17.46s)
    fuzzing process hung or terminated unexpectedly: exit status 2
    Failing input written to testdata/fuzz/FuzzAppendOrMergeRPC/fb45983a3dbe24d8
    To re-run:
    go test -run=FuzzAppendOrMergeRPC/fb45983a3dbe24d8
=== NAME
FAIL
exit status 1
FAIL	github.com/libp2p/go-libp2p-pubsub	17.895s

The same thing happens on my feature branch, so I can't tell whether it is a new or pre-existing issue.

I explained where the "1+" comes from (it is the field key size from the protobuf generator) and added a benchmark.
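
For reference, this is roughly the shape of the loop the protobuf generator emits for Size() over a repeated embedded field (paraphrased, not copied from this repo's generated pb code); the leading 1 is the field key byte for a field number below 16:

```go
// Paraphrased generated-code shape; sovRpc is the generator's varint-length
// helper (equivalent to varintLen in the sketch above).
for _, e := range m.Publish {
	l := e.Size()
	n += 1 + l + sovRpc(uint64(l))
}
```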

@algorandskiy algorandskiy marked this pull request as ready for review May 20, 2025 21:16
@algorandskiy algorandskiy requested a review from MarcoPolo May 20, 2025 21:16
@algorandskiy
Contributor Author

@MarcoPolo could you retrigger the testing job? TestMessageBatchPublish timed out, which has never happened on my local machines.

@algorandskiy
Contributor Author

Okay, here it is:

=== RUN   TestMessageBatchPublish
2025/05/22 00:14:32 failed to sufficiently increase receive buffer size (was: 1024 kiB, wanted: 7168 kiB, got: 2048 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
panic: test timed out after 10m0s
	running tests:
		TestMessageBatchPublish (10m0s)

@MarcoPolo
Contributor

A couple of things to respond to:

  • Thanks for the bump here. I've been working on a refactor of this code as well, and I think we can combine our efforts. I'll push a PR soon, could I ask you to review it?

  • The fuzz failure is concerning. I can't seem to reproduce it; could you reach out via email so we can discuss the details?

  • I'll debug the test timeout a bit. Our current testing strategy is pretty flaky in this repo, so it might just be that. We can solve it by migrating to synctest + simulated networks: feat(simconn): Simulated Networks go-libp2p#3262

@algorandskiy
Contributor Author

> could I ask you to review it?

Absolutely!

> can you reach out via email so we can discuss details here?

Sent an email with the details.

	}
	return RPC{
		RPC: pb.RPC{
			Publish: msgs,
Contributor


In your workload, do you see RPCs being split primarily due to many messages in a single RPC? I ask because we could add some optimizations if so.
