
Conversation

bw-solana commented Apr 11, 2025

Problem

We would like to move to fixed (32:32) FEC set sizes to simplify multiple areas of the code.

This PR is simply a draft to prove out the concept and is not meant to be merged.

If we decide to pursue this, we will need to:

  1. Tweak some of the batching logic to make sure we're not sending too many small data sets and generating excessive padding <-- this is now WIP
  2. Chunk this up into several smaller PRs to make it reviewable.
  3. Revisit the unit tests to make sure they still make sense and add value.

Summary of Changes

bw-solana (Author)

Currently seeing ~7% padding overhead running with 4eb25e2
[image]

This is similar to what Jump has observed using the same entry coalesce bytes target.
For comparison, ~6% overhead is seen on mainnet with the current variable FEC set size.

bw-solana (Author)

As for what is driving the need for padding, here are some logs that shed light:

[2025-04-11T20:07:37.683983999Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 83048, entry_bytes: 13593,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.686228591Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Not entering   coalesce loop, serialized_batch_byte_count: 116803,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.691491722Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 92151, entry_bytes: 693,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.700535140Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 90002, entry_bytes: 3488,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.713070269Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 92015, entry_bytes: 10368,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.716405526Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 83192, entry_bytes: 13593,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.721629706Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 90408, entry_bytes: 3488,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.736128600Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 90170, entry_bytes: 3488,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.744027342Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 90552, entry_bytes: 6928,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.746227733Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Not entering   coalesce loop, serialized_batch_byte_count: 108491,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.799141052Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, timed out, serialized_batch_byte_count: 48782,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.830889203Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 83146, entry_bytes: 13593,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.834880165Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Not entering   coalesce loop, serialized_batch_byte_count: 93368,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.843824541Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 85962, entry_bytes: 13593,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.847022015Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Breaking out of   coalesce loop, serialized_batch_byte_count: 84434, entry_bytes: 13593,   target_serialized_batch_byte_count: 92448
[2025-04-11T20:07:37.849953892Z WARN    solana_turbine::broadcast_stage::broadcast_utils] #BW: Not entering   coalesce loop, serialized_batch_byte_count: 143154,   target_serialized_batch_byte_count: 92448

There are a couple of "bad" cases that lead to more padding:

  1. Draining a large number of entries from the receiver up front, so many that the batch already exceeds the target. The portion that exceeds the target results in ~1/2 a batch of padding on average.
  2. Large entries (>10kB) make it easy to overshoot the target, leaving a large amount of empty space to pad at the end.

A few potential options to do better here:

  1. Increase the target coalesce bytes even more. The downside here is delaying pushing out shreds.
  2. Force smaller tx batching upstream. This seems complex and will introduce other perf implications.
  3. Try to buffer excess entries for "bad" case 1 (a rough sketch follows this list).
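
For reference, here's a rough sketch of what option 3 could look like. This is hypothetical code, not part of this PR; it assumes solana_entry::entry::Entry, bincode 1.x for sizing, and a made-up helper name:

use solana_entry::entry::Entry;

// Hypothetical helper: fill one batch up to `target_bytes` and carry the rest
// over to the next batch instead of padding the current one out.
fn split_at_target(entries: Vec<Entry>, target_bytes: u64) -> (Vec<Entry>, Vec<Entry>) {
    let mut batch: Vec<Entry> = Vec::new();
    let mut carry: Vec<Entry> = Vec::new();
    let mut batch_bytes = 0u64;
    for entry in entries {
        let entry_bytes = bincode::serialized_size(&entry).unwrap_or(u64::MAX);
        // Entries must stay in order, so once we start carrying, everything after
        // that point is carried as well. The first entry is always accepted, even
        // if it alone exceeds the target.
        if carry.is_empty()
            && (batch.is_empty() || batch_bytes.saturating_add(entry_bytes) <= target_bytes)
        {
            batch_bytes = batch_bytes.saturating_add(entry_bytes);
            batch.push(entry);
        } else {
            carry.push(entry);
        }
    }
    (batch, carry)
}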

codecov-commenter commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 92.45810% with 27 lines in your changes missing coverage. Please review.

Project coverage is 82.9%. Comparing base (86b229b) to head (7041ad4).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #5771     +/-   ##
=========================================
- Coverage    82.9%    82.9%   -0.1%     
=========================================
  Files         830      830             
  Lines      376347   376498    +151     
=========================================
+ Hits       312282   312386    +104     
- Misses      64065    64112     +47     

bw-solana (Author)

Running the same experiment (bench-tps spamming ~30k TPS) with this code vs. master, I observe the following:

This code:

  • ~200kB of extra data pad bytes per slot --> ~200kB of extra coding bytes
  • ~7.5k shreds per slot
  • ~33k TPS observed
  • Overall padding bytes = ~400kB per slot --> ~5% padding overhead

Master code:

  • ~51kB of extra data pad bytes per slot --> ~51kB of extra coding bytes
  • ~15kB worth of extra coding shreds per slot due to erasure batch < 32 data shreds
  • ~30k TPS
  • Overall padding bytes = ~117kB per slot --> ~2% padding overhead


bw-solana commented Apr 15, 2025

On mainnet over the last 2 weeks, the overhead from variable coding size has been around 2%:
[image]

SELECT (mean("num_merkle_coding_shreds")-mean("num_merkle_data_shreds"))/mean("num_merkle_data_shreds")*100 AS "var_coding_overhead" FROM "mainnet-beta"."autogen"."broadcast-process-shreds-stats" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY time(1d) FILL(null)

On mainnet over the last 2 weeks, the overhead from padding out the last data shreds has averaged ~10 data shreds per slot, or ~1.2% overhead:
[image]

SELECT mean("data_buffer_residual")/1024*100/mean("num_merkle_data_shreds") AS "data_buffer_residual_overhead" FROM "mainnet-beta"."autogen"."broadcast-process-shreds-stats" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY time(:interval:) FILL(null)

The final source of overhead would be padding out the last FEC set to 32 shreds. We don't have metrics for this, but my assumption is that we're padding half the FEC set on average, i.e. 16 shreds, which would result in ~1.92% padding overhead.

So overall, my understanding is that mainnet has ~5% padding overhead. This lines up with some other measurements analyzing zero-content data in the 5-6% range.
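
A rough back-of-the-envelope check of how those three sources add up (assuming the ~10 shreds ≈ ~1.2% figure above implies roughly 830 data shreds per slot):

2% (variable coding) + 1.2% (data buffer residual) + 16/830 ≈ 1.9% (half FEC set) ≈ 5.1%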


bw-solana commented Apr 15, 2025

Took a sample of 100k entries on mainnet, and the workload is much more favorable for packing fixed batches than the synthetic testing. This is because most entries contain a single tx (76%) or none (10.5%) and are only hundreds of bytes in size.
[image]

Looks like most of the high-tx-count entries are a result of votes being batched together (inferred from the average tx size being right around 352B for the larger entries).


Review comment: suggest moving the shredding logic to another file; merkle.rs is already huge enough.


// Wait up to `ENTRY_COALESCE_DURATION` to try to coalesce entries into a 32 shred batch
let data_shred_bytes =
    ShredData::capacity(Some((6, true, false))).expect("Failed to get capacity") as u64;


Review comment: this is all constant, no need to recompute it every time.
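
As a rough illustration of that suggestion (hypothetical, not code from this PR; assumes the same ShredData import and a toolchain with std::sync::LazyLock, otherwise once_cell::sync::Lazy works the same way):

use std::sync::LazyLock;

// The arguments are fixed, so the capacity only needs to be computed once.
static DATA_SHRED_BYTES: LazyLock<u64> = LazyLock::new(|| {
    ShredData::capacity(Some((6, true, false))).expect("Failed to get capacity") as u64
});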

) {
    // Fetch the next entry.
    let Ok((try_bank, (entry, tick_height))) = receiver.recv_deadline(
        coalesce_start + max_coalesce_time(serialized_batch_byte_count, max_batch_byte_count),


Review comment: suggest we try to wake up some 5 ms early to avoid the OS being annoying and waking this thread 5 ms too late instead.
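
A minimal sketch of that idea (hypothetical names, not code from this PR): shave a small margin off the recv deadline so an OS scheduling delay still lands us on time.

use std::time::{Duration, Instant};

// Wake up slightly before the real deadline to absorb scheduler jitter.
fn early_deadline(deadline: Instant, margin: Duration) -> Instant {
    deadline.checked_sub(margin).unwrap_or(deadline)
}

// e.g. receiver.recv_deadline(early_deadline(deadline, Duration::from_millis(5)))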

bw-solana (Author)

Testing w/ latest entry coalescing policy is looking great so far. I'm seeing 5% padding, which is in line with our current padding (maybe slightly less?) w/ variable FEC sets.

Main changes are to:

  1. Wait up to 200ms to coalesce entries, but linearly reduce this wait the fuller the current entry batch gets, down to a minimum of 50ms (matching the current mainnet static limit; see the sketch after this list). This keeps us from sending out tick-only, heavily padded entry batches at the end of the slot. Given we fill slots in 150-200ms, we were getting a few of these mostly padded batches per slot. We also stop coalescing and shred/send if we hit the end of the slot.
  2. If we drain the channel and already exceed the target batch size, keep coalescing entries to get close to the next multiple of the erasure batch size.
  3. Reduce the max vote batch size to 16 (down from 64). This gives a smaller max entry size and makes it easier to tightly pack entry batches. I confirmed we're still packing ~1300 votes per slot.
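
For item 1, here's a minimal sketch of the linear back-off (illustrative constants and names, assumed to mirror the max_coalesce_time referenced in the review snippet above rather than the exact implementation):

use std::time::Duration;

const MAX_COALESCE_MS: f64 = 200.0;
const MIN_COALESCE_MS: f64 = 50.0;

// The fuller the batch, the less time we are willing to wait for more entries:
// an empty batch waits up to 200ms, a full one only 50ms.
fn max_coalesce_time(batch_bytes: u64, max_batch_bytes: u64) -> Duration {
    let fill = (batch_bytes as f64 / max_batch_bytes.max(1) as f64).clamp(0.0, 1.0);
    let ms = MAX_COALESCE_MS - fill * (MAX_COALESCE_MS - MIN_COALESCE_MS);
    Duration::from_millis(ms as u64)
}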

The picture shows a per-slot breakdown of padding bytes and why we decided to exit the entry coalescing routine. Lots of "tightly packed" is 👍. Exiting due to reaching max size is okay. Exiting due to receive timeout is usually not good:
[image]

Also note we appear to be maxing out CUs for the first block in a leader span, but the rest are light (this seems to be due to demand).

bw-solana (Author)

Padding bytes are relatively small compared to data bytes:
[image]

Broadcast time stays below 350ms:
[image]

Replay total elapsed time for a 12-slot sequence (334517276 to 334517287), with the middle 4 slots generated by a leader running this new code. Times are in line with current behavior:
[image]

alexpyattaev
I think hyperoptimizing this is not necessary; as blocks become larger, padding will disappear in the overall traffic. We just need to pack more TXs into the blocks in general.

bw-solana closed this Jul 2, 2025
