fix: sync large blocks without OOMing #2636
Conversation
rach-id
left a comment
Can you please explain how this works? It would make reviewing easier; I'm a bit lost in this PR :D
evan-forbes
left a comment
Nice improvement!
Do we have an issue yet for
"Add option to skip process proposal, this would give us roughly the same guarantees as state sync and speed up the block sync even more"?
If not, can we write one with the reasoning for doing this with state sync?
I'm generally still thinking about whether there's something we can do that gives us the result we want without having to remember all these parameters and how they interact block by block when we are debugging or tuning.
Only not approving to make sure my understanding is correct and to sleep on solutions.
rach-id
left a comment
Thanks for applying the feedback and adding the summary 🙏 🙏
Left a few nits and a few comments to understand more. Given this is related to blocksync, I'm being very careful with it.
Added a reference to the respective issue with regards to FLUPs. As for simplifying the constants, I tried doing that, but I didn't come up with a cleaner approach. You can see my calculations in the PR description. Basically, because we calculate some value, we need to know the min and max and then interpolate between those border values (we do this for both the timeout and the pending requests). We also want default values for when we haven't loaded enough blocks (an edge case), etc. So maybe it is possible to get rid of a couple of constants, but we will still have a lot of them due to the number of variables in our calculations.
tzdybal
left a comment
This PR is huge and complex. The logic is very sophisticated.
I can see some changes that would deserve separate PRs, mostly for clarity:
- Earlier deserialization that avoids copying large chunks of data.
- Adding tracing.
I spent a lot of time reviewing this. The changes are not obvious, and it's a bit hard to reason about them. This raises my concern that debugging any issues related to this code could also be hard. Claude Code suggested that there might be some data races in the new code, but I'm not sure if they are just false positives from the AI review (it's concerning anyway).
There is also a "memory leak" (accumulation of objects in a map); details in a comment.
I don't fully understand (and there are no comments about) some of the code removals, like the new height notification / second peer handling.
The proposed solution tries to be very comprehensive and data-driven, but in my opinion some assumptions are not ideal. For example, the next blocks can always be of max size, and we need to be able to handle this. Similarly, dynamic timeouts are a great idea, but without knowing the exact size of a block we need to prepare for the worst (which is, IMHO, the max allowed block size, not the max size observed recently).
I'm not a fan of dropping the requesters logic. If we're going to modify the already convoluted logic in blocksync/pool.go, I would prefer something like waiting before sendRequest (sync.Cond is the first thing that comes to mind) if there is no memory to receive the next blocks.
blocksync/block_stats_test.go (outdated)
	t.Errorf("Expected average %f, got %f", expected, rb.GetAverage())
	}

	// Verify size stays at capacity
An actual check for the capacity would also be useful.
blocksync/params.go (outdated)
	// Use max block size for calculations (worst-case planning)
	p.maxPendingPerPeer = p.calculateMaxPending(p.maxBlockSize)
Why are we using the max observed block size? IMHO this is not a worst case. Using the max allowed block size is actually the worst case.
Using the max observed block size will be more conservative and allow us to have larger timeouts for the future blocks; otherwise, if we get some blocks at 0 and some blocks at 128, we will have timeouts that would be too low for large blocks. We could also use the median for that. But still, I don't want to complicate this right now.
blocksync/params_test.go (outdated)
	// almostEqualRelative checks if two float64 values are approximately equal
	// within a relative epsilon tolerance. Returns true if the absolute difference
	// is less than or equal to epsilon times the larger of the two values.
	func almostEqualRelative(a, b, epsilon float64) bool {
There is an InEpsilon function in testify for this.
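For reference, a minimal use of testify's InEpsilon (values here are hypothetical):

```go
package blocksync_test

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestAverageWithinTolerance(t *testing.T) {
	got := 99.7
	// Passes when the relative error |expected-got| / |expected| is <= 0.01.
	require.InEpsilon(t, 100.0, got, 0.01)
}
```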
Removed this as well.
blocksync/pool.go (outdated)
	// request, we disconnect.
	// Check if this was from a dropped requester, due to max size of the block increase
	if _, wasDropped := pool.droppedRequesters[block.Height]; wasDropped {
		delete(pool.droppedRequesters, block.Height)
This is the only place we delete from droppedRequesters. If we don't get those "unwanted" blocks from the peers, we leak / accumulate droppedRequesters forever (I know it's just a struct{}, but it's still a leak).
Some comments were lost while submitting 😭 For some reason, even typing lags for me on this PR...
rach-id
left a comment
Final round of review.
Thanks for applying the previous feedback. I agree with the direction of the PR and with the proposed solution, given how hard it is to estimate how many requesters to use without knowing the block sizes that will be requested.
The feedback I left is implementation-related, to make it easier for future us to debug in case of issues.
Also, thanks for your patience with the multiple feedback rounds 🙏 🙏
evan-forbes
left a comment
nice simplification!!
I appreciate all the refactors here
I think we can dismiss the block @tzdybal, is that correct?
Overview
This PR introduces dynamic parameter calculation for the block sync reactor based on observed block sizes. The system adapts retry timeouts, pending request limits, and memory usage in real time as blocks are synced, optimizing for both small and large block sizes.
Changes to unmarshalling and copying (for all reactors)
Unmarshalling of message protos now happens in the same goroutine that receives the bytes, and only the decoded result is then sent asynchronously to the respective reactor channel. Previously, we copied the bytes in this goroutine and sent the raw bytes to the reactor. The new approach avoids allocating an intermediate buffer for each message and doing the work twice, which reduces memory usage during block sync.
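A schematic sketch of the new receive path (the function, parameter names, and channel wiring here are illustrative, not the repository's p2p API):

```go
package p2p

import (
	"google.golang.org/protobuf/proto"
)

// forwardMessage is an illustrative helper, not the real reactor code:
// it unmarshals the proto in the receiving goroutine and hands the decoded
// message to the reactor's channel, instead of copying the raw bytes and
// letting the reactor unmarshal them later in another goroutine.
func forwardMessage(msgBytes []byte, newMsg func() proto.Message, reactorCh chan<- proto.Message) error {
	msg := newMsg()
	if err := proto.Unmarshal(msgBytes, msg); err != nil {
		return err
	}
	reactorCh <- msg // asynchronous hand-off; no second copy of msgBytes is kept
	return nil
}
```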
Core Components
1. BlockStats (block_stats.go)
A circular buffer that maintains the last 70 received block sizes.
When a new block size is added:
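A minimal sketch of such a ring buffer; GetAverage matches the method exercised in the tests, while Add, GetMax, and the field layout are assumptions:

```go
package blocksync

// blockStatsCapacity mirrors the "last 70 block sizes" described above.
const blockStatsCapacity = 70

// BlockStats keeps a fixed-size window of recently received block sizes.
type BlockStats struct {
	sizes []int64
	next  int // index of the oldest entry, overwritten next
}

func NewBlockStats() *BlockStats {
	return &BlockStats{sizes: make([]int64, 0, blockStatsCapacity)}
}

// Add records a block size, overwriting the oldest entry once the buffer is full.
func (s *BlockStats) Add(size int64) {
	if len(s.sizes) < blockStatsCapacity {
		s.sizes = append(s.sizes, size)
		return
	}
	s.sizes[s.next] = size
	s.next = (s.next + 1) % blockStatsCapacity
}

// GetAverage returns the mean of the stored sizes (0 if empty).
func (s *BlockStats) GetAverage() float64 {
	if len(s.sizes) == 0 {
		return 0
	}
	var sum int64
	for _, v := range s.sizes {
		sum += v
	}
	return float64(sum) / float64(len(s.sizes))
}

// GetMax returns the largest stored size (0 if empty).
func (s *BlockStats) GetMax() int64 {
	var max int64
	for _, v := range s.sizes {
		if v > max {
			max = v
		}
	}
	return max
}
```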
2. Block Pool Parameters (pool.go)
Parameters that determine the buffer sizes and timeouts:
- reqLimit: maximum concurrent requests per peer
- retryTimeout: duration before retrying a failed block request
All calculations use the maximum block size from the block stats (worst-case planning) rather than the average. This ensures conservative resource allocation.
a. Concurrent requests to single peer
Determines how many concurrent block requests can be sent to a single peer. In the current p2p scheme, responses from a peer are received sequentially, so there is no multiplexing at the reactor level (though we can get messages from different reactors in parallel). That means that even if we request a lot of blocks in parallel, we will still receive them sequentially. So logically, the number of concurrent requests to a peer should be higher for small blocks, because peers send them fast, and lower for large blocks.
b. Retry timeouts
Determines how long to wait before retrying a timed-out block request. Again, this should be smaller for small blocks and larger for large ones.
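Both parameters are derived by interpolating between border values based on the block sizes observed so far (see the author's reply above about min/max interpolation). A rough sketch of that idea, where every constant and the exact formula are illustrative rather than the values in params.go:

```go
package blocksync

import "time"

// Illustrative bounds only; the real constants live in params.go and differ.
const (
	minBlockSize = 1 << 20   // 1 MiB
	maxBlockSize = 128 << 20 // 128 MiB

	maxPendingSmallBlocks = 20 // many in-flight requests when blocks are small
	maxPendingLargeBlocks = 2  // few in-flight requests when blocks are large

	retrySmallBlocks = 5 * time.Second
	retryLargeBlocks = 60 * time.Second
)

// lerp linearly interpolates between lo and hi by t in [0, 1].
func lerp(lo, hi, t float64) float64 {
	return lo + (hi-lo)*t
}

// blockSizeFraction maps an observed max block size onto [0, 1]
// between the configured min and max block sizes.
func blockSizeFraction(observedMax int64) float64 {
	switch {
	case observedMax <= minBlockSize:
		return 0
	case observedMax >= maxBlockSize:
		return 1
	default:
		return float64(observedMax-minBlockSize) / float64(maxBlockSize-minBlockSize)
	}
}

// calculateMaxPending: larger blocks -> fewer concurrent requests per peer.
func calculateMaxPending(observedMax int64) int {
	return int(lerp(maxPendingSmallBlocks, maxPendingLargeBlocks, blockSizeFraction(observedMax)))
}

// calculateRetryTimeout: larger blocks -> longer wait before retrying.
func calculateRetryTimeout(observedMax int64) time.Duration {
	return time.Duration(lerp(float64(retrySmallBlocks), float64(retryLargeBlocks), blockSizeFraction(observedMax)))
}
```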
3. Parameter Recalculation
Parameters are recalculated in three scenarios:
After each block is received:
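Building on the sketches above, the per-block recalculation step might look roughly like this (the poolParams holder and its wiring are hypothetical):

```go
package blocksync

import "time"

// poolParams is a hypothetical holder for the derived sync parameters.
type poolParams struct {
	stats             *BlockStats // ring buffer sketched earlier
	retryTimeout      time.Duration
	maxPendingPerPeer int
}

// onBlockReceived records the new size and re-derives both parameters
// from the maximum block size observed so far (worst-case planning).
func (p *poolParams) onBlockReceived(blockSize int64) {
	p.stats.Add(blockSize)
	observedMax := p.stats.GetMax()
	p.retryTimeout = calculateRetryTimeout(observedMax)
	p.maxPendingPerPeer = calculateMaxPending(observedMax)
}
```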
4. Block Request Flow
Each block height has a dedicated bpRequester that manages the request lifecycle (sketched below):
- Initial request: sent to the peer returned by pickIncrAvailablePeer()
- Timeout handling: if no block arrives within retryTimeout, the requester calls redo() on itself
- Block arrival: handled when AddBlock is called
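A schematic sketch of that per-height loop; the real bpRequester is structured differently, and the fields and callbacks here are illustrative:

```go
package blocksync

import "time"

// requester is an illustrative per-height request loop: pick a peer, wait up
// to retryTimeout for the block, and start over (redo) if it does not arrive.
type requester struct {
	height       int64
	retryTimeout time.Duration
	pickPeer     func() string                   // stands in for pickIncrAvailablePeer()
	sendRequest  func(peer string, height int64) // asks the peer for the block
	gotBlock     chan struct{}                   // signalled when AddBlock delivers the block
	quit         chan struct{}
}

func (r *requester) run() {
	for {
		peer := r.pickPeer()
		r.sendRequest(peer, r.height)

		timer := time.NewTimer(r.retryTimeout)
		select {
		case <-r.gotBlock:
			timer.Stop()
			return // block for this height received
		case <-timer.C:
			continue // redo: retry, possibly with a different peer
		case <-r.quit:
			timer.Stop()
			return
		}
	}
}
```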
FLUPs
- Reduce large peer buffers after the end of the block sync to free memory
- Save blocks in another goroutine, to speed up the block sync
- Add option to skip process proposal, this would give us roughly the same guarantees as state sync and speed up the block sync even more
Some more context is referenced in the issue: #2594