fix: sync large blocks without OOMing #2636
Conversation
rach-id
left a comment
Can you please explain how this works? It would make reviewing easier; I'm a bit lost in this PR :D
evan-forbes
left a comment
Nice improvement!
Do we have an issue yet for
"Add option to skip process proposal, this would give us roughly the same guarantees as state sync and speed up the block sync even more"?
If not, can we write one with the reasoning for doing this with state sync?
I'm generally still thinking about whether there's something we can do that gives us the result we want without having to remember all these parameters and how they interact block by block when we are debugging or tuning.
Only not approving to make sure my understanding is correct and to sleep on solutions.
rach-id
left a comment
Thanks for applying the feedback and adding the summary 🙏 🙏
Left a few nits and a few comments to understand more. Given this is related to blocksync, I'm being very careful with it.
Added a reference to the respective issue with regards to FLUPs. As for simplifying the constants, I tried doing that, but I didn't come up with a cleaner approach. You can see my calculations in the PR description. Basically, because we calculate some value, we need to know the min and max and then interpolate between those border values (we do this for both the timeout and the pending requests). We also want default values for when we haven't loaded enough blocks (an edge case), etc. So maybe it is possible to get rid of a couple of constants, but we will still have a lot of them due to the number of variables in our calculations.
tzdybal
left a comment
This PR is huge and complex. The logic is very sophisticated.
I can see some changes that would deserve separate PRs, mostly for clarity:
- Earlier deserialization that avoids copying large chunks of data.
- Adding tracing.
I spent a lot of time reviewing this. The changes are not obvious, and it's a bit hard to reason about them. This raises my concern that debugging any issues related to this code could also be hard. Claude Code suggested that there might be some data races in the new code, but I'm not sure if they are just false positives from the AI review (it's concerning anyway).
There is also a "memory leak" (accumulation of objects in a map); details in a comment.
I don't fully understand (and there are no comments about) some of the code removals, like the new height notification / second peer handling.
The proposed solution tries to be very comprehensive and data-driven, but in my opinion some assumptions are not ideal. For example, the next blocks can always be of max size, and we need to be able to handle this. Similarly, dynamic timeouts are a great idea, but without knowing the exact size of a block we need to prepare for the worst (which is, IMHO, the max allowed block size, not the max size observed recently).
I'm not a fan of dropping the requesters logic. If we're going to modify the already convoluted logic in blocksync/pool.go, I would prefer something like waiting before sendRequest (sync.Cond is the first thing that comes to mind) if there is no memory to receive the next blocks.
blocksync/block_stats_test.go (outdated)
	t.Errorf("Expected average %f, got %f", expected, rb.GetAverage())
	}

	// Verify size stays at capacity
An actual check for the capacity would also be useful.
blocksync/params.go (outdated)
	// Use max block size for calculations (worst-case planning)
	p.maxPendingPerPeer = p.calculateMaxPending(p.maxBlockSize)
Why are we using the max observed block size? IMHO this is not a worst case. Using the max allowed block size is actually the worst case.
Using the max observed block size will be more conservative and allow us to have larger timeouts for the future blocks; otherwise, if we get some blocks at 0 and some blocks at 128, we will have timeouts that would be too low for large blocks. We could also use the median for that. But still, I don't want to complicate this right now.
blocksync/params_test.go (outdated)
	// almostEqualRelative checks if two float64 values are approximately equal
	// within a relative epsilon tolerance. Returns true if the absolute difference
	// is less than or equal to epsilon times the larger of the two values.
	func almostEqualRelative(a, b, epsilon float64) bool {
There is an InEpsilon function in testify for this.
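For reference, a minimal use of testify's InEpsilon (values here are hypothetical):

```go
package blocksync_test

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestAverageWithinTolerance(t *testing.T) {
	got := 99.7
	// Passes when the relative error |expected-got| / |expected| is <= 0.01.
	require.InEpsilon(t, 100.0, got, 0.01)
}
```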
Removed this as well.
blocksync/pool.go (outdated)
	// request, we disconnect.
	// Check if this was from a dropped requester, due to max size of the block increase
	if _, wasDropped := pool.droppedRequesters[block.Height]; wasDropped {
		delete(pool.droppedRequesters, block.Height)
This is the only place we delete from droppedRequesters. If we don't get those "unwanted" blocks from the peers, we leak / accumulate droppedRequesters forever (I know it's just a struct{}, but it's still a leak).
Some comments were lost while submitting 😭 For some reason, even typing lags for me on this PR...
rach-id
left a comment
Final round of review.
Thanks for applying the previous feedback. I agree with the direction of the PR and with the proposed solution, given how hard it is to estimate how many requesters to use without knowing the block sizes that will be requested.
The feedback I left is implementation-related, to make it easier for future us to debug in case of issues.
Also, thanks for your patience with the multiple feedback rounds 🙏 🙏
evan-forbes
left a comment
nice simplification!!
I appreciate all the refactors here
I think we can dismiss the block @tzdybal, is that correct?
Overview
This PR introduces dynamic parameter calculation for the block sync reactor based on observed block sizes. The system adapts retry timeouts, pending request limits, and memory usage in real time as blocks are synced, optimizing for both small and large block sizes.
Changes to unmarshalling and copying (for all reactors)
Unmarshalling of message protos now happens in the same goroutine that receives the bytes, and only the decoded result is then sent asynchronously to the respective reactor channel. Previously, we copied the bytes in this goroutine and sent the raw bytes to the reactor. The new approach avoids allocating an intermediate buffer for each message and doing the work twice, which reduces memory usage during block sync.
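A schematic sketch of the new receive path (the function, parameter names, and channel wiring here are illustrative, not the repository's p2p API):

```go
package p2p

import (
	"google.golang.org/protobuf/proto"
)

// forwardMessage is an illustrative helper, not the real reactor code:
// it unmarshals the proto in the receiving goroutine and hands the decoded
// message to the reactor's channel, instead of copying the raw bytes and
// letting the reactor unmarshal them later in another goroutine.
func forwardMessage(msgBytes []byte, newMsg func() proto.Message, reactorCh chan<- proto.Message) error {
	msg := newMsg()
	if err := proto.Unmarshal(msgBytes, msg); err != nil {
		return err
	}
	reactorCh <- msg // asynchronous hand-off; no second copy of msgBytes is kept
	return nil
}
```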
Core Components
1. BlockStats (block_stats.go)
A circular buffer that maintains the last 70 received block sizes.
When a new block size is added:
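A minimal sketch of such a ring buffer; GetAverage matches the method exercised in the tests, while Add, GetMax, and the field layout are assumptions:

```go
package blocksync

// blockStatsCapacity mirrors the "last 70 block sizes" described above.
const blockStatsCapacity = 70

// BlockStats keeps a fixed-size window of recently received block sizes.
type BlockStats struct {
	sizes []int64
	next  int // index of the oldest entry, overwritten next
}

func NewBlockStats() *BlockStats {
	return &BlockStats{sizes: make([]int64, 0, blockStatsCapacity)}
}

// Add records a block size, overwriting the oldest entry once the buffer is full.
func (s *BlockStats) Add(size int64) {
	if len(s.sizes) < blockStatsCapacity {
		s.sizes = append(s.sizes, size)
		return
	}
	s.sizes[s.next] = size
	s.next = (s.next + 1) % blockStatsCapacity
}

// GetAverage returns the mean of the stored sizes (0 if empty).
func (s *BlockStats) GetAverage() float64 {
	if len(s.sizes) == 0 {
		return 0
	}
	var sum int64
	for _, v := range s.sizes {
		sum += v
	}
	return float64(sum) / float64(len(s.sizes))
}

// GetMax returns the largest stored size (0 if empty).
func (s *BlockStats) GetMax() int64 {
	var max int64
	for _, v := range s.sizes {
		if v > max {
			max = v
		}
	}
	return max
}
```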
2. Block Pool Parameters (pool.go)
Parameters that determine the buffer sizes and timeouts:
- reqLimit: maximum concurrent requests per peer
- retryTimeout: duration before retrying a failed block request
All calculations use the maximum block size from the block stats (worst-case planning) rather than the average. This ensures conservative resource allocation.
a. Concurrent requests to single peer
Determines how many concurrent block requests can be sent to a single peer. In the current p2p scheme, responses from a peer are received sequentially, so there is no multiplexing at the reactor level (though we can get messages from different reactors in parallel). That means that even if we request a lot of blocks in parallel, we will still receive them sequentially. So logically, the number of concurrent requests to a peer should be higher for small blocks, because peers send them fast, and lower for large blocks.
b. Retry timeouts
Determines how long to wait before retrying a timed-out block request. Again, this should be smaller for small blocks and larger for large ones.
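Both parameters are derived by interpolating between border values based on the block sizes observed so far (see the author's reply above about min/max interpolation). A rough sketch of that idea, where every constant and the exact formula are illustrative rather than the values in params.go:

```go
package blocksync

import "time"

// Illustrative bounds only; the real constants live in params.go and differ.
const (
	minBlockSize = 1 << 20   // 1 MiB
	maxBlockSize = 128 << 20 // 128 MiB

	maxPendingSmallBlocks = 20 // many in-flight requests when blocks are small
	maxPendingLargeBlocks = 2  // few in-flight requests when blocks are large

	retrySmallBlocks = 5 * time.Second
	retryLargeBlocks = 60 * time.Second
)

// lerp linearly interpolates between lo and hi by t in [0, 1].
func lerp(lo, hi, t float64) float64 {
	return lo + (hi-lo)*t
}

// blockSizeFraction maps an observed max block size onto [0, 1]
// between the configured min and max block sizes.
func blockSizeFraction(observedMax int64) float64 {
	switch {
	case observedMax <= minBlockSize:
		return 0
	case observedMax >= maxBlockSize:
		return 1
	default:
		return float64(observedMax-minBlockSize) / float64(maxBlockSize-minBlockSize)
	}
}

// calculateMaxPending: larger blocks -> fewer concurrent requests per peer.
func calculateMaxPending(observedMax int64) int {
	return int(lerp(maxPendingSmallBlocks, maxPendingLargeBlocks, blockSizeFraction(observedMax)))
}

// calculateRetryTimeout: larger blocks -> longer wait before retrying.
func calculateRetryTimeout(observedMax int64) time.Duration {
	return time.Duration(lerp(float64(retrySmallBlocks), float64(retryLargeBlocks), blockSizeFraction(observedMax)))
}
```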
3. Parameter Recalculation
Parameters are recalculated in three scenarios:
After each block is received:
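Building on the sketches above, the per-block recalculation step might look roughly like this (the poolParams holder and its wiring are hypothetical):

```go
package blocksync

import "time"

// poolParams is a hypothetical holder for the derived sync parameters.
type poolParams struct {
	stats             *BlockStats // ring buffer sketched earlier
	retryTimeout      time.Duration
	maxPendingPerPeer int
}

// onBlockReceived records the new size and re-derives both parameters
// from the maximum block size observed so far (worst-case planning).
func (p *poolParams) onBlockReceived(blockSize int64) {
	p.stats.Add(blockSize)
	observedMax := p.stats.GetMax()
	p.retryTimeout = calculateRetryTimeout(observedMax)
	p.maxPendingPerPeer = calculateMaxPending(observedMax)
}
```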
4. Block Request Flow
Each block height has a dedicated bpRequester that manages the request lifecycle (sketched below):
- Initial request: sent to the peer returned by pickIncrAvailablePeer()
- Timeout handling: if no block arrives within retryTimeout, the requester calls redo() on itself
- Block arrival: handled when AddBlock is called
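A schematic sketch of that per-height loop; the real bpRequester is structured differently, and the fields and callbacks here are illustrative:

```go
package blocksync

import "time"

// requester is an illustrative per-height request loop: pick a peer, wait up
// to retryTimeout for the block, and start over (redo) if it does not arrive.
type requester struct {
	height       int64
	retryTimeout time.Duration
	pickPeer     func() string                   // stands in for pickIncrAvailablePeer()
	sendRequest  func(peer string, height int64) // asks the peer for the block
	gotBlock     chan struct{}                   // signalled when AddBlock delivers the block
	quit         chan struct{}
}

func (r *requester) run() {
	for {
		peer := r.pickPeer()
		r.sendRequest(peer, r.height)

		timer := time.NewTimer(r.retryTimeout)
		select {
		case <-r.gotBlock:
			timer.Stop()
			return // block for this height received
		case <-timer.C:
			continue // redo: retry, possibly with a different peer
		case <-r.quit:
			timer.Stop()
			return
		}
	}
}
```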
FLUPs
- Reduce large peer buffers after the end of the block sync to free memory
- Save blocks in another goroutine, to speed up the block sync
- Add option to skip process proposal, this would give us roughly the same guarantees as state sync and speed up the block sync even more
Some more context is referenced in the issue: #2594