core/types: reduce header RLP decode allocations #35030
I understand this is faster, but does it really matter? Decoding headers is already very fast; we can decode them at 400 MB/s even without this change.

Speed-wise, yes, it's already fast; the main benefit here is on the allocation side. During skeleton sync each batch pulls 512 headers from a peer, and right now every header needs 3 separate heap allocations: the Header struct, one big.Int for Difficulty, and one for Number. Per batch that's around 1,500 allocations (3 × 512 = 1,536) going through the GC. With the headerAlloc packing we bring that down to ~500 (512) by putting all three in one struct. Same data, just one trip to the allocator instead of three. The stream reuse also avoids 512 pool get/put cycles per batch. Each saving is small individually, but during sync this happens continuously from multiple peers, so it adds up in terms of GC pressure.
Force-pushed from 17ecac6 to 4c8150a.
This change adds a concrete header decode path and uses it when decoding inbound eth protocol BlockHeaders responses. The new path decodes directly into a types.Header instead of going through the generic RLP reflection decoder, while keeping the existing behavior of rejecting trailing input.
The header decoder is careful about reuse: variable-size fields reuse their backing storage, optional pointer fields are reused when present, and fields that are absent in shorter legacy headers are cleared so stale values cannot survive across decodes.
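The three reuse rules can be illustrated with a small sketch. It is not the PR's decoder and does no RLP parsing: `Header`, `wireHeader`, and `decodeInto` are hypothetical names, and the input is assumed to be already split into fields, with a nil `BaseFee` standing for a field that is absent in a shorter legacy header.

```go
package main

import (
	"fmt"
	"math/big"
)

// Header is a hypothetical, trimmed-down header with one variable-size
// field and one optional pointer field.
type Header struct {
	Extra   []byte
	BaseFee *big.Int // absent (nil) in legacy headers
}

// wireHeader is a hypothetical pre-tokenised input. A nil BaseFee means
// the field was not present on the wire.
type wireHeader struct {
	Extra   []byte
	BaseFee []byte
}

// decodeInto overwrites h with the contents of in, reusing h's storage.
func decodeInto(h *Header, in wireHeader) {
	// Variable-size field: keep the backing array, replace the contents.
	h.Extra = append(h.Extra[:0], in.Extra...)

	// Absent optional field: clear it so a stale value cannot survive.
	if in.BaseFee == nil {
		h.BaseFee = nil
		return
	}
	// Present optional field: reuse the existing big.Int if there is one.
	if h.BaseFee == nil {
		h.BaseFee = new(big.Int)
	}
	h.BaseFee.SetBytes(in.BaseFee)
}

func main() {
	var h Header
	decodeInto(&h, wireHeader{Extra: []byte("london-extra"), BaseFee: []byte{0x07}})
	fmt.Printf("extra=%q baseFee=%v\n", h.Extra, h.BaseFee)

	// Decoding a legacy header into the same struct must drop the old BaseFee.
	decodeInto(&h, wireHeader{Extra: []byte("legacy")})
	fmt.Printf("extra=%q baseFee=%v\n", h.Extra, h.BaseFee)
}
```

Without the explicit `h.BaseFee = nil` branch, the second decode would leave the first header's base fee attached to a header that never carried one.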
The BlockHeaders handler still allocates a distinct Header for every decoded item, so response consumers keep the same ownership semantics as before. This avoids unsafe pointer reuse while reducing the overhead of the per-header decode path.
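To show why the handler keeps one Header per item, here is a hypothetical sketch contrasting the two loops. `Header`, `decodeInto`, `decodeShared`, and `decodeDistinct` are illustrative names, not functions from the PR, and the "decoder" simply copies a number.

```go
package main

import "fmt"

// Header is a hypothetical one-field stand-in for types.Header.
type Header struct{ Number uint64 }

// decodeInto stands in for a concrete decoder that fills an existing Header.
func decodeInto(h *Header, item uint64) { h.Number = item }

// decodeShared reuses one Header for every item. Every returned pointer
// aliases the same struct, so consumers all see the last decoded value.
func decodeShared(items []uint64) []*Header {
	var scratch Header
	out := make([]*Header, 0, len(items))
	for _, it := range items {
		decodeInto(&scratch, it)
		out = append(out, &scratch) // unsafe: shared ownership
	}
	return out
}

// decodeDistinct allocates a fresh Header per item, so each consumer
// owns its own value.
func decodeDistinct(items []uint64) []*Header {
	out := make([]*Header, 0, len(items))
	for _, it := range items {
		h := new(Header)
		decodeInto(h, it)
		out = append(out, h)
	}
	return out
}

func main() {
	items := []uint64{1, 2, 3}
	fmt.Println("shared first:", decodeShared(items)[0].Number)
	fmt.Println("distinct first:", decodeDistinct(items)[0].Number)
}
```

In the shared variant the first entry reports the last item's number, which is the kind of aliasing the distinct-allocation approach avoids.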