Feature/985 move block finalization to zmq#1014
Open
PropzSaladaz wants to merge 22 commits into
Open
Conversation
…t ports into a struct
…oad & fallback to TCP
Contributor
There was a problem hiding this comment.
Pull request overview
This PR migrates BlockFinalizeDownload from per-request TCP to a persistent ZMQ “bulk-data” request/reply lane, while keeping backward compatibility via TCP fallback.
Changes:
- Added a new bulk-data ZMQ lane plus generic header/payload framing to support request/reply flows with optional binary payloads.
- Refactored BlockFinalize server logic into a transport-neutral responder shared by legacy TCP and the new ZMQ server agent.
- Added unit + E2E tests and test configs to validate ZMQ-first behavior and TCP backward-compat fallbacks.
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/network/ZMQSocketsTests.cpp | Unit test for bulk-data ZMQ port-delta override binding behavior. |
| tests/unit/network/ZMQErrorClassifierTests.cpp | Unit tests for ZMQ transport errno classification and fallback gating. |
| tests/unit/network/NodeTestAccess.h | Test-only accessor for Node internals needed by unit tests. |
| tests/unit/network/NodeConfigTests.cpp | Unit tests for new node config flags/defaults (ZMQ enable + fallback timeout). |
| tests/e2e/E2ETestHelper.h | Switches env var helpers to centralized TestUtils helpers. |
| tests/e2e/ConsensusEngineTestAccess.h | Adds accessors to read BlockFinalize transport stats in E2E tests. |
| tests/e2e/BlockFinalizeTransportCompatTests.cpp | New E2E coverage for ZMQ-first + TCP fallback compatibility scenarios. |
| tests/TestUtils.h | Centralizes env var snapshot/set/restore utilities for tests. |
| test/twonodes_blockfinalize_old_client_new_server/node2/schains/schain1.json | E2E config fixture for old-client/new-server scenario. |
| test/twonodes_blockfinalize_old_client_new_server/node2/Node.json | Enables ZMQ + stats on server-side node for fixture. |
| test/twonodes_blockfinalize_old_client_new_server/node1/schains/schain1.json | E2E config fixture for old-client/new-server scenario. |
| test/twonodes_blockfinalize_old_client_new_server/node1/Node.json | Disables ZMQ on old-client node for fixture. |
| test/twonodes_blockfinalize_new_client_old_server/node2/schains/schain1.json | E2E config fixture for new-client/old-server scenario. |
| test/twonodes_blockfinalize_new_client_old_server/node2/Node.json | Disables ZMQ on old-server node for fixture. |
| test/twonodes_blockfinalize_new_client_old_server/node1/schains/schain1.json | E2E config fixture for new-client/old-server scenario. |
| test/twonodes_blockfinalize_new_client_old_server/node1/Node.json | Enables ZMQ on new-client node for fixture. |
| test/twonodes_blockfinalize_both_new_zmq/node2/schains/schain1.json | E2E config fixture for both-new-ZMQ scenario. |
| test/twonodes_blockfinalize_both_new_zmq/node2/Node.json | Enables ZMQ + stats on node2 for fixture. |
| test/twonodes_blockfinalize_both_new_zmq/node1/schains/schain1.json | E2E config fixture for both-new-ZMQ scenario. |
| test/twonodes_blockfinalize_both_new_zmq/node1/Node.json | Enables ZMQ + stats on node1 for fixture. |
| scripts/tests.py | Adds BlockFinalize transport tests to the test runner; adds env override handling for consensus-basic tests. |
| node/NodeGettersSetters.cpp | Adds getters for new config fields (ZMQ enable + fallback duration). |
| node/Node.h | Adds new config fields, socket port deltas, and teardown changes for ZMQ socket shutdown ordering. |
| node/Node.cpp | Parses new config fields and socket port delta override; updates sockets initialization signature. |
| node/ConsensusEngine.cpp | Moves ZMQ socket shutdown to after worker threads/agents are joined. |
| network/ZMQSockets.h | Adds portType tracking + API to close a single destination socket. |
| network/ZMQSockets.cpp | Uses per-socket-set port delta for connect; adds per-peer destination socket close. |
| network/ZMQHeaderPayloadFrame.h | New helper for ZMQ request/reply framing with JSON header + optional binary payload. |
| network/ZMQHeaderPayloadFrame.cpp | Implements pack/unpack + send/receive helpers and error classification. |
| network/ZMQErrorClassifier.h | Defines transport errno classifier for TCP fallback decisions. |
| network/TCPServerSocket.h | Simplifies constructor to accept bind port directly (no port_type). |
| network/TCPServerSocket.cpp | Aligns constructor with new ServerSocket API; fixes close() to use saved descriptor. |
| network/Sockets.h | Adds bulk-data ZMQ sockets and new init signature accepting port deltas. |
| network/Sockets.cpp | Creates consensus + bulk-data ZMQ sockets and TCP sockets using explicit port deltas. |
| network/SocketPortDeltas.h | New struct for per-lane/per-socket port deltas (configurable for bulk-data lane). |
| network/ServerSocket.h | Simplifies constructor to accept final bind port directly. |
| network/ServerSocket.cpp | Updates bind logging and removes basePort+delta computation. |
| headers/BasicHeader.h | Makes JSON parsing helper APIs accept const refs. |
| headers/BasicHeader.cpp | Implements const-correct JSON parsing helper APIs. |
| exceptions/ZMQTransportException.h | Adds explicit exception type to distinguish transport failures from protocol/state errors. |
| chains/TestConfig.h | Adds a test-only flag to enable BlockFinalize transport stats collection. |
| chains/TestConfig.cpp | Implements stats flag parsing and logging. |
| chains/SchainGettersSetters.cpp | Adds transport stats counters and getters (guarded by test config flag). |
| chains/Schain.h | Defines BlockFinalizeTransportStats and Schain-side counters + APIs. |
| chains/Schain.cpp | Conditionally constructs BlockFinalize ZMQ server agent when enabled. |
| catchup/server/CatchupServerAgent.h | Adds shared BlockFinalizeResponder member; removes TCP-specific finalize builder declaration. |
| catchup/server/CatchupServerAgent.cpp | Routes legacy TCP BlockFinalize handling through BlockFinalizeResponder. |
| blockfinalize/server/BlockFinalizeZmqServerThreadPool.h | New thread-pool wrapper for the ZMQ BlockFinalize server agent. |
| blockfinalize/server/BlockFinalizeZmqServerThreadPool.cpp | Implements the ZMQ server worker thread creation. |
| blockfinalize/server/BlockFinalizeZmqServerAgent.h | New bulk-data-lane ZMQ BlockFinalize server agent. |
| blockfinalize/server/BlockFinalizeZmqServerAgent.cpp | Implements ZMQ receive loop, request parsing, responder delegation, and reply send. |
| blockfinalize/server/BlockFinalizeResponder.h | New transport-neutral BlockFinalize request parsing + response builder API. |
| blockfinalize/server/BlockFinalizeResponder.cpp | Implements shared server-side BlockFinalize semantics and response construction. |
| blockfinalize/client/BlockFinalizeDownloader.h | Adds ZMQ/TCP split download APIs and TCP-fallback caching; adds a response helper struct. |
| blockfinalize/client/BlockFinalizeDownloader.cpp | Implements ZMQ-first fragment download with transport-only fallback to TCP + socket cache management. |
| SkaleCommon.h | Adds default TCP fallback cooldown and introduces new BULK_DATA_ZMQ port_type. |
| CMakeLists.txt | Adds new server sources and new unit/E2E tests to build targets. |
olehnikolaiev
approved these changes
Mar 31, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR moves
BlockFinalizeDownloadoff per-request TCP and onto a new persistent ZMQ bulk-data lane, while preserving backward compatibility with older TCP-only nodes.BlockFinalize Refactor
Refactored BlockFinalize logic into clearer transport boundaries:
BlockFinalizeDownloader.cppBlockFinalizeResponder.cppThis lets both the legacy TCP path and the new ZMQ path share the same BlockFinalize behavior.
Generic ZMQ Header/Payload Framing
Added
ZMQHeaderPayloadFrame, a reusable helper for ZMQ request/reply protocols that use:Wire format:
[uint64 header_len][serialized JSON header][optional payload bytes]This helper is used by the new BlockFinalize ZMQ path. It is generic enough to be reused by future non-hot-path ZMQ request/reply flows, but it is not intended for the latency-sensitive binary consensus message path.
New Bulk-Data ZMQ Lane
Added a separate bulk-data ZMQ lane, distinct from the existing consensus ZMQ lane.
This lane is intended for larger request/reply style recovery or retrieval traffic, such as:
It is not part of the consensus hot path.
New BlockFinalize ZMQ Server
Added
BlockFinalizeZmqServerAgent:ZMQHeaderPayloadFrameformatBlockFinalizeResponderBackward Compatibility
Sender
New nodes are ZMQ-first for BlockFinalize:
This assumes the network will gradually migrate to the new ZMQ path.
Receiver
Backward compatibility for old senders is preserved:
CatchupServerAgentremains activeBLOCK_FINALIZE_REQSo old TCP-only nodes can still request BlockFinalize data from upgraded nodes.
Other improvements
Tests
Tested backward compatibility in 3 cases:
all tests force usage of BlockFinalizeDownload phase
Performance
Fixes #985