Skip to content

Feature/985 move block finalization to zmq#1014

Open
PropzSaladaz wants to merge 22 commits into
bite-2from
feature/985-move-block-finalization-to-zmq
Open

Feature/985 move block finalization to zmq#1014
PropzSaladaz wants to merge 22 commits into
bite-2from
feature/985-move-block-finalization-to-zmq

Conversation

@PropzSaladaz
Copy link
Copy Markdown
Collaborator

@PropzSaladaz PropzSaladaz commented Mar 18, 2026

Description

This PR moves BlockFinalizeDownload off per-request TCP and onto a new persistent ZMQ bulk-data lane, while preserving backward compatibility with older TCP-only nodes.

BlockFinalize Refactor

Refactored BlockFinalize logic into clearer transport boundaries:

  • BlockFinalizeDownloader.cpp

    • owns the client-side BlockFinalize fetch logic
    • decides whether to use ZMQ or TCP
    • performs per-peer TCP fallback caching
    • processes returned fragments / DA sig / BITE shares
  • BlockFinalizeResponder.cpp

    • owns the transport-neutral server-side BlockFinalize semantics
    • validates requests
    • finds proposal / committed block data
    • builds the BlockFinalize response payload and response header fields

This lets both the legacy TCP path and the new ZMQ path share the same BlockFinalize behavior.

Generic ZMQ Header/Payload Framing

Added ZMQHeaderPayloadFrame, a reusable helper for ZMQ request/reply protocols that use:

  • a small structured JSON header
  • an optional binary payload

Wire format:

[uint64 header_len][serialized JSON header][optional payload bytes]

This helper is used by the new BlockFinalize ZMQ path. It is generic enough to be reused by future non-hot-path ZMQ request/reply flows, but it is not intended for the latency-sensitive binary consensus message path.

New Bulk-Data ZMQ Lane

Added a separate bulk-data ZMQ lane, distinct from the existing consensus ZMQ lane.

This lane is intended for larger request/reply style recovery or retrieval traffic, such as:

  • BlockFinalize
  • potentially catchup or similar flows in future PRs

It is not part of the consensus hot path.

New BlockFinalize ZMQ Server

Added BlockFinalizeZmqServerAgent:

  • listens on the new bulk-data ZMQ lane
  • accepts BlockFinalize requests in ZMQHeaderPayloadFrame format
  • delegates response construction to BlockFinalizeResponder

Backward Compatibility

Sender

New nodes are ZMQ-first for BlockFinalize:

  • try ZMQ first
  • on transport/protocol failure, close the cached ZMQ socket for that peer
  • mark that peer for temporary TCP fallback
  • retry immediately over legacy TCP
  • continue using TCP for that peer until the fallback cooldown expires
  • after cooldown, probe ZMQ again

This assumes the network will gradually migrate to the new ZMQ path.

Receiver

Backward compatibility for old senders is preserved:

  • CatchupServerAgent remains active
  • it still serves legacy TCP BLOCK_FINALIZE_REQ
  • it also still serves normal catchup traffic as before

So old TCP-only nodes can still request BlockFinalize data from upgraded nodes.


Other improvements

  • ZMQ socket shutdown was moved to after all threads & agents are joined. Not before.
  • Tests were failing unpredictably since we closed ZMQ sockets before joining threads - meaning threads could be trying to read / write to/from ZMQ sockets even after being closed.

Tests

  • Tested backward compatibility in 3 cases:

    • 1 node with TCP server, 1 node with ZMQ client enabled
    • 1 node with ZMQ server enabled, 1 node with TCP client
    • 2 nodes with ZMQ server / client
  • all tests force usage of BlockFinalizeDownload phase


Performance

  • No visible speedup testing locally. This code path - BlockFinalizeDownloader is not required, i.e, is optional in a fast enough network and usually not used. But when it is used, then network delay should drop to half time since there's no longer the extra RTT for TCP's 3-way handshake

Fixes #985

@PropzSaladaz PropzSaladaz self-assigned this Mar 18, 2026
@PropzSaladaz PropzSaladaz requested a review from kladkogex as a code owner March 18, 2026 20:18
@PolinaKiporenko PolinaKiporenko linked an issue Mar 23, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates BlockFinalizeDownload from per-request TCP to a persistent ZMQ “bulk-data” request/reply lane, while keeping backward compatibility via TCP fallback.

Changes:

  • Added a new bulk-data ZMQ lane plus generic header/payload framing to support request/reply flows with optional binary payloads.
  • Refactored BlockFinalize server logic into a transport-neutral responder shared by legacy TCP and the new ZMQ server agent.
  • Added unit + E2E tests and test configs to validate ZMQ-first behavior and TCP backward-compat fallbacks.

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/network/ZMQSocketsTests.cpp Unit test for bulk-data ZMQ port-delta override binding behavior.
tests/unit/network/ZMQErrorClassifierTests.cpp Unit tests for ZMQ transport errno classification and fallback gating.
tests/unit/network/NodeTestAccess.h Test-only accessor for Node internals needed by unit tests.
tests/unit/network/NodeConfigTests.cpp Unit tests for new node config flags/defaults (ZMQ enable + fallback timeout).
tests/e2e/E2ETestHelper.h Switches env var helpers to centralized TestUtils helpers.
tests/e2e/ConsensusEngineTestAccess.h Adds accessors to read BlockFinalize transport stats in E2E tests.
tests/e2e/BlockFinalizeTransportCompatTests.cpp New E2E coverage for ZMQ-first + TCP fallback compatibility scenarios.
tests/TestUtils.h Centralizes env var snapshot/set/restore utilities for tests.
test/twonodes_blockfinalize_old_client_new_server/node2/schains/schain1.json E2E config fixture for old-client/new-server scenario.
test/twonodes_blockfinalize_old_client_new_server/node2/Node.json Enables ZMQ + stats on server-side node for fixture.
test/twonodes_blockfinalize_old_client_new_server/node1/schains/schain1.json E2E config fixture for old-client/new-server scenario.
test/twonodes_blockfinalize_old_client_new_server/node1/Node.json Disables ZMQ on old-client node for fixture.
test/twonodes_blockfinalize_new_client_old_server/node2/schains/schain1.json E2E config fixture for new-client/old-server scenario.
test/twonodes_blockfinalize_new_client_old_server/node2/Node.json Disables ZMQ on old-server node for fixture.
test/twonodes_blockfinalize_new_client_old_server/node1/schains/schain1.json E2E config fixture for new-client/old-server scenario.
test/twonodes_blockfinalize_new_client_old_server/node1/Node.json Enables ZMQ on new-client node for fixture.
test/twonodes_blockfinalize_both_new_zmq/node2/schains/schain1.json E2E config fixture for both-new-ZMQ scenario.
test/twonodes_blockfinalize_both_new_zmq/node2/Node.json Enables ZMQ + stats on node2 for fixture.
test/twonodes_blockfinalize_both_new_zmq/node1/schains/schain1.json E2E config fixture for both-new-ZMQ scenario.
test/twonodes_blockfinalize_both_new_zmq/node1/Node.json Enables ZMQ + stats on node1 for fixture.
scripts/tests.py Adds BlockFinalize transport tests to the test runner; adds env override handling for consensus-basic tests.
node/NodeGettersSetters.cpp Adds getters for new config fields (ZMQ enable + fallback duration).
node/Node.h Adds new config fields, socket port deltas, and teardown changes for ZMQ socket shutdown ordering.
node/Node.cpp Parses new config fields and socket port delta override; updates sockets initialization signature.
node/ConsensusEngine.cpp Moves ZMQ socket shutdown to after worker threads/agents are joined.
network/ZMQSockets.h Adds portType tracking + API to close a single destination socket.
network/ZMQSockets.cpp Uses per-socket-set port delta for connect; adds per-peer destination socket close.
network/ZMQHeaderPayloadFrame.h New helper for ZMQ request/reply framing with JSON header + optional binary payload.
network/ZMQHeaderPayloadFrame.cpp Implements pack/unpack + send/receive helpers and error classification.
network/ZMQErrorClassifier.h Defines transport errno classifier for TCP fallback decisions.
network/TCPServerSocket.h Simplifies constructor to accept bind port directly (no port_type).
network/TCPServerSocket.cpp Aligns constructor with new ServerSocket API; fixes close() to use saved descriptor.
network/Sockets.h Adds bulk-data ZMQ sockets and new init signature accepting port deltas.
network/Sockets.cpp Creates consensus + bulk-data ZMQ sockets and TCP sockets using explicit port deltas.
network/SocketPortDeltas.h New struct for per-lane/per-socket port deltas (configurable for bulk-data lane).
network/ServerSocket.h Simplifies constructor to accept final bind port directly.
network/ServerSocket.cpp Updates bind logging and removes basePort+delta computation.
headers/BasicHeader.h Makes JSON parsing helper APIs accept const refs.
headers/BasicHeader.cpp Implements const-correct JSON parsing helper APIs.
exceptions/ZMQTransportException.h Adds explicit exception type to distinguish transport failures from protocol/state errors.
chains/TestConfig.h Adds a test-only flag to enable BlockFinalize transport stats collection.
chains/TestConfig.cpp Implements stats flag parsing and logging.
chains/SchainGettersSetters.cpp Adds transport stats counters and getters (guarded by test config flag).
chains/Schain.h Defines BlockFinalizeTransportStats and Schain-side counters + APIs.
chains/Schain.cpp Conditionally constructs BlockFinalize ZMQ server agent when enabled.
catchup/server/CatchupServerAgent.h Adds shared BlockFinalizeResponder member; removes TCP-specific finalize builder declaration.
catchup/server/CatchupServerAgent.cpp Routes legacy TCP BlockFinalize handling through BlockFinalizeResponder.
blockfinalize/server/BlockFinalizeZmqServerThreadPool.h New thread-pool wrapper for the ZMQ BlockFinalize server agent.
blockfinalize/server/BlockFinalizeZmqServerThreadPool.cpp Implements the ZMQ server worker thread creation.
blockfinalize/server/BlockFinalizeZmqServerAgent.h New bulk-data-lane ZMQ BlockFinalize server agent.
blockfinalize/server/BlockFinalizeZmqServerAgent.cpp Implements ZMQ receive loop, request parsing, responder delegation, and reply send.
blockfinalize/server/BlockFinalizeResponder.h New transport-neutral BlockFinalize request parsing + response builder API.
blockfinalize/server/BlockFinalizeResponder.cpp Implements shared server-side BlockFinalize semantics and response construction.
blockfinalize/client/BlockFinalizeDownloader.h Adds ZMQ/TCP split download APIs and TCP-fallback caching; adds a response helper struct.
blockfinalize/client/BlockFinalizeDownloader.cpp Implements ZMQ-first fragment download with transport-only fallback to TCP + socket cache management.
SkaleCommon.h Adds default TCP fallback cooldown and introduces new BULK_DATA_ZMQ port_type.
CMakeLists.txt Adds new server sources and new unit/E2E tests to build targets.

Comment thread blockfinalize/client/BlockFinalizeDownloader.h Outdated
Comment thread network/Sockets.h Outdated
Comment thread blockfinalize/client/BlockFinalizeDownloader.cpp Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use ZMQ for block finalization

3 participants