p2p/sentry, node/eth: EIP-8159 eth/71 BAL fetcher + background downloader (PR 3/3)#20795
Open
fddf9a1 to
3e7bb0e
Compare
…se 5a)

Subscribes to GET_BLOCK_ACCESS_LISTS_71 and routes it to a new handler that
answers with BlockAccessLists sourced from rawdb via the handler added in
Phase 3.

After this commit, two erigon nodes running with the eth/71-aware stack can
complete the request/response round trip at the wire level: node A sends
GetBlockAccessLists; node B decodes, looks up the BALs, and replies with
BlockAccessLists positionally aligned.

Changes in p2p/sentry/sentry_multi_client/sentry_multi_client.go:
- RecvUploadMessageLoop subscribes to the new request MessageId
(GET_BLOCK_ACCESS_LISTS_71) alongside the existing GetBlockBodies /
GetReceipts subscriptions.
- New method getBlockAccessLists71 mirrors getBlockBodies66: decode the
eth/66 request-id envelope, open a read-only tx, call the Phase 3 handler
(eth.AnswerGetBlockAccessListsQuery), encode the reply as
BlockAccessListsPacket66 with the matching request id, and send via
sentry.SendMessageById to BLOCK_ACCESS_LISTS_71.
- Placeholder blockAccessLists71 (no-op) is wired for inbound responses so
the sentry routing table doesn't error. The full response path (request-id
matching, keccak256 validation against the header's BlockAccessListHash,
bad-peer scoring, and writing to rawdb) lives in the client fetcher landing
next (Phase 5b).
- handleInboundMessage switch now routes both new MessageIds.

Tested: short tests pass in p2p/sentry, p2p/sentry/libsentry, and
p2p/sentry/sentry_multi_client; make lint clean; make erigon builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase 5b.1)
Client-side fetcher for the eth/71 GetBlockAccessLists / BlockAccessLists
round trip, with header-hash validation and immediate bad-peer penalty on
mismatch. This is the primitive the next commit wires behind a debug RPC
method (for running a pair of bal-devnet-3 nodes against each other), and
on top of which the eventual stage-integrated fetcher will build.
New file: p2p/sentry/sentry_multi_client/bal_fetcher.go
- BALFetcher: request-id → waiting-goroutine map, mutex-protected.
- NewBALFetcher(): constructor, shared across sentries by MultiClient.
- FetchBlockAccessLists(ctx, sentry, peerID, blockHashes, expectedHashes):
encodes a GetBlockAccessListsPacket66 with a random request id, sends it
via Sentry.SendMessageById to the given peer, blocks until the matching
BlockAccessLists response arrives (or ctx cancel / 30s default timeout),
then validates each payload.
- Validation semantics:
* empty payload (0xc0) + expected == empty.BlockAccessListHash → accepted
as a genuinely empty BAL, returned as 0xc0.
* empty payload + expected != empty hash → "peer does not have it",
returned as nil so the caller can retry from another peer. NOT a
bad-peer signal.
* non-empty payload with keccak256(payload) != expected → peer sent
garbage. Sentry.PenalizePeer(Kick) immediately, return ErrBadBALResponse.
* length > requested → peer misbehaves, same Kick treatment.
- Deliver(peerID, packet): called from the inbound message handler. Matches
by RequestId and requires the peer id to match the one we asked (dropping
responses from impostors silently). Non-blocking send to a buffered(1)
channel so duplicates or late arrivals never leak goroutines.
Wire-up in sentry_multi_client.go:
- MultiClient gains a balFetcher field, constructed by NewMultiClient.
- blockAccessLists71 inbound handler decodes the eth/66 envelope and calls
cs.balFetcher.Deliver(peerID, &packet). Unknown / wrong-peer / stale
arrivals are silently dropped (not bad-peer signals by themselves).
- MultiClient.FetchBlockAccessLists(ctx, peerID, blockHashes, expectedHashes)
selects the first ready sentry and delegates to BALFetcher. Returns an
error if no sentry is ready — matches the SendBodyRequest pattern.
Tests in p2p/sentry/sentry_multi_client/bal_fetcher_test.go — fake sentry
records SendMessageById / PenalizePeer calls and can deliver responses:
- ValidPopulatedResponse: populated BAL, keccak matches expected → accepted,
no penalty.
- EmptyBALAcceptedOnlyWhenExpected: two-slot response where expected[0] is
empty.BlockAccessListHash and expected[1] is some arbitrary hash; the
peer returns 0xc0 for both. Slot 0 accepted as 0xc0, slot 1 returned as
nil. No penalty in either case.
- HashMismatchPenalisesPeer: non-empty payload with wrong hash → returns
ErrBadBALResponse and records a single PenaltyKind_Kick.
- DeliverIgnoresUnknownRequestID: Deliver with no matching in-flight entry
returns false.
- DeliverIgnoresWrongPeer: impostor deliver from a different peer is
rejected; target peer's later deliver succeeds.
Not in this commit:
- Withholding detection (peers that consistently return 0xc0 for BAL hashes
known to be non-empty). That needs a rolling window and peer-score table
which are natural to add alongside the stage integration in a later
commit.
- A debug RPC method to expose FetchBlockAccessLists for devnet testing.
Adding next (Phase 5b.2).
- Stage integration / background sync. Phase 5c.
Tested: make lint clean, make erigon builds, all 5 new tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Always-on background loop that fills rawdb with Block Access Lists for recent
blocks whose header commits to a BAL hash but whose BAL is not yet stored
locally.

Rollout model matches eth/70: no feature flag, behaviour is gated purely by
per-peer capability negotiation. If no connected peer advertises eth/71,
every scan pass is a silent no-op; once any peer negotiates eth/71, missing
BALs start flowing in.

The block executor already regenerates and validates BALs locally via
ProcessBAL, so a missing p2p-delivered BAL is never a correctness issue, only
a CPU-cost optimisation. The downloader therefore runs strictly in the
background and never blocks stage progress; a failed or missing fetch is
retried on the next pass (default every 10s over a 256-block window from
head).

New file: p2p/sentry/sentry_multi_client/bal_downloader.go
- BALDownloader: holds MultiClient ref for peer picking + issuing
GetBlockAccessLists, plus a writable kv.RwDB for persisting BALs
(MultiClient's own db field is read-only by design).
- Run(ctx): 15s initial delay so sentries have time to negotiate, then a
ticker loop that invokes scanAndFetch until ctx cancel.
- scanAndFetch: pickEth71Peer → collectMissingBALs → fetch in batches of 32
hashes with max-4 parallelism via a semaphore.
- collectMissingBALs: walks head..head-scanDepth, returns entries whose
hdr.BlockAccessListHash is non-nil and whose BAL isn't in rawdb. Stops
walking once it hits a pre-Amsterdam header.
- fetchBatch: calls MultiClient.FetchBlockAccessLists (which handles
validation + bad-peer penalty via the BALFetcher from 5b.1). Accepted
entries are written via rawdb.WriteBlockAccessListBytes. "Not available"
slots (nil in response) are silently skipped for retry next pass.
- pickEth71Peer: iterates all sentries, calls Peers() RPC, filters by Caps
containing "eth/71", picks one uniformly at random.

Wire-up in node/eth/backend.go:
- After sentriesClient is constructed, kick off
`go NewBALDownloader(sentriesClient, chainDB, logger).Run(sentryCtx)`.
Lifetime tied to sentryCtx so shutdown cancels the loop cleanly.

Tested: go build ./p2p/sentry/... ./node/eth/... clean; make lint 0 issues;
make erigon builds. Existing sentry multi-client short tests still pass.

This completes the Phase 5 stage-integration path. With the full stack landed
(Phases 1+2 wire protocol, 3 answer handler, 4 sentry dispatch, 5a server
subscription, 5b.1 fetcher primitive, 5c downloader) two erigon nodes running
this branch on a BAL-enabled devnet will negotiate eth/71 and exchange BALs
end-to-end without any operator intervention.

Next: Phase 6, hive / integration tests and devnet verification
(bal-devnet-3 for immediate testing; bal-devnet-4 early next week).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, eth/71 is registered in the message tables (Phase 1) and the
sentry dispatch (Phase 4) but the devp2p server never constructs a listener
for it: the default ProtocolVersion slice only includes ETH69 and ETH70, so
peers see capabilities "eth/69, eth/70" and never negotiate eth/71 even when
both sides support it.

Tested locally with two erigon instances (bal-devnet-3 chain, static peers):
before this patch the "Started P2P networking" log lines only showed
version=69 and version=70; with the patch eth/71 appears and peers can
exchange GetBlockAccessLists / BlockAccessLists.

Runtime override via --p2p.protocol-version still works for users who want a
narrower set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a launch skill for the bal-devnet-3 ethpandaops devnet alongside the existing launch-bal-devnet-2 skill. Structure is identical; diffs are the devnet-specific constants (chain ID 7098917910, genesis/checkpoint-sync URLs, Lighthouse image tag bal-devnet-3-65bb283, 15 vs 16 bootnodes, Dora explorer URL). Ships with the eth/71 (EIP-8159) PR so devnet-3 reproducers are reusable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

Companion to the handler-side change. ethereum/EIPs#11553 made the "not
available" sentinel 0x80 (empty RLP string), distinct from 0xc0 (empty RLP
list = genuinely empty BAL). The old wire ambiguity is gone, so the fetcher
now does a clean three-way decode:

  0x80 → peer doesn't have it → out[i] = nil (caller retries elsewhere)
  0xc0 → peer claims empty BAL → accepted iff expected hash equals
         empty.BlockAccessListHash; otherwise hash-mismatch → kick
  else → must hash to expected; otherwise kick

Before ethereum/EIPs#11553, returning 0xc0 with an expected non-empty hash
was silently treated as "unavailable" because we couldn't tell a lying peer
apart from one being honest about not having it. With 0x80 as the explicit
unavailable signal, the 0xc0-with-non-empty-hash case is now unambiguous
lying behaviour, so the fetcher kicks.

- bal_fetcher.go: three-way decode + comment refresh.
- bal_fetcher_test.go: existing empty-BAL test split into two: one for
legitimate empty (0xc0 + matching hash) plus 0x80 not-available; new test
asserts 0xc0 + non-empty hash kicks the peer. Wrong-peer test uses 0x80 for
the benign target reply.
- bal_downloader.go: comment cleanup; the 0xc0 special-case was a no-op
(fetcher now hands the canonical empty-BAL bytes through, so the writer just
persists whatever non-nil payload arrived).
- docs/plans/eip-8159-eth71-bal-exchange.md: update the empty-RLP ambiguity
section to reflect the three-way decode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3e7bb0e to
4bdf325
Compare
Summary
Third of three stacked PRs implementing EIP-8159. Adds the consumer-side BALFetcher primitive, a background BALDownloader that wires the fetcher into the node start path, and enables eth/71 in the default protocol-version list. Also ships a companion devnet-launch skill so bal-devnet-3 reproducers are reusable.

Depends on #20794; review/merge that one first.

What lands
- BALFetcher in p2p/sentry/sentry_multi_client/bal_fetcher.go:
  - … Sentry.PenalizePeer(Kick). Garbage or wrong-hash payloads never cross the p2p boundary.
  - … 0xc0): accepted only when expected_hash == empty.BlockAccessListHash, treated as "peer has nothing" otherwise.
  - … rawdb.WriteBlockAccessListBytes.
- BALDownloader in p2p/sentry/sentry_multi_client/bal_downloader.go: periodic scan for blocks whose header has a BlockAccessListHash and no local BAL entry; dispatches batches via the fetcher.
- node/eth/backend.go spawns the downloader at startup. Always-on, negotiation-driven: silent no-op if no peer advertises eth/71.
- node/nodecfg adds ETH71 to the default P2P protocol version list so erigon advertises eth/71 in its handshake.
- .claude/skills/launch-bal-devnet-3/SKILL.md: companion skill for launching erigon + Lighthouse on the bal-devnet-3 ethpandaops devnet (parallels the existing launch-bal-devnet-2).

Deferred follow-ups
- … 0xc0 replies to non-empty expected hashes will build on top of this primitive. Will land as a follow-up once the substrate is reviewed.

Testing limitations
End-to-end verification on bal-devnet-3 is currently blocked by an unrelated execution bug at block 503 (EIP-8037 state-gas accounting, tracked in #20791). It reproduces deterministically on main with a fresh datadir and is not caused by anything in these PRs. When bal-devnet-4 goes live (currently no genesis/bootnodes yet), we'll re-run the end-to-end check there.

Local test coverage:
- go test -short ./p2p/... (includes TestBlockAccessListsPacket66RoundTrip, TestAnswerGetBlockAccessListsQuery_OrderedResponseWithMissing, TestAnswerGetBlockAccessListsQuery_SoftSizeLimit, plus bal_fetcher_test.go)
- make lint (0 issues, 2× determinism pass)
- make erigon integration

Stack