refactor(l1): improve error types with actionable context#6126
refactor(l1): improve error types with actionable context#6126pablodeymo wants to merge 8 commits into
Conversation
Add context fields to RLPDecodeError and PeerConnectionError variants to make error messages self-describing and actionable for operators. RLPDecodeError changes: - InvalidLength, MalformedData, UnexpectedList, UnexpectedString now carry optional context via Option<&'static str> - MalformedBoolean now includes the actual byte value received - Added helper constructors (invalid_length(), malformed_data(), etc.) - Added with_context() method for adding context at type boundaries PeerConnectionError changes: - NoMatchingCapabilities now includes what capabilities were available - InvalidPeerId now includes reason string - InvalidMessageLength now includes context about what was being parsed Additional cleanup: - Removed unused StoreError::DecodeError variant - Removed TODO comments from error files - Added context at Transaction and P2PTransaction decode boundaries Error messages before vs after: - "InvalidLength" → "Invalid RLP length decoding Transaction" - "MalformedBoolean" → "Malformed boolean: expected 0x80 or 0x01, got 0x42" - "No matching capabilities" → "No matching capabilities: no common eth version" - "Invalid peer id" → "Invalid peer ID: failed to compress public key" This addresses Section 1.4 of the UX/DevEx improvement roadmap (PR #6107).
Lines of code reportTotal lines added: Detailed view |
There was a problem hiding this comment.
Pull request overview
This PR improves error types by adding contextual information to make debugging easier for operators. The changes focus on RLP decoding errors and P2P connection errors, implementing a "context at API boundaries" approach where generic decoders use simple constructors and type-specific decoders add context using .with_context().
Changes:
- Enhanced
RLPDecodeErrorenum variants to include optional context fields and provide helpful constructor methods - Improved
PeerConnectionErrorvariants to include descriptive context strings - Added context at type decoding boundaries (Transaction, Account, Receipt, NodeHash, etc.)
- Updated all pattern matches and error usages across the codebase
- Removed unused
StoreError::DecodeErrorvariant
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
crates/common/rlp/error.rs |
Core error enum refactoring with context fields and helper methods |
crates/common/rlp/decode.rs |
Updated to use new error constructors throughout |
crates/common/rlp/structs.rs |
Updated Decoder to use new error constructors |
crates/common/types/transaction.rs |
Added context wrapping in RLPDecode implementations |
crates/common/types/account.rs |
Updated to use new error constructors |
crates/common/types/receipt.rs |
Updated to use new error constructors |
crates/common/trie/node_hash.rs |
Updated to use new error constructors |
crates/networking/p2p/rlpx/error.rs |
Enhanced error variants with context strings |
crates/networking/p2p/rlpx/connection/handshake.rs |
Added descriptive context to error messages |
crates/networking/p2p/rlpx/connection/server.rs |
Updated pattern matches for new error variants |
crates/networking/p2p/rlpx/connection/codec.rs |
Added context to InvalidMessageLength errors |
crates/networking/p2p/metrics.rs |
Updated pattern matches for error tracking |
crates/networking/p2p/types.rs |
Updated to use new error constructors |
crates/networking/p2p/rlpx/message.rs |
Updated to use malformed_data() constructor |
crates/networking/p2p/rlpx/eth/blocks.rs |
Updated to use new error constructors |
crates/networking/p2p/rlpx/p2p.rs |
Updated to use new error constructors |
crates/networking/p2p/discv4/messages.rs |
Updated to use malformed_data() constructor |
crates/networking/p2p/discv5/messages.rs |
Updated to use new error constructors |
crates/networking/p2p/sync/storage_healing.rs |
Updated to use malformed_data() constructor |
crates/storage/error.rs |
Removed unused DecodeError variant |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if msg_size > P2P_MAX_MESSAGE_SIZE { | ||
| return Err(PeerConnectionError::InvalidMessageLength); | ||
| return Err(PeerConnectionError::InvalidMessageLength( | ||
| "handshake message too short", |
There was a problem hiding this comment.
The error message says "handshake message too short" but the check is for a message that exceeds the maximum size. This message should be "handshake message exceeds maximum size" or similar to accurately describe the failure condition.
| "handshake message too short", | |
| "handshake message exceeds maximum size", |
Greptile OverviewGreptile SummaryThis PR refactors error handling across the codebase, expanding several error enums (notably Within the networking stack, this touches devp2p/RLPx message encoding/decoding, handshake validation, and various message types (eth/snap/based), aiming to improve operator visibility when malformed or incompatible data is received from peers. Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| crates/common/rlp/decode.rs | Updates RLP primitive/container decoding to use new contextual RLPDecodeError helpers (e.g., invalid length / malformed boolean) and propagate context at type boundaries. |
| crates/common/rlp/error.rs | Refactors RLPDecodeError/RLPEncodeError variants and constructors to carry optional context strings for more actionable diagnostics. |
| crates/networking/p2p/rlpx/error.rs | Adjusts PeerConnectionError variants/messages to include more actionable context; no behavioral change noted in reviewed paths. |
| crates/networking/p2p/rlpx/message.rs | Refactors message code/dispatch; contains a must-fix bug where Receipts69 encodes with the Receipts68 message code. |
| crates/networking/p2p/rlpx/connection/handshake.rs | Improves handshake errors/context; contains a must-fix bug where oversized handshake messages are reported as "too short". |
Sequence Diagram
sequenceDiagram
participant Peer as Remote peer
participant Conn as RLPx connection
participant HS as Handshake
participant Codec as RLPxCodec
participant Msg as Message codec
participant RLP as RLP decoder
Peer->>Conn: TCP connect
Conn->>HS: send_auth / receive_auth
HS->>HS: receive_handshake_msg (size + payload)
HS->>RLP: Auth/Ack RLPDecode
RLP-->>HS: RLPDecodeError (+context)
HS-->>Conn: PeerConnectionError
Conn->>Codec: init codec with nonces
Peer->>Codec: framed RLPx messages
Codec->>Msg: Message::decode(msg_id, data)
Msg->>RLP: Per-message RLPDecode
RLP-->>Msg: RLPDecodeError (+context)
Msg-->>Codec: decoded Message / error
Codec-->>Conn: dispatch / disconnect
Additional Comments (2)
Also appears in the same function’s eth-code mapping block; only Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/rlpx/message.rs
Line: 131:136
Comment:
**Receipts69 code uses 68**
`Message::code()` maps `Message::Receipts69` to `Receipts68::CODE` (`crates/networking/p2p/rlpx/message.rs:133`). This makes eth/69 receipt responses encode with the wrong message ID, so peers expecting the eth/69 receipts code will misinterpret or reject the message.
Also appears in the same function’s eth-code mapping block; only `Receipts69` is wrong here.
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/rlpx/message.rs
Line: 131:134
Comment:
**Wrong eth/69 message ID**
`Message::Receipts69` is currently using `Receipts68::CODE` in `code()` (`crates/networking/p2p/rlpx/message.rs:131-134`). When negotiated `EthCapVersion::V69`, this will emit the wrong message ID on the wire, causing peers to misinterpret/ignore the message. This should use `Receipts69::CODE` (and it’s already decoded as such in `decode()`).
How can I resolve this? If you propose a fix, please make it concise. |
…P_MAX_MESSAGE_SIZE was returning "handshake message too short" when it should say "exceeds maximum size"
|
Both issues are resolved - the error messages now correctly describe "exceeds maximum size" instead of "too short". Thanks for the quick fix! |
iovoid
left a comment
There was a problem hiding this comment.
Several decoders aren't using with_context, such as: Block, BlockHeader, Receipt, ReceiptWithBloom, AccountState, etc
Receipt, ReceiptWithBloom, and AccountState now propagate context at type boundaries for better error messages when RLP decoding fails
…ta() in Decoder::finish() as requested in review
|
Fixed! Made the helper constructors |
|
|
||
| // TODO: improve errors | ||
| fn fmt_ctx(ctx: &Option<&'static str>) -> String { | ||
| ctx.map(|c| format!(" decoding {c}")).unwrap_or_default() |
There was a problem hiding this comment.
nit: fmt_ctx allocates a String on every Display call via format!(). In the None case, unwrap_or_default() returns an empty String (heap-allocated). Since RLP decode errors can appear in P2P logging hot paths, you could avoid the allocation by inlining the formatting:
#[error("Invalid RLP length{}", .0.map(|c| format!(" decoding {c}")).unwrap_or_default())]Or change the helper to return &str:
fn fmt_ctx(ctx: &Option<&'static str>) -> &'static str {
// Unfortunately doesn't work with format —
// you'd need a different approach
}Actually the simplest fix: implement Display manually for the variants that need context, writing directly to the formatter without intermediate String. But this is minor — the current approach works, it's just doing a small heap allocation per error display.
|
|
||
| pub fn with_context(self, ctx: &'static str) -> Self { | ||
| match self { | ||
| Self::InvalidLength(_) => Self::InvalidLength(Some(ctx)), |
There was a problem hiding this comment.
with_context unconditionally overwrites any existing context. Currently safe because nested type errors go through field_decode_error → Custom(...) which isn't touched here. But if someone later uses with_context on a type that's decoded inside another with_context-wrapped type without going through Decoder::decode_field, the inner context is silently lost.
Consider either:
- A comment documenting "outermost caller wins" behavior
- Or:
Self::InvalidLength(existing) => Self::InvalidLength(Some(existing.unwrap_or(ctx)))to preserve inner context ("innermost wins")
| StateError(String), | ||
| #[error("No matching capabilities")] | ||
| NoMatchingCapabilities, | ||
| #[error("No matching capabilities: {0}")] |
There was a problem hiding this comment.
nit: Inconsistent context types — NoMatchingCapabilities(String) uses owned String while InvalidPeerId(&'static str) and InvalidMessageLength(&'static str) use borrowed &'static str. The String here is needed because the call site in server.rs builds the message dynamically ("no common eth version".to_string()), but that string is actually a literal — it could be &'static str too. Consider making this &'static str for consistency (unless you anticipate dynamic messages in the future).
Motivation
This PR implements Section 1.4 ("Improve Error Type Definitions") from the UX/DevEx improvement roadmap (PR #6107). Error enums currently have variants with no context, making debugging difficult for operators. When a decode error occurs, the message
InvalidLengthtells you nothing about what failed to decode or why.Current pain point example:
Operator asks: "InvalidLength of what? The block? A transaction? The header?"
After this PR:
Operator knows: "The Transaction decoder failed. Let me check if it's a known bad transaction format."
Description
Approach: Context at API Boundaries (Option B)
After analyzing the codebase, we chose an approach that balances debuggability with code simplicity:
Keep internal errors simple - Generic decoders in
decode.rsdon't know what type they're decoding, so they use constructors likeinvalid_length()that create errors without context.Add context at type boundaries - Type-specific decoders (Transaction, Account, etc.) wrap errors with context using
.with_context("TypeName").Zero memory overhead - Verified by measurement that adding
Option<&'static str>fields doesn't increase enum size because existingStringvariants already dominate.Why Not Other Approaches?
decode.rsalone; generic decoders don't know the type#[track_caller]thiserrordoesn't integrate well; location info helps devs, not operatorsChanges
1. RLPDecodeError (
crates/common/rlp/error.rs)Before:
After:
2. PeerConnectionError (
crates/networking/p2p/rlpx/error.rs)Before:
After:
3. Context at Type Boundaries
4. Removed Dead Code
StoreError::DecodeError- defined but never used anywhere in the codebaseError Message Comparison
InvalidLengthInvalid RLP length decoding TransactionMalformedBooleanMalformed boolean: expected 0x80 or 0x01, got 0x42No matching capabilitiesNo matching capabilities: no common eth versionInvalid peer idInvalid peer ID: failed to compress public keyInvalid message lengthInvalid message length: handshake message too shortInvalid message lengthInvalid message length: frame exceeds max sizeMemory Analysis
Verified that adding context fields has zero memory overhead:
The enum size is determined by its largest variant (
Stringat 24 bytes). Adding 16-byteOption<&'static str>fields to smaller variants doesn't increase the overall size.Files Changed
rlp/error.rs,rlpx/error.rs,storage/error.rsrlp/decode.rs,rlp/structs.rstypes/transaction.rs,types/account.rs,types/receipt.rs,trie/node_hash.rshandshake.rs,server.rs,codec.rs,metrics.rsdiscv4/messages.rs,discv5/messages.rs,rlpx/message.rs, etc.Breaking Changes
This is a breaking change for anyone matching on these error variants:
All breakage is compile-time only - the compiler tells you exactly what to fix.
Testing
cargo checkpassescargo fmtpassescargo clippypassescargo test -p ethrex-rlp- 2 tests passcargo test -p ethrex-common- 62 tests passcargo test -p ethrex-p2p- 50 tests passRelated
map_err) can be tackled separately