
refactor(l1): improve error types with actionable context #6126

Open
pablodeymo wants to merge 8 commits into main from refactor/improve-error-types-with-context

Conversation

@pablodeymo

Contributor

Motivation

This PR implements Section 1.4 ("Improve Error Type Definitions") from the UX/DevEx improvement roadmap (PR #6107). Error enums currently have variants with no context, making debugging difficult for operators. When a decode error occurs, the message InvalidLength tells you nothing about what failed to decode or why.

Current pain point example:

ERROR ethrex::p2p: Failed to decode message: InvalidLength

Operator asks: "InvalidLength of what? The block? A transaction? The header?"

After this PR:

ERROR ethrex::p2p: Failed to decode message: Invalid RLP length decoding Transaction

Operator knows: "The Transaction decoder failed. Let me check if it's a known bad transaction format."


Description

Approach: Context at API Boundaries (Option B)

After analyzing the codebase, we chose an approach that balances debuggability with code simplicity:

  1. Keep internal errors simple - Generic decoders in decode.rs don't know what type they're decoding, so they use constructors like invalid_length() that create errors without context.

  2. Add context at type boundaries - Type-specific decoders (Transaction, Account, etc.) wrap errors with context using .with_context("TypeName").

  3. Zero memory overhead - Verified by measurement that adding Option<&'static str> fields doesn't increase enum size because existing String variants already dominate.
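
The three points above can be sketched as follows. This is a minimal, std-only illustration and not the PR's code: `DecodeError`, `decode_len_prefixed`, and this toy `Transaction` are simplified stand-ins introduced here, and `Display` is written by hand rather than derived with thiserror.

```rust
use std::fmt;

/// Simplified stand-in for RLPDecodeError (assumption: one variant is enough to show the idea).
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidLength(Option<&'static str>),
}

impl DecodeError {
    /// Constructor for generic decoders, which do not know the target type.
    const fn invalid_length() -> Self {
        Self::InvalidLength(None)
    }

    /// Attach the type name at an API boundary.
    fn with_context(self, ctx: &'static str) -> Self {
        match self {
            Self::InvalidLength(_) => Self::InvalidLength(Some(ctx)),
        }
    }
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidLength(None) => write!(f, "Invalid RLP length"),
            Self::InvalidLength(Some(ctx)) => write!(f, "Invalid RLP length decoding {ctx}"),
        }
    }
}

/// Hypothetical generic decoder: the first byte is a length, followed by that many bytes.
fn decode_len_prefixed(bytes: &[u8]) -> Result<&[u8], DecodeError> {
    let (&len, rest) = bytes.split_first().ok_or(DecodeError::invalid_length())?;
    rest.get(..len as usize).ok_or(DecodeError::invalid_length())
}

/// Hypothetical type-specific decoder that adds context at the boundary.
#[derive(Debug, PartialEq)]
struct Transaction(Vec<u8>);

fn decode_transaction(bytes: &[u8]) -> Result<Transaction, DecodeError> {
    decode_len_prefixed(bytes)
        .map(|payload| Transaction(payload.to_vec()))
        .map_err(|e| e.with_context("Transaction"))
}

fn main() {
    // The input declares 3 payload bytes but provides only 1.
    let err = decode_transaction(&[3, 0xaa]).unwrap_err();
    println!("{err}");
}
```

The generic decoder stays free of type names, and only the boundary function knows it is decoding a `Transaction`.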

Why Not Other Approaches?

| Approach | Rejected Because |
|---|---|
| Full context everywhere | 70+ sites in decode.rs alone; generic decoders don't know the type |
| #[track_caller] | thiserror doesn't integrate well; location info helps devs, not operators |
| Error wrapper type | Changes return types at API boundaries; more invasive |

Changes

1. RLPDecodeError (crates/common/rlp/error.rs)

Before:

#[error("InvalidLength")]
InvalidLength,
#[error("MalformedBoolean")]
MalformedBoolean,

After:

#[error("Invalid RLP length{}", fmt_ctx(.0))]
InvalidLength(Option<&'static str>),
#[error("Malformed boolean: expected 0x80 or 0x01, got 0x{0:02x}")]
MalformedBoolean(u8),

// Helper constructors for internal use
impl RLPDecodeError {
    pub fn invalid_length() -> Self { Self::InvalidLength(None) }
    pub fn malformed_boolean(got: u8) -> Self { Self::MalformedBoolean(got) }
    
    pub fn with_context(self, ctx: &'static str) -> Self {
        match self {
            Self::InvalidLength(_) => Self::InvalidLength(Some(ctx)),
            // ...
        }
    }
}

2. PeerConnectionError (crates/networking/p2p/rlpx/error.rs)

Before:

#[error("No matching capabilities")]
NoMatchingCapabilities,
#[error("Invalid peer id")]
InvalidPeerId,
#[error("Invalid message length")]
InvalidMessageLength,

After:

#[error("No matching capabilities: {0}")]
NoMatchingCapabilities(String),
#[error("Invalid peer ID: {0}")]
InvalidPeerId(&'static str),
#[error("Invalid message length: {0}")]
InvalidMessageLength(&'static str),

3. Context at Type Boundaries

impl RLPDecode for Transaction {
    fn decode_unfinished(rlp: &[u8]) -> Result<(Self, &[u8]), RLPDecodeError> {
        decode_transaction(rlp).map_err(|e| e.with_context("Transaction"))
    }
}

4. Removed Dead Code

  • StoreError::DecodeError - defined but never used anywhere in the codebase
  • TODO comments from error files (the TODOs are now addressed)

Error Message Comparison

| Scenario | Before | After |
|---|---|---|
| Transaction decode | InvalidLength | Invalid RLP length decoding Transaction |
| Boolean decode | MalformedBoolean | Malformed boolean: expected 0x80 or 0x01, got 0x42 |
| Capability mismatch | No matching capabilities | No matching capabilities: no common eth version |
| Peer ID compression | Invalid peer id | Invalid peer ID: failed to compress public key |
| Handshake parse | Invalid message length | Invalid message length: handshake message too short |
| Frame size | Invalid message length | Invalid message length: frame exceeds max size |

Memory Analysis

Verified that adding context fields has zero memory overhead:

Component sizes:
  String:               24 bytes
  Option<&'static str>: 16 bytes

Enum sizes:
  CurrentError:  40 bytes
  ProposedError: 40 bytes  ← No change!

Result sizes:
  Result<u8, Error>:     40 bytes  ← No change!
  Result<u64, Error>:    40 bytes  ← No change!

An enum's size is set by its largest variant plus any discriminant and padding. The String-carrying variants already dominate, so adding 16-byte Option<&'static str> fields to smaller variants leaves the measured 40-byte total unchanged.
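
A measurement like the one above can be reproduced with `std::mem::size_of`. The sketch below uses two hypothetical toy enums (`CurrentError`, `ProposedError`) with far fewer variants than the real error type, so the numbers it prints will not match the 40 bytes reported here and may vary by target and compiler version.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the error enum before the change.
#[allow(dead_code)]
enum CurrentError {
    InvalidLength,
    MalformedData,
    Custom(String),
}

// Hypothetical stand-in for the error enum after the change.
#[allow(dead_code)]
enum ProposedError {
    InvalidLength(Option<&'static str>),
    MalformedData(Option<&'static str>),
    Custom(String),
}

fn main() {
    println!("String:               {} bytes", size_of::<String>());
    println!("Option<&'static str>: {} bytes", size_of::<Option<&'static str>>());
    println!("CurrentError:         {} bytes", size_of::<CurrentError>());
    println!("ProposedError:        {} bytes", size_of::<ProposedError>());
    println!("Result<u8, ProposedError>: {} bytes", size_of::<Result<u8, ProposedError>>());
}
```

Running the same comparison against the real enum types is the reliable check, since layout depends on every variant present.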


Files Changed

| Category | Files | Purpose |
|---|---|---|
| Error definitions | rlp/error.rs, rlpx/error.rs, storage/error.rs | New error variants with context |
| Generic decoders | rlp/decode.rs, rlp/structs.rs | Use new constructors |
| Type decoders | types/transaction.rs, types/account.rs, types/receipt.rs, trie/node_hash.rs | Add context at boundaries |
| P2P code | handshake.rs, server.rs, codec.rs, metrics.rs | Update to new error variants |
| Network messages | discv4/messages.rs, discv5/messages.rs, rlpx/message.rs, etc. | Use new constructors |

Breaking Changes

This is a breaking change for anyone matching on these error variants:

// Before
match err {
    RLPDecodeError::InvalidLength => { ... }
}

// After
match err {
    RLPDecodeError::InvalidLength(ctx) => {
        if let Some(c) = ctx {
            log::error!("Invalid length decoding {}", c);
        }
    }
}

All breakage is compile-time only - the compiler tells you exactly what to fix.
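
For callers migrating their match arms, the two common cases might look like the sketch below. The enum is trimmed to two variants for illustration, and `is_length_error` and `describe` are hypothetical helpers, not functions from the codebase.

```rust
// Trimmed copy of the new variant shapes (assumption: only two variants shown).
#[derive(Debug)]
enum RLPDecodeError {
    InvalidLength(Option<&'static str>),
    MalformedBoolean(u8),
}

/// Callers that only care about the kind of error can ignore the payload.
fn is_length_error(err: &RLPDecodeError) -> bool {
    matches!(err, RLPDecodeError::InvalidLength(_))
}

/// Callers that want the context can bind it in the pattern.
fn describe(err: &RLPDecodeError) -> String {
    match err {
        RLPDecodeError::InvalidLength(Some(ty)) => format!("bad length while decoding {ty}"),
        RLPDecodeError::InvalidLength(None) => "bad length (no context)".to_string(),
        RLPDecodeError::MalformedBoolean(byte) => format!("bad boolean byte 0x{byte:02x}"),
    }
}

fn main() {
    let err = RLPDecodeError::InvalidLength(Some("Transaction"));
    assert!(is_length_error(&err));
    println!("{}", describe(&err));
    println!("{}", describe(&RLPDecodeError::MalformedBoolean(0x42)));
}
```

Using `InvalidLength(_)` is the smallest change for code that previously matched the unit variant.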


Testing

  • cargo check passes
  • cargo fmt passes
  • cargo clippy passes
  • cargo test -p ethrex-rlp - 2 tests pass
  • cargo test -p ethrex-common - 62 tests pass
  • cargo test -p ethrex-p2p - 50 tests pass

Related

  • Parent roadmap: PR docs(l1): add UX/DevEx roadmap #6107 (UX/DevEx Improvement Plan)
  • Addresses: Section 1.4 "Improve Error Type Definitions"
  • Related sections: 1.3 (context-discarding map_err) can be tackled separately

Add context fields to RLPDecodeError and PeerConnectionError variants
to make error messages self-describing and actionable for operators.

RLPDecodeError changes:
- InvalidLength, MalformedData, UnexpectedList, UnexpectedString now
  carry optional context via Option<&'static str>
- MalformedBoolean now includes the actual byte value received
- Added helper constructors (invalid_length(), malformed_data(), etc.)
- Added with_context() method for adding context at type boundaries

PeerConnectionError changes:
- NoMatchingCapabilities now includes what capabilities were available
- InvalidPeerId now includes reason string
- InvalidMessageLength now includes context about what was being parsed

Additional cleanup:
- Removed unused StoreError::DecodeError variant
- Removed TODO comments from error files
- Added context at Transaction and P2PTransaction decode boundaries

Error messages before vs after:
- "InvalidLength" → "Invalid RLP length decoding Transaction"
- "MalformedBoolean" → "Malformed boolean: expected 0x80 or 0x01, got 0x42"
- "No matching capabilities" → "No matching capabilities: no common eth version"
- "Invalid peer id" → "Invalid peer ID: failed to compress public key"

This addresses Section 1.4 of the UX/DevEx improvement roadmap (PR #6107).
@pablodeymo pablodeymo requested a review from ilitteri as a code owner February 4, 2026 21:59
Copilot AI review requested due to automatic review settings February 4, 2026 21:59
@github-actions github-actions Bot added the L1 Ethereum client label Feb 4, 2026
@github-actions

github-actions Bot commented Feb 4, 2026


Lines of code report

Total lines added: 60
Total lines removed: 2
Total lines changed: 62

Detailed view
+-----------------------------------------------------------+-------+------+
| File                                                      | Lines | Diff |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/rlp/decode.rs                        | 437   | +1   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/rlp/error.rs                         | 56    | +29  |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/types/account.rs                     | 330   | +3   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/types/block.rs                       | 956   | +9   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/types/receipt.rs                     | 469   | +6   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/common/types/transaction.rs                 | 3289  | +6   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/codec.rs     | 244   | +2   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/handshake.rs | 530   | +2   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/server.rs    | 1053  | +2   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/storage/error.rs                            | 47    | -2   |
+-----------------------------------------------------------+-------+------+

Copilot AI left a comment

Contributor

Pull request overview

This PR improves error types by adding contextual information to make debugging easier for operators. The changes focus on RLP decoding errors and P2P connection errors, implementing a "context at API boundaries" approach where generic decoders use simple constructors and type-specific decoders add context using .with_context().

Changes:

  • Enhanced RLPDecodeError enum variants to include optional context fields and provide helpful constructor methods
  • Improved PeerConnectionError variants to include descriptive context strings
  • Added context at type decoding boundaries (Transaction, Account, Receipt, NodeHash, etc.)
  • Updated all pattern matches and error usages across the codebase
  • Removed unused StoreError::DecodeError variant

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| crates/common/rlp/error.rs | Core error enum refactoring with context fields and helper methods |
| crates/common/rlp/decode.rs | Updated to use new error constructors throughout |
| crates/common/rlp/structs.rs | Updated Decoder to use new error constructors |
| crates/common/types/transaction.rs | Added context wrapping in RLPDecode implementations |
| crates/common/types/account.rs | Updated to use new error constructors |
| crates/common/types/receipt.rs | Updated to use new error constructors |
| crates/common/trie/node_hash.rs | Updated to use new error constructors |
| crates/networking/p2p/rlpx/error.rs | Enhanced error variants with context strings |
| crates/networking/p2p/rlpx/connection/handshake.rs | Added descriptive context to error messages |
| crates/networking/p2p/rlpx/connection/server.rs | Updated pattern matches for new error variants |
| crates/networking/p2p/rlpx/connection/codec.rs | Added context to InvalidMessageLength errors |
| crates/networking/p2p/metrics.rs | Updated pattern matches for error tracking |
| crates/networking/p2p/types.rs | Updated to use new error constructors |
| crates/networking/p2p/rlpx/message.rs | Updated to use malformed_data() constructor |
| crates/networking/p2p/rlpx/eth/blocks.rs | Updated to use new error constructors |
| crates/networking/p2p/rlpx/p2p.rs | Updated to use new error constructors |
| crates/networking/p2p/discv4/messages.rs | Updated to use malformed_data() constructor |
| crates/networking/p2p/discv5/messages.rs | Updated to use new error constructors |
| crates/networking/p2p/sync/storage_healing.rs | Updated to use malformed_data() constructor |
| crates/storage/error.rs | Removed unused DecodeError variant |


  if msg_size > P2P_MAX_MESSAGE_SIZE {
-     return Err(PeerConnectionError::InvalidMessageLength);
+     return Err(PeerConnectionError::InvalidMessageLength(
+         "handshake message too short",

Copilot AI Feb 4, 2026


The error message says "handshake message too short" but the check is for a message that exceeds the maximum size. This message should be "handshake message exceeds maximum size" or similar to accurately describe the failure condition.

Suggested change
- "handshake message too short",
+ "handshake message exceeds maximum size",
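
The point of this comment, that each failing condition should carry a message describing that condition, can be illustrated with a small sketch. The bounds `MIN_HANDSHAKE_SIZE` and `MAX_HANDSHAKE_SIZE` and the function `check_handshake_size` are hypothetical and chosen only for illustration; the real limits and checks live in the p2p crate.

```rust
// Hypothetical bounds, chosen only for illustration.
const MIN_HANDSHAKE_SIZE: usize = 2;
const MAX_HANDSHAKE_SIZE: usize = 2048;

// Trimmed copy of the new variant shape (assumption: one variant shown).
#[derive(Debug, PartialEq)]
enum PeerConnectionError {
    InvalidMessageLength(&'static str),
}

/// Each branch reports the condition it actually tested.
fn check_handshake_size(msg_size: usize) -> Result<(), PeerConnectionError> {
    if msg_size < MIN_HANDSHAKE_SIZE {
        return Err(PeerConnectionError::InvalidMessageLength(
            "handshake message too short",
        ));
    }
    if msg_size > MAX_HANDSHAKE_SIZE {
        return Err(PeerConnectionError::InvalidMessageLength(
            "handshake message exceeds maximum size",
        ));
    }
    Ok(())
}

fn main() {
    println!("{:?}", check_handshake_size(1));
    println!("{:?}", check_handshake_size(4096));
    println!("{:?}", check_handshake_size(100));
}
```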

@greptile-apps

greptile-apps Bot commented Feb 4, 2026


Greptile Overview

Greptile Summary

This PR refactors error handling across the codebase, expanding several error enums (notably RLPDecodeError and PeerConnectionError) to carry more actionable context and updating decode/encode boundaries to attach that context where errors originate.

Within the networking stack, this touches devp2p/RLPx message encoding/decoding, handshake validation, and various message types (eth/snap/based), aiming to improve operator visibility when malformed or incompatible data is received from peers.

Confidence Score: 3/5

  • This PR needs a couple of correctness fixes in RLPx message coding before it’s safe to merge.
  • The error-context refactor looks mechanically consistent, but there are at least two concrete protocol/diagnostic issues introduced: Receipts69 encodes with the wrong message code, and oversized handshake messages are mislabeled as "too short". Fixing these should restore expected network behavior and observability.
  • crates/networking/p2p/rlpx/message.rs; crates/networking/p2p/rlpx/connection/handshake.rs

Important Files Changed

| Filename | Overview |
|---|---|
| crates/common/rlp/decode.rs | Updates RLP primitive/container decoding to use new contextual RLPDecodeError helpers (e.g., invalid length / malformed boolean) and propagate context at type boundaries. |
| crates/common/rlp/error.rs | Refactors RLPDecodeError/RLPEncodeError variants and constructors to carry optional context strings for more actionable diagnostics. |
| crates/networking/p2p/rlpx/error.rs | Adjusts PeerConnectionError variants/messages to include more actionable context; no behavioral change noted in reviewed paths. |
| crates/networking/p2p/rlpx/message.rs | Refactors message code/dispatch; contains a must-fix bug where Receipts69 encodes with the Receipts68 message code. |
| crates/networking/p2p/rlpx/connection/handshake.rs | Improves handshake errors/context; contains a must-fix bug where oversized handshake messages are reported as "too short". |

Sequence Diagram

sequenceDiagram
    participant Peer as Remote peer
    participant Conn as RLPx connection
    participant HS as Handshake
    participant Codec as RLPxCodec
    participant Msg as Message codec
    participant RLP as RLP decoder

    Peer->>Conn: TCP connect
    Conn->>HS: send_auth / receive_auth
    HS->>HS: receive_handshake_msg (size + payload)
    HS->>RLP: Auth/Ack RLPDecode
    RLP-->>HS: RLPDecodeError (+context)
    HS-->>Conn: PeerConnectionError
    Conn->>Codec: init codec with nonces
    Peer->>Codec: framed RLPx messages
    Codec->>Msg: Message::decode(msg_id, data)
    Msg->>RLP: Per-message RLPDecode
    RLP-->>Msg: RLPDecodeError (+context)
    Msg-->>Codec: decoded Message / error
    Codec-->>Conn: dispatch / disconnect

@greptile-apps greptile-apps Bot left a comment


5 files reviewed, 4 comments


Comment thread crates/networking/p2p/rlpx/connection/handshake.rs
Comment thread crates/networking/p2p/rlpx/connection/handshake.rs
@greptile-apps

greptile-apps Bot commented Feb 4, 2026

Additional Comments (2)

crates/networking/p2p/rlpx/message.rs
Receipts69 code uses 68

Message::code() maps Message::Receipts69 to Receipts68::CODE (crates/networking/p2p/rlpx/message.rs:133). This makes eth/69 receipt responses encode with the wrong message ID, so peers expecting the eth/69 receipts code will misinterpret or reject the message.

Also appears in the same function’s eth-code mapping block; only Receipts69 is wrong here.


crates/networking/p2p/rlpx/message.rs
Wrong eth/69 message ID

Message::Receipts69 is currently using Receipts68::CODE in code() (crates/networking/p2p/rlpx/message.rs:131-134). When negotiated EthCapVersion::V69, this will emit the wrong message ID on the wire, causing peers to misinterpret/ignore the message. This should use Receipts69::CODE (and it’s already decoded as such in decode()).


…P_MAX_MESSAGE_SIZE

was returning "handshake message too short" when it should say "exceeds maximum size"
@pablodeymo

Contributor Author

@greptile-apps ?

@greptile-apps

greptile-apps Bot commented Feb 4, 2026


Both issues are resolved - the error messages now correctly describe "exceeds maximum size" instead of "too short". Thanks for the quick fix!

@iovoid iovoid left a comment

Contributor

Several decoders aren't using with_context, such as Block, BlockHeader, Receipt, ReceiptWithBloom, AccountState, etc.

Comment thread crates/common/rlp/structs.rs Outdated
@github-project-automation github-project-automation Bot moved this to In Progress in ethrex_l1 Feb 5, 2026
Receipt, ReceiptWithBloom, and AccountState now propagate context at
type boundaries for better error messages when RLP decoding fails
@pablodeymo pablodeymo requested a review from iovoid February 5, 2026 14:31
@pablodeymo

Contributor Author

Fixed! Made the helper constructors const fn so they can be used in Decoder::finish() which is also const. Thanks for catching that.

@ElFantasma ElFantasma left a comment

Contributor

Left some nits


- // TODO: improve errors
+ fn fmt_ctx(ctx: &Option<&'static str>) -> String {
+     ctx.map(|c| format!(" decoding {c}")).unwrap_or_default()


nit: fmt_ctx allocates a String via format!() on every Display call that carries context. (In the None case, unwrap_or_default() returns an empty String, which does not allocate.) Since RLP decode errors can appear in P2P logging hot paths, the allocation is worth avoiding. One option is to inline the formatting, which removes the helper but still allocates:

#[error("Invalid RLP length{}", .0.map(|c| format!(" decoding {c}")).unwrap_or_default())]

Or change the helper to return &str:

fn fmt_ctx(ctx: &Option<&'static str>) -> &'static str {
    // Unfortunately this doesn't work with format!(): the formatted text is a
    // new String, not a &'static str, so you'd need a different approach.
    unimplemented!()
}

Actually the simplest fix: implement Display manually for the variants that need context, writing directly to the formatter without intermediate String. But this is minor — the current approach works, it's just doing a small heap allocation per error display.
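
One way to write context directly into the formatter is a small wrapper type that implements `Display`, sketched below with std only. `DecodingCtx` is a hypothetical name introduced here, the enum is trimmed to one variant, and whether this drops cleanly into the thiserror `#[error(...)]` attributes used in the PR is an assumption that would need checking.

```rust
use std::fmt;

/// Hypothetical replacement for `fmt_ctx`: formats lazily instead of returning a String.
struct DecodingCtx(Option<&'static str>);

impl fmt::Display for DecodingCtx {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            // Written straight into the formatter; no temporary String is built.
            Some(ty) => write!(f, " decoding {ty}"),
            None => Ok(()),
        }
    }
}

// Trimmed copy of the error enum (assumption: one variant shown).
#[derive(Debug)]
enum RLPDecodeError {
    InvalidLength(Option<&'static str>),
}

impl fmt::Display for RLPDecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidLength(ctx) => write!(f, "Invalid RLP length{}", DecodingCtx(*ctx)),
        }
    }
}

impl std::error::Error for RLPDecodeError {}

fn main() {
    println!("{}", RLPDecodeError::InvalidLength(Some("Transaction")));
    println!("{}", RLPDecodeError::InvalidLength(None));
}
```

The rendered messages are the same as with the `String`-returning helper; only the intermediate allocation goes away.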


pub fn with_context(self, ctx: &'static str) -> Self {
    match self {
        Self::InvalidLength(_) => Self::InvalidLength(Some(ctx)),


with_context unconditionally overwrites any existing context. Currently safe because nested type errors go through field_decode_error → Custom(...), which isn't touched here. But if someone later uses with_context on a type that's decoded inside another with_context-wrapped type without going through Decoder::decode_field, the inner context is silently lost.

Consider either:

  1. A comment documenting "outermost caller wins" behavior
  2. Or: Self::InvalidLength(existing) => Self::InvalidLength(Some(existing.unwrap_or(ctx))) to preserve inner context ("innermost wins")
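
The difference between the two options can be seen by chaining two calls. In the sketch below the enum is trimmed to one variant, `with_context` mirrors the overwrite behaviour shown in the PR description, and `with_context_if_unset` is a hypothetical name for the "innermost wins" alternative.

```rust
// Trimmed copy of the error enum (assumption: one variant shown).
#[derive(Debug, PartialEq)]
enum RLPDecodeError {
    InvalidLength(Option<&'static str>),
}

impl RLPDecodeError {
    /// Overwrite behaviour: the last (outermost) caller wins.
    fn with_context(self, ctx: &'static str) -> Self {
        match self {
            Self::InvalidLength(_) => Self::InvalidLength(Some(ctx)),
        }
    }

    /// Hypothetical alternative: keep context that is already set (innermost wins).
    fn with_context_if_unset(self, ctx: &'static str) -> Self {
        match self {
            Self::InvalidLength(existing) => Self::InvalidLength(Some(existing.unwrap_or(ctx))),
        }
    }
}

fn main() {
    // Simulate an inner Transaction decoder wrapped by an outer Block decoder.
    let outer_wins = RLPDecodeError::InvalidLength(None)
        .with_context("Transaction")
        .with_context("Block");
    let inner_wins = RLPDecodeError::InvalidLength(None)
        .with_context_if_unset("Transaction")
        .with_context_if_unset("Block");
    println!("{outer_wins:?}");
    println!("{inner_wins:?}");
}
```

Which behaviour is preferable depends on whether operators benefit more from the outer type or from the innermost failing type.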

  StateError(String),
- #[error("No matching capabilities")]
- NoMatchingCapabilities,
+ #[error("No matching capabilities: {0}")]


nit: Inconsistent context types — NoMatchingCapabilities(String) uses owned String while InvalidPeerId(&'static str) and InvalidMessageLength(&'static str) use borrowed &'static str. The String here is needed because the call site in server.rs builds the message dynamically ("no common eth version".to_string()), but that string is actually a literal — it could be &'static str too. Consider making this &'static str for consistency (unless you anticipate dynamic messages in the future).


Labels

L1 Ethereum client

Projects

Status: In Progress

6 participants