Delayed RPC Send Using Tokens #5923


Merged · 90 commits merged into sigp:unstable on Apr 24, 2025

Conversation

@ackintosh (Member) commented Jun 13, 2024

Issue Addressed

closes #5785

Proposed Changes

The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The sections that follow detail the changes.

```mermaid
flowchart TD

subgraph "*** After ***"
    Start2([START]) --> AA[Receive request]
    AA --> COND1{Is there already an active request <br> with the same protocol?}
    COND1 --> |Yes| CC[Send error response]
    CC --> End2([END])
    %% COND1 --> |No| COND2{Request is too large?}
    %% COND2 --> |Yes| CC
    COND1 --> |No| DD[Process request]
    DD --> EE{Rate limit reached?}
    EE --> |Yes| FF[Wait until tokens are regenerated]
    FF --> EE
    EE --> |No| GG[Send response]
    GG --> End2
end

subgraph "*** Before ***"
    Start([START]) --> A[Receive request]
    A --> B{Rate limit reached <br> or <br> request is too large?}
    B -->|Yes| C[Send error response]
    C --> End([END])
    B -->|No| E[Process request]
    E --> F[Send response]
    F --> End
end
```

Is there already an active request with the same protocol?

This check is not performed in "Before". It is taken from the consensus-specs PR below, which proposes updates to rate limiting and response timeouts.
https://github.com/ethereum/consensus-specs/pull/3767/files

The requester MUST NOT make more than two concurrent requests with the same ID.

The spec PR addresses the requester side. In this PR, I introduced the ActiveRequestsLimiter on the responder side, to prevent more than two requests on the same protocol from running simultaneously per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester. A minimal sketch of the idea is shown below.
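
A minimal sketch (not the PR's actual code) of the idea behind the ActiveRequestsLimiter: count in-flight inbound requests per peer and protocol, and reject anything beyond the allowed concurrency. The type aliases, field names, and method names here are illustrative assumptions.

```rust
use std::collections::HashMap;

type PeerId = u64; // placeholder for libp2p's PeerId
type Protocol = &'static str; // placeholder protocol identifier

struct ActiveRequestsLimiter {
    /// Maximum concurrent requests per (peer, protocol) pair.
    max_concurrent: usize,
    /// Counts of in-flight inbound requests.
    active: HashMap<(PeerId, Protocol), usize>,
}

impl ActiveRequestsLimiter {
    /// Returns false if the peer is already at the concurrency limit on this
    /// protocol; the responder then sends a rate-limited error and penalizes
    /// the requester.
    fn allows(&mut self, peer: PeerId, protocol: Protocol) -> bool {
        let count = self.active.entry((peer, protocol)).or_insert(0);
        if *count >= self.max_concurrent {
            false
        } else {
            *count += 1;
            true
        }
    }

    /// Called once the response stream for a request has completed.
    fn request_completed(&mut self, peer: PeerId, protocol: Protocol) {
        if let Some(count) = self.active.get_mut(&(peer, protocol)) {
            *count = count.saturating_sub(1);
            if *count == 0 {
                self.active.remove(&(peer, protocol));
            }
        }
    }
}
```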

Rate limit reached? and Wait until tokens are regenerated

UPDATE: I moved the limiter logic to the behaviour side. #5923 (comment)

The rate limiter is shared between the behaviour and the handler (Arc<Mutex<RateLimiter>>). The handler checks the rate limit and queues the response if the limit is reached; the behaviour handles pruning.

I considered not sharing the rate limiter between the behaviour and the handler, and instead performing all of this within either the behaviour or the handler alone. However, I decided against that for the following reasons (a token-bucket sketch of the delay step follows the list):

  • Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when RPC::send_response() is called, especially when the response is RPCCodedResponse::Error. Therefore, the behaviour can't rate limit responses based on the response protocol.
  • Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )
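
A minimal token-bucket sketch of the "Rate limit reached?" / "Wait until tokens are regenerated" steps in the flowchart above. The bucket regenerates tokens at a fixed rate; when a response would exceed the available tokens, the caller is told how long to delay it instead of dropping it. All names and the exact refill policy are illustrative assumptions, not the PR's implementation.

```rust
use std::time::{Duration, Instant};

struct TokenBucket {
    /// Maximum number of tokens the bucket can hold.
    capacity: u64,
    /// Tokens currently available.
    tokens: u64,
    /// Time taken to regenerate a single token.
    refill_period: Duration,
    /// When tokens were last regenerated.
    last_refill: Instant,
}

impl TokenBucket {
    /// Add any tokens regenerated since the last refill, up to capacity.
    fn refill(&mut self, now: Instant) {
        let periods = now.duration_since(self.last_refill).as_nanos()
            / self.refill_period.as_nanos();
        if periods > 0 {
            self.tokens = self.capacity.min(self.tokens + periods as u64);
            self.last_refill = now;
        }
    }

    /// Try to spend `cost` tokens. On failure, return how long to wait until
    /// the deficit is regenerated, so the response can be delayed rather than
    /// rejected.
    fn try_consume(&mut self, cost: u64, now: Instant) -> Result<(), Duration> {
        self.refill(now);
        if self.tokens >= cost {
            self.tokens -= cost;
            Ok(())
        } else {
            let deficit = cost - self.tokens;
            Err(self.refill_period * deficit as u32)
        }
    }
}
```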

Additional Info

Naming

I have renamed the fields of the behaviour to make them more intuitive: #5923 (comment)

Testing

I have run a beacon node with these changes for 24 hours, and it appears to work fine.

The rate-limited error no longer occurs while running this branch.


Metrics

The following metrics have been added in this PR:

  • self_limiter_request_idling_seconds: The time our own request remained idle in the self-limiter.
  • response_limiter_idling_seconds: The time our response remained idle in the response limiter.
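
For reference, a sketch of how one of these histograms can be registered, following the LazyLock<Result<Histogram>> pattern visible in the diff hunks later in this thread. The `use metrics::...` import path is an assumption about Lighthouse's metrics crate.

```rust
use metrics::{try_create_histogram, Histogram, Result};
use std::sync::LazyLock;

// Registers the histogram lazily on first access, mirroring the diff below.
pub static SELF_LIMITER_REQUEST_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
    try_create_histogram(
        "self_limiter_request_idling_seconds",
        "The time our own request remained idle in the self-limiter",
    )
});
```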

@ackintosh ackintosh added the work-in-progress (PR is a work-in-progress), Networking, and skip-ci (Don't run the `test-suite`) labels Jun 13, 2024
@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 29e3f00 to 90361d6 Compare June 19, 2024 22:04
@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 90361d6 to 7e0c630 Compare June 19, 2024 22:44
@ackintosh ackintosh removed the skip-ci (Don't run the `test-suite`) label Jul 1, 2024
@ackintosh ackintosh marked this pull request as ready for review July 14, 2024 23:23
@ackintosh (Member, Author) commented Mar 12, 2025

@jxs @dapplion @pawanjay176 @jimmygchen @AgeManning

In this PR, the rate limiters have been updated in terms of naming and functionality according to the spec update. Here is a table summarizing the changes to the limiters. The current naming might not be ideal, so I hope this will be helpful for your feedback. 🙏

| Previous Naming | Current Naming | What This Does |
| --- | --- | --- |
| `limiter: Option<RateLimiter>` | `response_limiter: Option<ResponseLimiter<E>>` | Previously, this rate-limited inbound requests. Currently, it rate-limits our responses to inbound requests. |
| `self_limiter: Option<SelfRateLimiter<Id, E>>` | `outbound_request_limiter: SelfRateLimiter<Id, E>` | Rate-limits our outbound requests. Previously, it was optional. |
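
A compilable sketch of how these renamed fields could sit on the behaviour, per the table above. The `RPC` name appears elsewhere in this PR (`RPC::send_response()`), but the struct layout and the placeholder limiter types here are stand-ins for illustration.

```rust
use std::marker::PhantomData;

// Placeholders for the limiter types named in the table.
struct ResponseLimiter<E>(PhantomData<E>);
struct SelfRateLimiter<Id, E>(PhantomData<(Id, E)>);

pub struct RPC<Id, E> {
    /// Rate-limits our responses to inbound requests (previously `limiter`,
    /// which rate-limited the inbound requests themselves).
    response_limiter: Option<ResponseLimiter<E>>,
    /// Rate-limits our outbound requests (previously `self_limiter`, which
    /// was optional).
    outbound_request_limiter: SelfRateLimiter<Id, E>,
    // ...other behaviour fields omitted.
}
```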

Additionally, a global rate limiter for inbound requests has been proposed to reduce memory cost in worst-case scenarios.
( #5923 (comment) )

@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 9d8e078 to a44df54 Compare March 13, 2025 22:38
@AgeManning (Member):

Ok, in the interest of not letting this go stale and requiring us to maintain it, how do we all feel about a merge?
@jxs @pawanjay176 @dapplion ?

@dapplion (Collaborator) commented Apr 7, 2025

I would like to have this in unstable soon, and to build the PeerDAS work around this rate-limiting paradigm.

@pawanjay176 (Member):

I'm in favour of getting this in as well. Will test this for a bit and report back

@dapplion (Collaborator) left a comment:

Approving minus testing by @pawanjay176

@jxs (Member) left a comment:

Thanks for the patience Akihito, this looks reasonably good to me to be landed and tested!

@pawanjay176 (Member) left a comment:

This looks good to me as well, just minor nits.
I've been running this for some time on mainnet, hammering a synced node with requests, and it holds up as expected.
Great work on this one; sorry it took so long to review and test.

```
@@ -206,6 +206,20 @@ pub static REPORT_PEER_MSGS: LazyLock<Result<IntCounterVec>> = LazyLock::new(||
    )
});

pub static SELF_LIMITER_REQUEST_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
```
A reviewer (Member) commented:

Just noting that if we are changing the naming as mentioned in #5923 (comment), then we should probably change the metric names here too, for consistency.

@ackintosh (Member, Author) replied:

Thanks for the feedback! I have pushed a commit to update the metric names, but the commit is not showing up in this PR. I'm not sure why; maybe waiting a bit will resolve this.

```
});

pub static RESPONSE_LIMITER_RESPONSE_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
    try_create_histogram(
```
The same reviewer commented:

same as above

@ackintosh (Member, Author):

@pawanjay176 Sorry for the delay. I've addressed your feedback.

@pawanjay176 (Member) left a comment:

Awesome, let's get this in!

@pawanjay176 pawanjay176 added the ready-for-merge (This PR is ready to merge.) label and removed the ready-for-review (The code is ready for review) label Apr 24, 2025
mergify bot added a commit that referenced this pull request Apr 24, 2025
@mergify mergify bot merged commit 1324d3d into sigp:unstable Apr 24, 2025
31 checks passed
@ackintosh ackintosh deleted the delayed-rpc-response branch April 25, 2025 21:05
```rust
/// Active requests that are awaiting a response.
active_requests: HashMap<PeerId, HashMap<Protocol, usize>>,
/// Requests queued for sending per peer. These requests are stored when the self rate
/// limiter rejects them. Rate limiting is applied on a per-peer and per-protocol basis,
/// so the requests are stored in the same way.
delayed_requests: HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
```
@dapplion (Collaborator) commented May 15, 2025:

It would be great to have these additional metrics to debug PeerDAS (a sketch follows the list):

  • Counter: total count of queued requests, labeled by Protocol string
    • Same for the rate limiter's delayed_responses
  • Histogram: VecDeque length at the moment of queuing a request, labeled by Protocol string (i.e. observe the length before pushing)
    • Same for the rate limiter's delayed_responses
  • Gauge: current sum of VecDeque lengths across all entries in delayed_requests
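
A hypothetical sketch of the suggested gauge and the queue-length observation, reusing the field names from the snippet above. The placeholder types and the commented-out metric call are assumptions, not the repository's actual plumbing.

```rust
use std::collections::{HashMap, VecDeque};
use std::marker::PhantomData;

// Placeholders for the real types referenced in the snippet above.
type PeerId = u64;
type Protocol = &'static str;
struct QueuedRequest<Id, E>(PhantomData<(Id, E)>);

/// Gauge value: the current sum of VecDeque lengths across all entries
/// in `delayed_requests`.
fn total_delayed_requests<Id, E>(
    delayed_requests: &HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
) -> usize {
    delayed_requests.values().map(VecDeque::len).sum()
}

/// Observe the queue length before pushing, per the histogram suggestion.
fn queue_request<Id, E>(
    delayed_requests: &mut HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
    key: (PeerId, Protocol),
    request: QueuedRequest<Id, E>,
) {
    let queue = delayed_requests.entry(key).or_default();
    // e.g. DELAYED_REQUESTS_HISTOGRAM.with_label_values(&[key.1]).observe(queue.len() as f64);
    queue.push_back(request);
}
```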

Labels: Networking, ready-for-merge