Delayed RPC Send Using Tokens #5923


Merged · 90 commits merged into sigp:unstable on Apr 24, 2025

Conversation

@ackintosh (Member) commented Jun 13, 2024

Issue Addressed

closes #5785

Proposed Changes

The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The sections that follow detail the changes.

```mermaid
flowchart TD

subgraph "*** After ***"
    Start2([START]) --> AA[Receive request]
    AA --> COND1{Is there already an active request <br> with the same protocol?}
    COND1 --> |Yes| CC[Send error response]
    CC --> End2([END])
    %% COND1 --> |No| COND2{Request is too large?}
    %% COND2 --> |Yes| CC
    COND1 --> |No| DD[Process request]
    DD --> EE{Rate limit reached?}
    EE --> |Yes| FF[Wait until tokens are regenerated]
    FF --> EE
    EE --> |No| GG[Send response]
    GG --> End2
end

subgraph "*** Before ***"
    Start([START]) --> A[Receive request]
    A --> B{Rate limit reached <br> or <br> request is too large?}
    B -->|Yes| C[Send error response]
    C --> End([END])
    B -->|No| E[Process request]
    E --> F[Send response]
    F --> End
end
```

Is there already an active request with the same protocol?

This check is not performed in "Before". It is taken from the consensus-specs PR below, which proposes updates to rate limiting and response timeouts.
https://github.com/ethereum/consensus-specs/pull/3767/files

The requester MUST NOT make more than two concurrent requests with the same ID.

The spec PR addresses the requester side. In this PR, I introduced the ActiveRequestsLimiter on the responder side, to prevent more than two requests on the same protocol from running simultaneously per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester. A minimal sketch of the idea is shown below.
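
A minimal sketch (not the PR's actual code) of the idea behind the ActiveRequestsLimiter: count in-flight inbound requests per peer and protocol, and reject anything beyond the allowed concurrency. The type aliases, field names, and method names here are illustrative assumptions.

```rust
use std::collections::HashMap;

type PeerId = u64; // placeholder for libp2p's PeerId
type Protocol = &'static str; // placeholder protocol identifier

struct ActiveRequestsLimiter {
    /// Maximum concurrent requests per (peer, protocol) pair.
    max_concurrent: usize,
    /// Counts of in-flight inbound requests.
    active: HashMap<(PeerId, Protocol), usize>,
}

impl ActiveRequestsLimiter {
    /// Returns false if the peer is already at the concurrency limit on this
    /// protocol; the responder then sends a rate-limited error and penalizes
    /// the requester.
    fn allows(&mut self, peer: PeerId, protocol: Protocol) -> bool {
        let count = self.active.entry((peer, protocol)).or_insert(0);
        if *count >= self.max_concurrent {
            false
        } else {
            *count += 1;
            true
        }
    }

    /// Called once the response stream for a request has completed.
    fn request_completed(&mut self, peer: PeerId, protocol: Protocol) {
        if let Some(count) = self.active.get_mut(&(peer, protocol)) {
            *count = count.saturating_sub(1);
            if *count == 0 {
                self.active.remove(&(peer, protocol));
            }
        }
    }
}
```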

Rate limit reached? and Wait until tokens are regenerated

UPDATE: I moved the limiter logic to the behaviour side. #5923 (comment)

The rate limiter is shared between the behaviour and the handler (Arc<Mutex<RateLimiter>>). The handler checks the rate limit and queues the response if the limit is reached; the behaviour handles pruning.

I considered not sharing the rate limiter between the behaviour and the handler, and instead performing all of this within either the behaviour or the handler alone. However, I decided against that for the following reasons (a token-bucket sketch of the delay step follows the list):

  • Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when RPC::send_response() is called, especially when the response is RPCCodedResponse::Error. Therefore, the behaviour can't rate limit responses based on the response protocol.
  • Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )
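
A minimal token-bucket sketch of the "Rate limit reached?" / "Wait until tokens are regenerated" steps in the flowchart above. The bucket regenerates tokens at a fixed rate; when a response would exceed the available tokens, the caller is told how long to delay it instead of dropping it. All names and the exact refill policy are illustrative assumptions, not the PR's implementation.

```rust
use std::time::{Duration, Instant};

struct TokenBucket {
    /// Maximum number of tokens the bucket can hold.
    capacity: u64,
    /// Tokens currently available.
    tokens: u64,
    /// Time taken to regenerate a single token.
    refill_period: Duration,
    /// When tokens were last regenerated.
    last_refill: Instant,
}

impl TokenBucket {
    /// Add any tokens regenerated since the last refill, up to capacity.
    fn refill(&mut self, now: Instant) {
        let periods = now.duration_since(self.last_refill).as_nanos()
            / self.refill_period.as_nanos();
        if periods > 0 {
            self.tokens = self.capacity.min(self.tokens + periods as u64);
            self.last_refill = now;
        }
    }

    /// Try to spend `cost` tokens. On failure, return how long to wait until
    /// the deficit is regenerated, so the response can be delayed rather than
    /// rejected.
    fn try_consume(&mut self, cost: u64, now: Instant) -> Result<(), Duration> {
        self.refill(now);
        if self.tokens >= cost {
            self.tokens -= cost;
            Ok(())
        } else {
            let deficit = cost - self.tokens;
            Err(self.refill_period * deficit as u32)
        }
    }
}
```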

Additional Info

Naming

I have renamed the fields of the behaviour to make them more intuitive: #5923 (comment)

Testing

I have run a beacon node with these changes for 24 hours, and it appears to work fine.

The rate-limited error no longer occurs while running this branch.


Metrics

The following metrics have been added in this PR:

  • self_limiter_request_idling_seconds: The time our own request remained idle in the self-limiter.
  • response_limiter_idling_seconds: The time our response remained idle in the response limiter.
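
For reference, a sketch of how one of these histograms can be registered, following the LazyLock<Result<Histogram>> pattern visible in the diff hunks later in this thread. The `use metrics::...` import path is an assumption about Lighthouse's metrics crate.

```rust
use metrics::{try_create_histogram, Histogram, Result};
use std::sync::LazyLock;

// Registers the histogram lazily on first access, mirroring the diff below.
pub static SELF_LIMITER_REQUEST_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
    try_create_histogram(
        "self_limiter_request_idling_seconds",
        "The time our own request remained idle in the self-limiter",
    )
});
```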

@ackintosh ackintosh added the work-in-progress (PR is a work-in-progress), Networking, and skip-ci (Don't run the `test-suite`) labels Jun 13, 2024
@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 29e3f00 to 90361d6 Compare June 19, 2024 22:04
@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 90361d6 to 7e0c630 Compare June 19, 2024 22:44
@ackintosh ackintosh removed the skip-ci (Don't run the `test-suite`) label Jul 1, 2024
@ackintosh ackintosh marked this pull request as ready for review July 14, 2024 23:23
@ackintosh (Member, Author) commented Mar 12, 2025

@jxs @dapplion @pawanjay176 @jimmygchen @AgeManning

In this PR, the rate limiters have been updated in terms of naming and functionality according to the spec update. Here is a table summarizing the changes to the limiters. The current naming might not be ideal, so I hope this will be helpful for your feedback. 🙏

| Previous Naming | Current Naming | What This Does |
| --- | --- | --- |
| `limiter: Option<RateLimiter>` | `response_limiter: Option<ResponseLimiter<E>>` | Previously, this rate-limited inbound requests. Currently, it rate-limits our responses to inbound requests. |
| `self_limiter: Option<SelfRateLimiter<Id, E>>` | `outbound_request_limiter: SelfRateLimiter<Id, E>` | Rate-limits our outbound requests. Previously, it was optional. |
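
A compilable sketch of how these renamed fields could sit on the behaviour, per the table above. The `RPC` name appears elsewhere in this PR (`RPC::send_response()`), but the struct layout and the placeholder limiter types here are stand-ins for illustration.

```rust
use std::marker::PhantomData;

// Placeholders for the limiter types named in the table.
struct ResponseLimiter<E>(PhantomData<E>);
struct SelfRateLimiter<Id, E>(PhantomData<(Id, E)>);

pub struct RPC<Id, E> {
    /// Rate-limits our responses to inbound requests (previously `limiter`,
    /// which rate-limited the inbound requests themselves).
    response_limiter: Option<ResponseLimiter<E>>,
    /// Rate-limits our outbound requests (previously `self_limiter`, which
    /// was optional).
    outbound_request_limiter: SelfRateLimiter<Id, E>,
    // ...other behaviour fields omitted.
}
```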

Additionally, a global rate limiter for inbound requests has been proposed to reduce memory cost in worst-case scenarios.
( #5923 (comment) )

@ackintosh ackintosh force-pushed the delayed-rpc-response branch from 9d8e078 to a44df54 Compare March 13, 2025 22:38
@AgeManning (Member):

Ok, in the interest of not letting this go stale and requiring us to maintain it, how do we all feel about a merge?
@jxs @pawanjay176 @dapplion ?

@dapplion (Collaborator) commented Apr 7, 2025

I would like to have this in unstable soon, and to build the PeerDAS work around this rate-limiting paradigm.

@pawanjay176 (Member):

I'm in favour of getting this in as well. Will test this for a bit and report back

@dapplion (Collaborator) left a comment:

Approving minus testing by @pawanjay176

@jxs (Member) left a comment:

Thanks for the patience Akihito, this looks reasonably good to me to be landed and tested!

@pawanjay176 (Member) left a comment:

This looks good to me as well, just minor nits.
I've been running this for some time on mainnet, hammering a synced node with requests, and it holds up as expected.
Great work on this one; sorry it took so long to review and test.

```
@@ -206,6 +206,20 @@ pub static REPORT_PEER_MSGS: LazyLock<Result<IntCounterVec>> = LazyLock::new(||
    )
});

pub static SELF_LIMITER_REQUEST_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
```
A reviewer (Member) commented:

Just noting that if we are changing the naming as mentioned in #5923 (comment), then we should probably change the metric names here too, for consistency.

@ackintosh (Member, Author) replied:

Thanks for the feedback! I have pushed a commit to update the metric names, but the commit is not showing up in this PR. I'm not sure why; maybe waiting a bit will resolve this.

```
});

pub static RESPONSE_LIMITER_RESPONSE_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
    try_create_histogram(
```
The same reviewer commented:

same as above

@ackintosh (Member, Author):

@pawanjay176 Sorry for the delay. I've addressed your feedback.

@pawanjay176 (Member) left a comment:

Awesome, let's get this in!

@pawanjay176 pawanjay176 added the ready-for-merge (This PR is ready to merge.) label and removed the ready-for-review (The code is ready for review) label Apr 24, 2025
mergify bot added a commit that referenced this pull request Apr 24, 2025
@mergify mergify bot merged commit 1324d3d into sigp:unstable Apr 24, 2025
31 checks passed
@ackintosh ackintosh deleted the delayed-rpc-response branch April 25, 2025 21:05
```rust
/// Active requests that are awaiting a response.
active_requests: HashMap<PeerId, HashMap<Protocol, usize>>,
/// Requests queued for sending per peer. These requests are stored when the self rate
/// limiter rejects them. Rate limiting is applied on a per-peer and per-protocol basis,
/// so the requests are stored in the same way.
delayed_requests: HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
```
@dapplion (Collaborator) commented May 15, 2025:

It would be great to have these additional metrics to debug PeerDAS (a sketch follows the list):

  • Counter: total count of queued requests, labeled by Protocol string
    • Same for the rate limiter's delayed_responses
  • Histogram: VecDeque length at the moment of queuing a request, labeled by Protocol string (i.e. observe the length before pushing)
    • Same for the rate limiter's delayed_responses
  • Gauge: current sum of VecDeque lengths across all entries in delayed_requests
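
A hypothetical sketch of the suggested gauge and the queue-length observation, reusing the field names from the snippet above. The placeholder types and the commented-out metric call are assumptions, not the repository's actual plumbing.

```rust
use std::collections::{HashMap, VecDeque};
use std::marker::PhantomData;

// Placeholders for the real types referenced in the snippet above.
type PeerId = u64;
type Protocol = &'static str;
struct QueuedRequest<Id, E>(PhantomData<(Id, E)>);

/// Gauge value: the current sum of VecDeque lengths across all entries
/// in `delayed_requests`.
fn total_delayed_requests<Id, E>(
    delayed_requests: &HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
) -> usize {
    delayed_requests.values().map(VecDeque::len).sum()
}

/// Observe the queue length before pushing, per the histogram suggestion.
fn queue_request<Id, E>(
    delayed_requests: &mut HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
    key: (PeerId, Protocol),
    request: QueuedRequest<Id, E>,
) {
    let queue = delayed_requests.entry(key).or_default();
    // e.g. DELAYED_REQUESTS_HISTOGRAM.with_label_values(&[key.1]).observe(queue.len() as f64);
    queue.push_back(request);
}
```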

Labels: Networking, ready-for-merge