Delayed RPC Send Using Tokens #5923
Conversation
@jxs @dapplion @pawanjay176 @jimmygchen @AgeManning In this PR, the rate limiters have been updated in terms of naming and functionality according to the spec update. Here is a table summarizing the changes to the limiters. The current naming might not be ideal, so I hope this will be helpful for your feedback. 🙏
Additionally, a global rate limiter for inbound requests has been proposed to reduce memory cost in worst-case scenarios.
Ok, in the interest of not getting this thing stale and requiring us to maintain it, how do we all feel about a merge?
I would like to have this in unstable soon, and work PeerDAS around this rate limit paradigm.
I'm in favour of getting this in as well. Will test this for a bit and report back.
Approving minus testing by @pawanjay176
Thanks for the patience Ahikito, this looks reasonably good to me to be landed and tested!
This looks good to me as well. Just minor nits.
Been running this for some time on mainnet while hammering a synced node with requests, and it holds up as expected.
Great work on this one, sorry it took so long for review and testing.
@@ -206,6 +206,20 @@ pub static REPORT_PEER_MSGS: LazyLock<Result<IntCounterVec>> = LazyLock::new(||
    )
});

pub static SELF_LIMITER_REQUEST_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
Just noting that if we are changing the naming as mentioned in #5923 (comment), then we should probably change the metric names here too for consistency.
Thanks for the feedback! I have pushed a commit to update the metric names, but the commit is not showing up in this PR. I'm not sure why; maybe waiting a bit will resolve this.
});

pub static RESPONSE_LIMITER_RESPONSE_IDLING: LazyLock<Result<Histogram>> = LazyLock::new(|| {
    try_create_histogram(
same as above
@pawanjay176 Sorry for the delay. I've addressed your feedback.
Awesome, let's get this in!
/// Active requests that are awaiting a response.
active_requests: HashMap<PeerId, HashMap<Protocol, usize>>,
/// Requests queued for sending per peer. These requests are stored when the self rate
/// limiter rejects them. Rate limiting is applied on a per-peer, per-protocol basis,
/// therefore the requests are stored in the same way.
delayed_requests: HashMap<(PeerId, Protocol), VecDeque<QueuedRequest<Id, E>>>,
Would be great to have these additional metrics to debug PeerDAS:
- Counter: total count of queued requests, labeled by Protocol string. Same for the rate_limiter's delayed_responses.
- Histogram: VecDeque length at the moment of queuing a request, labeled by Protocol string (observe the length before pushing). Same for the rate_limiter's delayed_responses.
- Gauge: current sum of VecDeque lengths for all items in delayed_requests.
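A rough sketch of what such metrics could look like, following the LazyLock / try_create_* pattern visible in the diff above; the metric names, labels, and exact helper imports here are assumptions for illustration, not what the PR actually adds:

```rust
use std::sync::LazyLock;
// Helpers and types from Lighthouse's metrics crate (exact crate/module path may differ).
use metrics::{
    try_create_histogram_vec, try_create_int_counter_vec, try_create_int_gauge, HistogramVec,
    IntCounterVec, IntGauge, Result,
};

/// Counter: total number of requests pushed into `delayed_requests`, labeled by protocol.
pub static SELF_LIMITER_DELAYED_REQUESTS_TOTAL: LazyLock<Result<IntCounterVec>> =
    LazyLock::new(|| {
        try_create_int_counter_vec(
            "self_limiter_delayed_requests_total",
            "Total count of requests delayed by the self rate limiter",
            &["protocol"],
        )
    });

/// Histogram: length of the per-(peer, protocol) VecDeque, observed just before pushing.
pub static SELF_LIMITER_QUEUE_LENGTH: LazyLock<Result<HistogramVec>> = LazyLock::new(|| {
    try_create_histogram_vec(
        "self_limiter_queue_length",
        "Length of the delayed-request queue at the moment a request is queued",
        &["protocol"],
    )
});

/// Gauge: current sum of all VecDeque lengths in `delayed_requests`.
pub static SELF_LIMITER_DELAYED_REQUESTS_CURRENT: LazyLock<Result<IntGauge>> =
    LazyLock::new(|| {
        try_create_int_gauge(
            "self_limiter_delayed_requests_current",
            "Current total number of requests sitting in the self limiter's queues",
        )
    });
```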
Issue Addressed
closes #5785
Proposed Changes
The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.
Is there already an active request with the same protocol?
This check is not performed in Before. It is taken from the PR in the consensus-specs which proposes updates regarding rate limiting and response timeout: https://github.com/ethereum/consensus-specs/pull/3767/files
The PR mentions the requester side. In this PR, I introduced the ActiveRequestsLimiter for the responder side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.
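For illustration, here is a minimal sketch of the bookkeeping such an active-requests limiter needs; the type aliases, the count-based check, and the configurable limit are assumptions made for this example rather than the PR's actual implementation:

```rust
use std::collections::HashMap;

// Placeholder types standing in for libp2p's PeerId and the RPC Protocol enum.
type PeerId = String;
type Protocol = String;

/// Illustrative limiter that counts in-flight inbound requests per (peer, protocol)
/// and rejects new ones once the configured limit is reached.
struct ActiveRequestsLimiterSketch {
    limit: usize,
    active: HashMap<(PeerId, Protocol), usize>,
}

impl ActiveRequestsLimiterSketch {
    /// Returns true (and records the request) if the peer is under the limit for this
    /// protocol; returns false if the request should get a rate-limited error instead.
    fn allows(&mut self, peer: PeerId, protocol: Protocol) -> bool {
        let count = self.active.entry((peer, protocol)).or_insert(0);
        if *count < self.limit {
            *count += 1;
            true
        } else {
            false
        }
    }

    /// Called once the response stream for this request has terminated.
    fn request_completed(&mut self, peer: &PeerId, protocol: &Protocol) {
        if let Some(count) = self.active.get_mut(&(peer.clone(), protocol.clone())) {
            *count = count.saturating_sub(1);
        }
    }
}
```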
Rate limit reached? and Wait until tokens are regenerated
UPDATE: I moved the limiter logic to the behaviour side. #5923 (comment)
The rate limiter is shared between the behaviour and the handler (Arc<Mutex<RateLimiter>>). The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.
I considered not sharing the rate limiter between the behaviour and the handler, and instead performing all of this either within the behaviour or within the handler. However, I decided against this for the following reasons:
- Performing everything within the behaviour: the behaviour is unable to recognize the response protocol when RPC::send_response() is called, especially when the response is RPCCodedResponse::Error. Therefore, the behaviour can't rate limit responses based on the response protocol.
- Performing everything within the handler: when multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔)
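To make the "wait until tokens are regenerated" step concrete, here is a self-contained token-bucket sketch in which a failed consume reports how long the queued item should wait; this is only an illustration of the idea, not the limiter implemented in lighthouse_network:

```rust
use std::time::{Duration, Instant};

/// Illustrative token bucket: sending consumes `cost` tokens, and tokens are
/// replenished at `refill_per_sec`. If there are not enough tokens, the caller is
/// told how long to keep the item queued before retrying.
struct TokenBucketSketch {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucketSketch {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    /// Add the tokens regenerated since the last refill, capped at capacity.
    fn refill(&mut self) {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = Instant::now();
    }

    /// Try to consume `cost` tokens; on failure, return how long until enough
    /// tokens will have regenerated.
    fn try_consume(&mut self, cost: f64) -> Result<(), Duration> {
        self.refill();
        if self.tokens >= cost {
            self.tokens -= cost;
            Ok(())
        } else {
            let missing = cost - self.tokens;
            Err(Duration::from_secs_f64(missing / self.refill_per_sec))
        }
    }
}
```

In the spirit of the delayed_requests queues shown earlier, one way to use such a bucket would be to keep the rejected item in its VecDeque, arm a wake-up timer with the returned Duration, and drain the queue from the front once the timer fires until try_consume fails again.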
Additional Info
Naming
I have renamed the fields of the behaviour to make them more intuitive: #5923 (comment)
Testing
I have run a beacon node with these changes for 24 hours and it appears to work fine.
The rate-limited error has not occurred anymore while running this branch.
Metrics
These metrics have been added in this PR.
- self_limiter_request_idling_seconds: The time our own request remained idle in the self-limiter.
- response_limiter_idling_seconds: The time our response remained idle in the response limiter.
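As a rough sketch of how such idling times are typically recorded (stamp an Instant when the item is queued, observe the elapsed time when it is finally sent); the types below are placeholders and the commented metrics call is only indicative of the pattern used in Lighthouse's metrics crate:

```rust
use std::time::Instant;

/// Placeholder for a queued RPC request or response.
struct QueuedItemSketch {
    queued_at: Instant,
    // ... payload fields elided
}

fn on_item_sent(item: &QueuedItemSketch) {
    // Time the item spent sitting in the limiter's queue.
    let idled_for = item.queued_at.elapsed();

    // In Lighthouse this would feed a histogram such as
    // self_limiter_request_idling_seconds, e.g. (hypothetical call site):
    // metrics::observe_duration(&metrics::SELF_LIMITER_REQUEST_IDLING, idled_for);
    println!("item idled for {:.3}s before being sent", idled_for.as_secs_f64());
}
```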