Description
There are currently a couple of issues with the design of ObjectDiffusion w.r.t. miniprotocol technical requirements, as identified in the meeting with the network team on 2025/12/17 (cc @coot @crocodile-dentist):
1. There is no way for the client side to terminate gracefully while it is blocked waiting for an answer to a blocking request for IDs. Indeed, the server has agency in this situation, but is not allowed to answer with an empty list of objects. The only way out is to wait for a "long timeout".
- But it is a "requirement" for a node to be able to gracefully terminate its client-side miniprotocols at will within ~30s, which would suggest "long timeout < 30s"
- But due to the parametrization of Peras, the duration of a round, etc, for certificate diffusion, with Peras in its "happy path", the server may not have any new object available within 90s, which suggests "long timeout > 90s" (otherwise a blocking request will almost always fail with timeout)
The design proposed in #144 would not help here, since the client would still be blocking (in state StObjIdsMustReplay) on receiving an answer from the server for a long (~ 90s) span of time.
2. If a Peras cooldown happens, then the ObjectDiffusion miniprotocols (for votes and certs) will fail ungracefully at some point, since the duration of consecutive Peras cooldowns is not really bounded and could last several minutes or hours. It would be better, when a cooldown is detected, to have a way to terminate the Peras miniprotocols gracefully (and re-initiate them later).
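To make the timeout conflict in issue 1 concrete, here is a minimal sketch that checks whether any "long timeout" value can satisfy both constraints at once. The helper name `admissible_timeouts` is hypothetical, and the two constants are simply the approximate figures quoted above.

```python
# Approximate figures from the discussion above (seconds).
SHUTDOWN_BUDGET_S = 30  # client must be able to terminate gracefully within ~30s
HAPPY_PATH_GAP_S = 90   # server may have no new object for up to ~90s


def admissible_timeouts(shutdown_budget_s, max_idle_gap_s):
    """Return the open interval (low, high) of acceptable long-timeout values,
    or None when no value satisfies both constraints.

    Hypothetical helper, for illustration only.
    """
    low = max_idle_gap_s       # timeout must exceed the longest expected idle gap
    high = shutdown_budget_s   # timeout must stay below the shutdown budget
    return (low, high) if low < high else None


conflict = admissible_timeouts(SHUTDOWN_BUDGET_S, HAPPY_PATH_GAP_S)
```

With the figures above the interval is empty, which is why tuning the timeout alone cannot resolve issue 1.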
Potential solutions for 1:
S1-A. Make the server answer with "keep-alive" objects every X seconds, so that we can use a relatively short timeout even for blocking requests. But this would consume extra network bandwidth (this is my "naive" potential solution)
S1-B. Do not use a blocking ID request, but instead do a simple polling system based on non-blocking requests (pseudo blocking request = retry maximum
The goal of both solutions is to make sure the client regains agency frequently enough, as it is the only one who can trigger a graceful termination with MsgDone. Both solutions could also heavily impact the caught-up detection we want to implement in #144
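As a rough illustration of S1-B, the sketch below simulates a client that replaces one blocking request with periodic non-blocking ones. It is a toy model under stated assumptions, not the ObjectDiffusion implementation: `FakeServer`, `poll_until_shutdown`, the simulated clock and the 10s poll interval are all hypothetical, and it assumes the server is allowed to reply to a non-blocking request with an empty list.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical server: maps an arrival time (seconds) to new object IDs."""
    arrivals: dict = field(default_factory=dict)

    def non_blocking_request(self, now, last_poll):
        # Replies immediately (possibly with an empty list), so agency
        # returns to the client right away.
        ids = []
        for t, objs in sorted(self.arrivals.items()):
            if last_poll < t <= now:
                ids.extend(objs)
        return ids


def poll_until_shutdown(server, poll_interval_s, shutdown_at_s, horizon_s):
    """Simulate polling with non-blocking requests.

    Returns (received_ids, termination_time). The shutdown request is
    checked at every poll tick, where the client holds agency and could
    send MsgDone. termination_time is None if the horizon is reached first.
    """
    received, now, last_poll = [], 0, -1
    while now <= horizon_s:
        if now >= shutdown_at_s:
            return received, now  # client has agency: MsgDone would go here
        received.extend(server.non_blocking_request(now, last_poll))
        last_poll = now
        now += poll_interval_s
    return received, None


server = FakeServer(arrivals={95: ["cert-1"]})
ids, done_at = poll_until_shutdown(server, poll_interval_s=10,
                                   shutdown_at_s=125, horizon_s=300)
```

In this model the delay between a shutdown request and termination is bounded by the poll interval rather than by the ~90s gap between objects. The price is one request/reply round trip per interval even when the server has nothing new.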
Potential solutions for 2:
Easier for this one I think; at least in theory. The component in charge of deciding if we vote or not should have a way to inform the network layer that we need to terminate these protocols because of a cooldown. I'm not exactly sure how the restart would work though
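One possible shape for this signal is sketched below, purely as an assumption about how the pieces could fit together. `CooldownSignal` and `client_step` are hypothetical names, not existing ouroboros-network APIs, and the sketch deliberately leaves the restart question open.

```python
import threading


class CooldownSignal:
    """Hypothetical handle shared by the voting component and the network layer."""

    def __init__(self):
        self._stop = threading.Event()

    def notify_cooldown(self):
        # Called by the component that decides whether we vote.
        self._stop.set()

    def should_terminate(self):
        return self._stop.is_set()


def client_step(signal, has_agency):
    """Pick the client's next action; only the agency holder may send a message."""
    if not has_agency:
        return "wait"      # server has agency: the client cannot send MsgDone
    if signal.should_terminate():
        return "MsgDone"   # graceful termination
    return "request"


signal = CooldownSignal()
before = client_step(signal, has_agency=True)
signal.notify_cooldown()
after = client_step(signal, has_agency=True)
blocked = client_step(signal, has_agency=False)
```

The `blocked` case shows why this depends on issue 1: the signal is only useful if the client regains agency soon after the cooldown is detected.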
In any case, this seems to require changes that depart significantly from the Peras design document. Typically, making a non-blocking request for IDs when we don't currently have unacknowledged ones is currently a protocol violation.
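The rule mentioned above can be written as a small validity check, sketched here with hypothetical names. Only the rule stated in this issue is encoded; any other constraints on ID requests in the design document are out of scope. The `allow_idle_polling` flag is an assumption standing in for the relaxation S1-B would need.

```python
def check_id_request(blocking, unacknowledged, allow_idle_polling=False):
    """Return None if the ID request is allowed, else a violation description.

    Hypothetical helper. `unacknowledged` is the number of object IDs sent by
    the server and not yet acknowledged by the client.
    """
    if not blocking and unacknowledged == 0 and not allow_idle_polling:
        return "non-blocking request with no unacknowledged IDs"
    return None


current_rule = check_id_request(blocking=False, unacknowledged=0)
relaxed_rule = check_id_request(blocking=False, unacknowledged=0,
                                allow_idle_polling=True)
```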
Note
This has been detected while trying to unblock IntersectMBO/ouroboros-network#5267, but it isn't clear if solving this issue will completely unblock IntersectMBO/ouroboros-network#5267