Allow graceful termination of ObjectDiffusion from the client side -> make sure it regains agency frequently enough #187

@tbagrel1

There are currently a couple of issues with the design of ObjectDiffusion w.r.t. miniprotocol technical requirements, as identified in the meeting with the network team on 2025/12/17 (cc @coot @crocodile-dentist):

1. There is no way for the client side to terminate gracefully while it is blocked waiting for the answer to a blocking request for IDs. Indeed, the server has agency in this situation, but is not allowed to answer with an empty list of objects. The only way out is to wait for a "long timeout".
- But it is a "requirement" for a node to be able to gracefully terminate its client-side miniprotocols at will within ~30s, which would suggest "long timeout < 30s"
- But due to the parametrization of Peras (the duration of a round, etc.), for certificate diffusion with Peras in its "happy path", the server may not have any new object available within 90s, which suggests "long timeout > 90s" (otherwise a blocking request will almost always fail with a timeout)

The design proposed in #144 would not help here, since the client would still be blocking (in state StObjIdsMustReplay) on receiving an answer from the server for a long (~ 90s) span of time.

2. If a Peras cooldown happens, then the ObjectDiffusion miniprotocols (for votes and certs) will fail ungracefully at some point, since the duration of consecutive Peras cooldowns is not really bounded and could last several minutes or hours. It would be better, when a cooldown is detected, to have a way to terminate the Peras miniprotocols gracefully (and re-initiate them later).

Potential solutions for 1:

S1-A. Make the server answer with "keep-alive" objects every X seconds, so that we can use a relatively short timeout even for blocking requests. But this would consume extra network bandwidth (this is my "naive" potential solution)

S1-B. Do not use a blocking ID request, but instead use a simple polling system based on non-blocking requests (pseudo-blocking request = retry a non-blocking request at most $n$ times, with a $smallDelay$ between attempts, and if we still haven't received anything, sleep for $longDelay$ and repeat) (this has been proposed by the network team)

The goal of both solutions is to make sure the client regains agency frequently enough, as it is the only one who can trigger a graceful termination with MsgDone. Both solutions could also heavily impact the caught-up detection we want to implement in #144
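
To make S1-B more concrete, here is a minimal sketch of the pseudo-blocking request loop, written in Python only for illustration (the real implementation would live in the Haskell miniprotocol client). All names (`pseudo_blocking_request`, `request_ids_non_blocking`, `should_terminate`) are hypothetical, and the sketch assumes that the client may send a non-blocking request even when it has no unacknowledged IDs, which is not allowed by the current protocol.

```python
import time
from typing import Callable, List, Optional


def pseudo_blocking_request(
    request_ids_non_blocking: Callable[[], List[str]],  # hypothetical: one non-blocking ID request
    should_terminate: Callable[[], bool],               # hypothetical: node wants to shut the client down
    n: int,
    small_delay: float,
    long_delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[str]]:
    """Emulate a blocking ID request using only non-blocking ones.

    Returns a non-empty list of IDs, or None if the client decided to
    terminate. In the latter case the client holds agency, so it could
    send MsgDone.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    while True:
        for attempt in range(n):
            if attempt > 0:
                # Short pause between two consecutive attempts.
                sleep(small_delay)
            # The client holds agency between requests, so this is where
            # a graceful termination can be triggered.
            if should_terminate():
                return None
            ids = request_ids_non_blocking()
            if ids:
                return ids
        # Nothing after n attempts: back off before the next burst.
        sleep(long_delay)
```

In this sketch the termination flag is only checked after a sleep, so the time needed to notice a termination request is roughly the larger of `small_delay` and `long_delay`, plus one request round trip. Under that assumption, `long_delay` would have to stay comfortably below the ~30s termination requirement, independently of how long the server stays without new objects.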

Potential solutions for 2:
Easier for this one I think; at least in theory. The component in charge of deciding if we vote or not should have a way to inform the network layer that we need to terminate these protocols because of a cooldown. I'm not exactly sure how the restart would work though
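
One possible shape for this, sketched in Python purely as an illustration: the voting component sets a shared flag, and the client checks it each time it holds agency. `CooldownSignal`, `run_client`, `poll_once` and `send_done` are hypothetical names, not existing APIs, and the restart logic is deliberately left out since it is still an open question.

```python
import threading
from typing import Callable


class CooldownSignal:
    """Hypothetical flag shared between the voting logic and the network layer."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def enter_cooldown(self) -> None:
        # Called by the component that decides whether we vote or not.
        self._event.set()

    def in_cooldown(self) -> bool:
        return self._event.is_set()


def run_client(
    signal: CooldownSignal,
    poll_once: Callable[[], None],   # hypothetical: one request/response exchange
    send_done: Callable[[], None],   # hypothetical: sends MsgDone to the server
) -> None:
    """Run the client until a cooldown is signalled, then terminate gracefully."""
    while True:
        # Only check the flag when the client holds agency, i.e. between
        # two exchanges with the server.
        if signal.in_cooldown():
            send_done()
            return
        poll_once()
```

This only works if each `poll_once` returns quickly, i.e. if the client regains agency frequently enough, which ties this point back to the solutions for 1.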

In any case, this seems to require changes that depart significantly from the Peras design document. For instance, making a non-blocking request for IDs when we don't have any unacknowledged ones is currently a protocol violation.

Note

This has been detected while trying to unblock IntersectMBO/ouroboros-network#5267, but it isn't clear if solving this issue will completely unblock IntersectMBO/ouroboros-network#5267
