
Minimum timeout for retry policy #2673

@msab-john

Description


Is your feature request related to a problem? Please describe.

Our test environment fails to come up, and we think it's because all the retries sometimes happen in quick succession. We have various scripts that orchestrate multiple server components. There are dependencies between these components, and it can take a few seconds for all of them to come online, sometimes longer for various reasons.

Describe the solution you'd like

I'd like to extend the retry policy with a minimum delay between retries.

Describe alternatives you've considered

Write my own gRPC stack...

I looked into the retry code, and there's nothing I can do from my end.

Additional context

It might have been a mistake not to provide a minimum delay between retries, because there is now no way to ensure that at least X seconds are spent waiting for services to come online. The way this is described elsewhere makes me think this might have been an oversight?

I ran some experiments with the default retry policy documented here.

Basically, there's about a 1/100 chance that all retries happen within 1 second (because there's no minimum delay).
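To make the "no minimum" point concrete, here is a minimal Monte Carlo sketch. It assumes the backoff rule from the gRPC retry design (gRFC A6) as I understand it: the n-th retry waits `random(0, min(initialBackoff * backoffMultiplier**(n-1), maxBackoff))`, and `maxAttempts` counts the original call. Server pushback and the time each attempt itself takes are ignored. The function names and the example values (5 attempts, 1s initial, 5s max, 1.5 multiplier) are illustrative assumptions, not necessarily the defaults on the linked page.

```python
import random


def retry_delay(attempt: int, initial_backoff: float, max_backoff: float,
                backoff_multiplier: float, rng: random.Random) -> float:
    """Delay (seconds) before retry `attempt` (1-based), full-jitter rule:
    uniform in [0, min(initial * multiplier**(attempt-1), max_backoff)]."""
    cap = min(initial_backoff * backoff_multiplier ** (attempt - 1), max_backoff)
    return rng.uniform(0.0, cap)  # lower bound is 0: no minimum delay


def prob_all_retries_within(window: float, max_attempts: int, initial_backoff: float,
                            max_backoff: float, backoff_multiplier: float,
                            trials: int = 200_000, seed: int = 0) -> float:
    """Estimate P(sum of all retry delays < window) by simulation."""
    rng = random.Random(seed)
    retries = max_attempts - 1  # maxAttempts includes the original call
    hits = 0
    for _ in range(trials):
        total = sum(retry_delay(n, initial_backoff, max_backoff, backoff_multiplier, rng)
                    for n in range(1, retries + 1))
        if total < window:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Illustrative policy values (assumed, not taken from the linked docs).
    p = prob_all_retries_within(window=1.0, max_attempts=5, initial_backoff=1.0,
                                max_backoff=5.0, backoff_multiplier=1.5)
    print(f"P(all retries within 1s) ~= {p:.4f}")
```

The exact figure depends on the policy values. When every cap is at least the window `w`, there is a closed form to check the simulation against: `w**k / (k! * product of the k caps)` for `k` retries, which is never zero.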

I think there needs to be a minimum. It can default to zero, but this way we can at least ensure a minimum period of waiting before giving up.
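A rough sketch of what the proposal could look like, assuming the same A6-style backoff as a starting point. `min_backoff` and both function names are hypothetical; the existing retry policy only has `maxAttempts`, `initialBackoff`, `maxBackoff`, `backoffMultiplier` and `retryableStatusCodes`. With `min_backoff=0` the delays reduce to the current behaviour, and with a floor the total wait is guaranteed to be at least `(maxAttempts - 1) * min_backoff`.

```python
import random
from typing import List, Optional


def retry_delay_with_floor(attempt: int, initial_backoff: float, max_backoff: float,
                           backoff_multiplier: float, min_backoff: float,
                           rng: random.Random) -> float:
    """Hypothetical jittered delay that never drops below `min_backoff`."""
    cap = min(initial_backoff * backoff_multiplier ** (attempt - 1), max_backoff)
    cap = max(cap, min_backoff)  # if the floor exceeds the cap, the floor wins
    # Jitter is spread uniformly over [floor, cap] instead of [0, cap].
    return rng.uniform(min_backoff, cap)


def backoff_schedule(max_attempts: int, initial_backoff: float, max_backoff: float,
                     backoff_multiplier: float, min_backoff: float = 0.0,
                     seed: Optional[int] = None) -> List[float]:
    """Delays before each retry; maxAttempts includes the original call."""
    rng = random.Random(seed)
    return [retry_delay_with_floor(n, initial_backoff, max_backoff,
                                   backoff_multiplier, min_backoff, rng)
            for n in range(1, max_attempts)]


if __name__ == "__main__":
    delays = backoff_schedule(5, 1.0, 5.0, 1.5, min_backoff=0.5, seed=42)
    print([round(d, 3) for d in delays])
    print(f"total wait {sum(delays):.3f}s, guaranteed >= {0.5 * len(delays):.1f}s")
```

An alternative is `max(min_backoff, random(0, cap))`, but that piles many retries up at exactly the floor; drawing uniformly from `[min_backoff, cap]` keeps the jitter spread out.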

What we believe is happening in our test environment is that the server components sometimes take a bit longer to come online, which leads to intermittent startup failures and causes CI/CD to fail more or less at random.

Labels: enhancement (New feature or request)
