Description
Is your feature request related to a problem? Please describe.
Our test environment intermittently fails to start. We think this happens because all the retries sometimes occur in quick succession. We have various scripts that orchestrate multiple server components. These components depend on one another, and it sometimes takes a few seconds for them all to come online, sometimes longer.
Describe the solution you'd like
I'd like to extend the retry policy with a minimum delay between retries.
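To make the request concrete, here is a minimal Python sketch of how a minimum could combine with jittered exponential backoff. It assumes each retry delay is drawn uniformly between zero and the current backoff cap. `retry_delay`, `min_delay`, and the default values are hypothetical names and numbers for illustration, not part of any existing gRPC API.

```python
import random


def retry_delay(retry_index, initial_backoff=1.0, multiplier=1.5,
                max_backoff=5.0, min_delay=0.0, rng=random):
    """Return the delay in seconds before retry number `retry_index` (1-based).

    All parameter names and defaults are illustrative assumptions.
    """
    # Exponential backoff cap for this retry, limited by max_backoff.
    cap = min(initial_backoff * multiplier ** (retry_index - 1), max_backoff)
    # Assumed jitter model: uniform between 0 and the cap.
    jittered = rng.uniform(0.0, cap)
    # Proposed extension: never wait less than min_delay.
    return max(min_delay, jittered)


if __name__ == "__main__":
    for i in range(1, 5):
        print(i, round(retry_delay(i, min_delay=0.5), 3))
```

With `min_delay=0.0` the result is just the jittered value, so zero would be a backwards-compatible default. Clamping (rather than shifting the whole random range) means N retries wait at least N times `min_delay` in total, even when `min_delay` exceeds the current cap.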
Describe alternatives you've considered
Write my own gRPC stack...
I looked into the retry code, and there's nothing I can do from my end.
Additional context
Omitting a minimum delay between retries may have been a mistake, because there is now no way to ensure that at least X seconds are spent waiting for services to come online. The way this is described elsewhere makes me think this might have been an oversight.
I ran some experiments with the default retry policy documented here.
Basically, there's about a 1 in 100 chance that all retries happen within 1 second (because there's no minimum).
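An experiment of this kind can be approximated with a small Monte Carlo simulation. The sketch below is not the script used for the numbers above. The policy values (5 attempts, 1 s initial backoff, 1.5 multiplier, 5 s maximum backoff) and the jitter model (uniform between zero and the current backoff) are assumptions, so the estimate it prints depends entirely on them.

```python
import random


def total_wait(max_attempts=5, initial_backoff=1.0, multiplier=1.5,
               max_backoff=5.0, rng=random):
    """Total seconds spent waiting across all retries of one call.

    The policy values are assumed, not taken from any specific configuration.
    """
    total = 0.0
    backoff = initial_backoff
    # max_attempts includes the first attempt, so there are max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        # Assumed jitter: uniform between 0 and the capped backoff (no minimum).
        total += rng.uniform(0.0, min(backoff, max_backoff))
        backoff *= multiplier
    return total


def fraction_under(threshold, trials=100_000, seed=0, **policy):
    """Estimate how often all retries finish in under `threshold` seconds."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(total_wait(rng=rng, **policy) < threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("fraction of calls with total wait < 1 s:", fraction_under(1.0))
```

Changing the policy values passed to `fraction_under` shows how sensitive the estimate is to the configured backoff.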
I think there needs to be a minimum. It can default to zero, but with it we can guarantee a minimum waiting period before the client gives up.
What we believe is happening in our test environment is that various server components sometimes take a bit longer to come online. This leads to intermittent startup failures, which cause CI/CD to fail more or less randomly.