Retry requests to rekor, fulcio?

I've been keeping an eye on intermittent failures in various CI workflows that run sigstore tools (root-signing, root-signing-staging, sigstore-probers), and my gut feeling is that sigstore-python fails a little more often than the other clients.

Looking at the client implementations, at least cosign, sigstore-java and sigstore-js seem to have some built-in retries for the requests they make to rekor and fulcio. I wonder if we should have that too?

It's not an obvious decision:
* Interactive use and CI use have different expectations with regard to responsiveness: maybe we should only retry in CI?
* It's not entirely trivial to recognize which responses should lead to retries. Potential candidates are 5xx and 429.
* By far most failures seem to be on rekor.
* Some error responses have a `Retry-After` header but most do not seem to (I can't see the actual responses in the load balancer logs, so this is partly guesswork, but I believe 503 and 429 include the header). 429 specifically only makes sense to retry using the `Retry-After` value.
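To make the points above concrete, here is a rough sketch of what such a policy could look like. It is not sigstore-python's API: every name in it (`retry_delay`, `send_with_retries`, `default_max_retries`) is hypothetical, the backoff numbers are placeholders, and using the `CI` environment variable to detect CI is only a common convention, not something we have agreed on. The `send` callable stands in for whatever actually performs the rekor or fulcio request.

```python
import os
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers); a stand-in for a real HTTP response object.
Response = Tuple[int, Mapping[str, str]]


def parse_retry_after(headers: Mapping[str, str], max_hint: float = 60.0) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or not a plain integer.

    Only the delay-seconds form is handled; the HTTP-date form is ignored here.
    The lookup is case-sensitive, which a real client would need to relax.
    """
    value = headers.get("Retry-After", "").strip()
    if not value.isdigit():
        return None
    return min(float(int(value)), max_hint)


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 8.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if we should not retry."""
    hinted = parse_retry_after(headers)
    if status == 429:
        # Only retry 429 when the server tells us how long to wait.
        return hinted
    if 500 <= status <= 599:
        # Prefer the server's hint, otherwise fall back to exponential backoff.
        return hinted if hinted is not None else min(cap, base * 2 ** attempt)
    return None


def default_max_retries() -> int:
    """Assumption: retry only when a CI environment variable is set."""
    return 3 if os.environ.get("CI") else 0


def send_with_retries(send: Callable[[], Response], max_retries: int,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it succeeds, is not retryable, or retries run out."""
    attempt = 0
    while True:
        status, headers = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None or attempt >= max_retries:
            return status, headers
        sleep(delay)
        attempt += 1
```

With `max_retries=0` this degrades to a single request, so interactive use would keep its current behaviour unless the user opts in.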


Retry requests to rekor, fulcio ? #1148

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development