Retry requests to rekor, fulcio? #1148

Open
@jku

Description

I've been keeping an eye on intermittent failures in various CI workflows that run sigstore tools (root-signing, root-signing-staging, sigstore-probers)... and my gut feeling is that sigstore-python fails a little more often than the other clients.

Looking at the client implementations, at least cosign, sigstore-java and sigstore-js seem to have some built-in retries for the requests they make to rekor and fulcio. I wonder if we should have that too?

It's not an obvious decision:

  • Interactive use and CI use have different expectations with respect to responsiveness -- maybe we should only retry on CI?
  • It's not entirely trivial to recognize which responses should lead to retries -- potential candidates are 5xx and 429
  • By far, most failures seem to be on rekor
  • Some error responses have a Retry-After header but most do not seem to (I can't see the actual responses in the load balancer logs, so this is partly guesswork, but I believe 503 and 429 include the header). 429 specifically only makes sense to retry with the Retry-After value
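
To make the points above concrete, here is a rough sketch of what the retry decision could look like. This is not how sigstore-python or any other client does it: the names `parse_retry_after`, `retry_delay` and `max_attempts`, the backoff constants, and the attempt counts are all hypothetical. It assumes 5xx responses are retried with exponential backoff (or the Retry-After value when present), 429 is retried only when Retry-After is present, and CI is detected through the `CI` environment variable that GitHub Actions and most other CI systems set.

```python
import email.utils
import os
import time
from typing import Mapping, Optional


def parse_retry_after(value: str, now: Optional[float] = None) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if it cannot be parsed.

    The header is either a number of seconds or an HTTP-date.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,  # hypothetical first backoff step, in seconds
    cap: float = 30.0,  # hypothetical upper bound on any single wait
) -> Optional[float]:
    """Return how long to wait before retrying, or None to not retry.

    `attempt` is 0 for the first retry. `headers` is looked up with the exact
    key "Retry-After"; a case-insensitive mapping is assumed in real use.
    """
    raw = headers.get("Retry-After")
    hinted = parse_retry_after(raw) if raw is not None else None
    if hinted is not None and hinted > cap:
        # The server asks for a longer wait than we accept: give up
        # instead of retrying earlier than requested.
        return None
    if status == 429:
        # Only retry rate-limited requests when the server says when.
        return hinted
    if 500 <= status < 600:
        if hinted is not None:
            return hinted
        return min(cap, base * 2**attempt)
    return None


def max_attempts(environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical policy: be more patient on CI than interactively."""
    return 4 if environ.get("CI") else 2
```

A caller would loop up to `max_attempts()` times, sleep for the value `retry_delay` returns, and stop as soon as it returns None. Giving up when Retry-After exceeds the cap is just one option; capping the wait instead would mean retrying earlier than the server asked for.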
