ggcr: Intermittent failures due to heuristic non-HTTPS connections when accessing a registry FQDN ending in .local #2139

@maxb

Describe the bug

At work I operate in an environment where the primary domain name ends in .local. For some time I have been experiencing occasional intermittent failures retrieving images, which resulted in a cryptic error "stopped after 10 redirects" in a circumstance where no redirects ought to be involved.

Finally, after stepping through simplified client code in a debugger, all became clear:

  • ggcr incorporates a heuristic that assumes any registry domain name ending in .local operates without https

  • When ggcr's code pings (GET /v2/) a registry it believes is a plain http registry, it first attempts an https connection, and if this has not concluded within 300ms, attempts a plain http connection. In my environment, a sudden burst of connections from multiple hosts occasionally results in some of the https requests not being answered promptly, and the http fallback path being taken.

  • My environment incorporates an HTTP redirect from http to https for the benefit of browser clients

  • ggcr is willing to follow a redirect during the GET /v2/ request. However, once it has followed an http -> https redirect, it somehow ends up overriding the https URL scheme specified by the server for its token endpoint with plain http, and so makes token requests to the incorrect URL. (In my environment, these trigger additional http -> https redirects, which the ggcr code does not follow.)

To Reproduce

This is somewhat complex to reproduce in full. You would need:

  • A registry serving HTTPS on a hostname that ends with .local
  • An http to https redirect also operating on that name, in the manner commonly done when supporting browser clients
  • A test that fetches a manifest from the registry while generating sufficient load that the https GET /v2/ ping sometimes does not return a result within 300ms, so that the plain http one is attempted

Expected behavior

I personally feel triggering special behaviour based on a .local suffix to the domain name is "too magical".

If that is kept as the default, I'd prefer there be a way to switch it off, so that ggcr simply adheres to what it is told up front about whether a registry is insecure.

Additional context

  • Version of the module - v0.20.2, but it doesn't look like there have been changes to the code paths discussed here in subsequent commits
  • Registry used (e.g., GCR, ECR, Quay) - JFrog Artifactory behind an Apache HTTPD reverse proxy implementing the TLS

Labels: bug