Repo Auth and URL Redirects Not Compatible #25068

Open
@CauhxMilloy

Description

Description of the bug:

A repo dep (e.g. http_archive() or similar) supports auth (e.g. via NETRC, auth_patterns, etc.). The URLs given (e.g. via url, urls, etc.) can also be redirected. However, these two mechanisms do not work together properly.

This is caused by the APIs of use_netrc() and download_and_extract() (in bzl) and by the implementation of auth headers in the Bazel runtime. Namely, data is not passed or processed based on domains (as one would expect, given the auth_patterns dict). Instead, the exact URLs are used to determine which headers to apply.

This data transformation (from domain to exact URLs) happens in util.bzl (in use_netrc()). This transformed data (map of exact URL -> pattern) is expected by the auth parameter for ctx.download_and_extract() (e.g. as used in http_archive()).
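To illustrate the transformation described above, here is a minimal sketch in plain Python (not the actual util.bzl code). The helper name build_auth_map and its flat return shape are assumptions made for illustration; the real use_netrc() returns richer per-URL entries. The point is only that the keys of the result are exact URLs, not hosts.

```python
from urllib.parse import urlparse


def build_auth_map(urls, auth_patterns):
    """Hypothetical model: expand a host-keyed dict into a map keyed by exact URL."""
    auth = {}
    for url in urls:
        host = urlparse(url).hostname
        if host in auth_patterns:
            # The host is only consulted here; afterwards only the full URL is kept.
            auth[url] = auth_patterns[host]
    return auth


urls = ["https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"]
auth_map = build_auth_map(urls, {"example.com": "Bearer <password>"})

# A redirect target on the same host has no entry, because it was never listed.
redirect_target = "https://example.com/some/url/af123de/for/actual/file"
redirect_has_auth = redirect_target in auth_map  # False
```

Once the map is built, the host information from auth_patterns is no longer available to whatever consumes the map.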

While redirects are supported in HttpConnector, the headers are applied based on the exact URL (since exact URLs are the dict/map keys in the input data); see here, here, here, and here. This means that a redirected URL will not get auth headers added when connecting to the new location, likely leading to a 404 (or similar).
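The failure mode can be modelled with a small simulation. This is a sketch under stated assumptions, not Bazel's HttpConnector: fetch(), fake_server(), and the 404-when-unauthenticated behaviour are all hypothetical stand-ins chosen to mirror the description above.

```python
START = "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"
TARGET = "https://example.com/some/url/af123de/for/actual/file"
TOKEN = "Bearer <password>"


def fake_server(url, headers):
    """Hypothetical host: every path needs auth; START redirects to TARGET."""
    if headers.get("Authorization") != TOKEN:
        return 404, None  # hides the resource from unauthenticated requests
    if url == START:
        return 302, TARGET
    if url == TARGET:
        return 200, None
    return 404, None


def fetch(url, auth_map, server, max_redirects=5):
    """Follow redirects, looking up headers by exact URL on every hop."""
    for _ in range(max_redirects + 1):
        headers = {}
        if url in auth_map:  # exact-URL lookup, as described above
            headers["Authorization"] = auth_map[url]
        status, location = server(url, headers)
        if status != 302:
            return status, url
        url = location
    raise RuntimeError("too many redirects")


# Only the listed URL has an entry, so the second hop goes out without auth.
status_without_target, _ = fetch(START, {START: TOKEN}, fake_server)

# Adding an entry for the redirect target as well makes the second hop succeed.
status_with_target, _ = fetch(START, {START: TOKEN, TARGET: TOKEN}, fake_server)
```

In this model the first call ends in a 404 on the redirect target, while the second call, whose map also contains the target URL, ends in a 200.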

This seems to be a consequence of the implementation being tied to the com.google.auth.Credentials API, where getRequestMetadata() takes the entire URI.

An example repro setup is to use something like:

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

This results in a 404 (or similar, depending on the host site), because the auth headers are missing after the redirect is followed. This obviously prevents fetching -- but it's also really confusing, especially as debugging with curl works as expected.

Technically speaking, there is a work-around where the redirected URL can also be explicitly listed in urls -- but that ignores the whole point of a redirect URL. This would look like:

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
        "https://example.com/some/url/af123de/for/actual/file",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

This work-around ensures that, when https://example.com/url/which/returns/302/to/actual/url/file.tar.gz 302-redirects to https://example.com/some/url/af123de/for/actual/file, the auth headers get applied because the URL is found and mapped as necessary. But, as mentioned, this explicit listing is kinda silly.

I was also able to confirm with Wireshark that the auth headers were missing. To decrypt the TLS traffic, I ran curl with export SSLKEYLOGFILE="${PWD}/sslkeylog.log" and ran Bazel with "--host_jvm_args=-javaagent:${PWD}/extract-tls-secrets-4.0.0.jar=${PWD}/sslkeylog.log" (see https://github.com/neykov/extract-tls-secrets).

Seeing as this is pretty ingrained in the APIs for use_netrc(), download_and_extract(), etc., it's not clear how this could best be addressed in a backwards-compatible way. Perhaps simply having a flag (default false) to fall back to checking only the domain..? I figured this should at least be documented..
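As a rough sketch of what such a fallback could look like (plain Python, purely illustrative): the function headers_for() and the parameter fall_back_to_host are hypothetical names, not existing Bazel APIs or flags. The exact-URL lookup is tried first, and the host is consulted only when the opt-in flag is set.

```python
from urllib.parse import urlparse


def headers_for(url, auth_by_url, auth_by_host, fall_back_to_host=False):
    """Hypothetical lookup: exact URL first, then (optionally) the URL's host."""
    if url in auth_by_url:
        return {"Authorization": auth_by_url[url]}
    if fall_back_to_host:
        host = urlparse(url).hostname
        if host in auth_by_host:
            return {"Authorization": auth_by_host[host]}
    return {}


listed = "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"
redirected = "https://example.com/some/url/af123de/for/actual/file"
by_url = {listed: "Bearer <password>"}
by_host = {"example.com": "Bearer <password>"}

default_headers = headers_for(redirected, by_url, by_host)
fallback_headers = headers_for(redirected, by_url, by_host, fall_back_to_host=True)
other_host_headers = headers_for(
    "https://cdn.example.org/file", by_url, by_host, fall_back_to_host=True
)
```

With the flag off, the redirected URL gets no headers, which matches the current behaviour. With it on, the redirected URL gets the header because its host has a pattern, while a redirect to a host with no pattern still gets nothing, so credentials are not sent to unrelated hosts.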

Which category does this issue belong to?

Core, Rules API, Configurability

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

As mentioned in the bug description, create an http_archive() which points to a redirecting URL that requires auth.

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

Then running something like bazel fetch @some_dep_repo//... will result in:

WARNING: Download from https://example.com/url/which/returns/302/to/actual/url/file.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
ERROR: An error occurred during the fetch of repository 'some_dep_repo':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/aaabbb123123/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://example.com/url/which/returns/302/to/actual/url/file.tar.gz] to /root/.cache/bazel/_bazel_root/aaabbb123123/external/some_dep_repo/temp1111111222222/file.tar.gz: GET returned 404 Not Found
ERROR: /root/some/path/to/my/workspace/WORKSPACE:17:13: fetching http_archive rule //external:some_dep_repo: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/aaabbb123123/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://example.com/url/which/returns/302/to/actual/url/file.tar.gz] to /root/.cache/bazel/_bazel_root/aaabbb123123/external/some_dep_repo/temp1111111222222/file.tar.gz: GET returned 404 Not Found

I happened to find this bug when using GitLab's Release links API redirecting to the Markdown uploads API. But this isn't a GitLab issue. It is an HTTP redirect + auth issue in Bazel.

My use case happens to focus on http_archive() (as shown in these examples), but this really affects all download calls with auth.

I tested this with Bazel 6 and Bazel 7.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

tested with release 6.5.0-0 and release 7.4.1-0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse HEAD ?

N/A

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

Looking through Bazel's git history, it seems this has always been an issue (for as long as auth and redirects have been supported for remote repo fetching).

Have you found anything relevant by searching the web?

https://bazel.build/rules/lib/repo/http#http_archive mentions that "Redirections are followed." for url/urls (which they are, but there is not much mention of how this interacts with auth_patterns).

Similar bugs/PRs (but not the same problem):
#14866
#14922

I also obviously found all the various code pointers linked above for how auth data is piped/processed/applied for HTTP calls.

Any other information, logs, or outputs that you want to share?

I don't think there's anything else.. 😅 Feel free to ask or let me know, if there is.


Labels

P2 (We'll consider working on this in future. Assignee optional), team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), type: bug
