`--experimental_remote_repo_contents_cache` #27509

fmeum · 2025-11-03T19:53:13Z

fmeum
Nov 3, 2025
Collaborator

Use this discussion to provide feedback on the new remote repository cache introduced in Bazel 9 as an experimental feature.

FAQs

How do I enable the remote repo contents cache?

1. Enable the `--experimental_remote_repo_contents_cache` startup flag.
2. Use any remote cache (both HTTP and gRPC are supported) via `--remote_cache` or `--remote_executor`.

Which repo rules are supported?

A repository rule has to return `repository_ctx.repo_metadata(reproducible = True)` to be eligible for caching.
As a further limitation of the current experimental implementation, only repo rules without any dependencies added at runtime (e.g., via `repository_ctx.watch` or `repository_ctx.getenv`) are supported. We hope to lift this restriction in the future. You can check a repo with canonical name `@@<repo_name>` for runtime deps by opening the file `$(bazel info output_base)/external/@<repo_name>.marker`: all lines after the first one are runtime deps.
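The marker-file check described above can be scripted. The following is a minimal sketch, assuming only what the FAQ states (every line after the first one in a `.marker` file is a runtime dep). The function names `runtime_deps` and `repos_with_runtime_deps` are hypothetical, and skipping blank lines is an assumption made for convenience, not documented Bazel behavior.

```python
from pathlib import Path


def runtime_deps(marker_path):
    """Return the runtime-dep lines of a single repo marker file.

    Per the FAQ, all lines after the first one are runtime deps.
    Blank lines are skipped (an assumption, not documented behavior).
    """
    lines = Path(marker_path).read_text(encoding="utf-8").splitlines()
    return [line for line in lines[1:] if line.strip()]


def repos_with_runtime_deps(output_base):
    """Map marker file name -> runtime deps, only for repos that have any.

    `output_base` is the directory printed by `bazel info output_base`.
    """
    external = Path(output_base) / "external"
    result = {}
    for marker in sorted(external.glob("@*.marker")):
        deps = runtime_deps(marker)
        if deps:
            result[marker.name] = deps
    return result
```

Repos that show up in the result of `repos_with_runtime_deps` are the ones that are currently not eligible for the remote repo contents cache, so they are candidates for checking whether their `watch` / `getenv` usage is really needed.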

keith · 2025-11-03T20:44:36Z

keith
Nov 3, 2025
Collaborator

As a starting point I have 2 initial questions:

1. Do we need to worry about anything on the remote exec service side that needs to be added for this?
2. Is there an easy way to spot repository rules that use watch / getenv but maybe don't need them, so we could change them to be compatible with this?

1 reply

fmeum Nov 4, 2025
Collaborator Author

I started an FAQ list with answers to these questions.

malt3 · 2025-11-05T14:09:52Z

malt3
Nov 5, 2025

The commit message contains this:

> Repositories are cached in a regular remote cache as AC entries

Do I interpret this correctly to mean that I need to allow Bazel to write to the remote AC directly?
I think many companies only allow the remote executor to set AC entries for the remote cache to avoid attackers poisoning the remote action cache.

Since repository rules still run locally (unless they are cached), this design makes a lot of sense.
Just trying to get confirmation.

5 replies

fmeum Nov 5, 2025
Collaborator Author

That is correct. My general assumption has been that you would allow CI machines to write AC entries and those would then also be able to populate the remote repo contents. Devs could then consume those just fine. Are you really referring to a setup in which Bazel itself never writes AC entries but remote executors do so directly?

malt3 Nov 5, 2025

> Are you really referring to a setup in which Bazel itself never writes AC entries but remote executors do so directly?

Yes that's precisely correct. I believe BuildBuddy and BuildBarn support this configuration.

malt3 Nov 5, 2025

I think the feature is still great and I'll enable it in most environments, but I think it's important to point out what it entails.

fmeum Nov 5, 2025
Collaborator Author

This would be difficult to support since it's completely unclear how to remotely execute (even a reproducible) repo rule - this would essentially require turning Bazel into a standalone Starlark interpreter of sorts. I guess at that point it would be easier to set up a trusted "repo rule evaluation" machine running regular Bazel?

malt3 Nov 5, 2025

Yes, sorry. I think we are talking about different things.
I was merely saying that it's a common setup to prevent writes to the AC from Bazel.
I agree that it's extremely hard to work around that limitation.

sluongng · 2025-11-17T13:32:54Z

sluongng
Nov 17, 2025

I discussed this with @Wyverald at BazelCon:

This feature is really good at helping CI operators reduce the typical size of their CI environment. By not having to materialize the external/ dir on disk and delegating most of the builds to the RBE server, the CI workspace can be shrunk down significantly.

The current implementation relies on an Action Cache (AC) write to the remote cache. In most enterprise setups the AC is heavily gated and can only be written from a trusted CI worker/pipeline. This has been recommended since 2018, see this talk: https://youtu.be/5a0ENnZivo0?t=1041

When a user introduces a new external dependency or upgrades an existing one, the trusted CI worker machine has to (a) download the new external deps and (b) calculate the new AC entry to update the remote cache accordingly. This process may require significant disk space to store the downloaded data.

Because of this, CI server operators, like myself, will see a typical cached CI workspace using very little disk space, but occasionally, there can be workspaces with 10-50x the disk size. This makes it really hard for us to schedule resources for these CI environments. In addition to this, our CI offering (BuildBuddy Workflows) utilizes a VM snapshot system, which may retain the disk content after each build. The bigger the VM, the harder it is to do the snapshot and distribute it among multiple workers.

For this reason, I think the current setup can be improved:

a. Provide separate tools to manually create and populate the AC entries without Bazel. Dependency upgrades and newly added dependencies can then go through a separate CI step that prepopulates the AC entry. This means that all Bazel workspaces in CI get a cache hit and see no disk inflation.

b. Allow the reproducible repo rules to run remotely using RBE. This would let us offload the downloads and preparation steps to the remote worker. All downloads are expected to be verified by a checksum to ensure reproducibility.

0 replies

sluongng · 2025-11-25T08:26:46Z

sluongng
Nov 25, 2025

I could be wrong, but the request metadata seems to be a little bit off from the usual convention:

https://cs.opensource.google/bazel/bazel/+/60bc017bca7dcdc6662061d591b544abac8d6107:src/main/java/com/google/devtools/build/lib/remote/RemoteRepoContentsCacheImpl.java;l=269-270

I think the repo name should be placed in the label and the action id should be a fixed constant (i.e. repository_content)

1 reply

fmeum Nov 25, 2025
Collaborator Author

You are right. Passing in anything but an action ID would probably require a bit of refactoring work to avoid having to synthesize an ActionAnalysisMetadata object. But it would be the correct approach. The mnemonic could also be set to the repo rule name.
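As a rough illustration of the layout proposed in this exchange, here is a minimal sketch. The keys `action_id`, `target_id` and `action_mnemonic` are field names of the Remote Execution API `RequestMetadata` message. Mapping "the label" to `target_id` is an assumption, and the helper name `proposed_request_metadata` is hypothetical. This is not what Bazel currently sends.

```python
# Fixed action ID suggested in the thread.
REPO_CONTENTS_ACTION_ID = "repository_content"


def proposed_request_metadata(repo_name, repo_rule_name):
    """Build the metadata fields as proposed in the discussion.

    - action_id: a fixed constant instead of the repo name
    - target_id: the repo name (assumed to be what "the label" refers to)
    - action_mnemonic: the repo rule name
    """
    return {
        "action_id": REPO_CONTENTS_ACTION_ID,
        "target_id": repo_name,
        "action_mnemonic": repo_rule_name,
    }
```

With this layout, a cache backend could group all repo contents requests under one action ID while still attributing each request to a repo and a repo rule.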
