Automatic Provider Cache Dir works as such, but downloads too much #5228

FlorinAndrei · 2025-12-12T20:47:01Z

Describe the bug

With Terragrunt v0.95.0 and OpenTofu 1.11.1, I've disabled the Provider Cache Server. Instead, I've allowed Terragrunt to use the Automatic Provider Cache Dir.

It looks like the Automatic Provider Cache Dir does work as a cache: it symlinks files to the cache folder. However, if I run terragrunt run --all -- init -upgrade in a folder with lots of resources, it takes a very long time to complete.

The reason is that every tofu process downloads the provider files itself. I don't think the cached copies get overwritten, but every spawned tofu process starts downloading the providers.

This is slow and wasteful of bandwidth.

Provider Cache Server did not have this issue.

Steps To Reproduce

Use a repo with lots of resources already created.

Upgrade the AWS provider version.

Run terragrunt run --all -- init -upgrade at the top of the repo.

Expected behavior

The cache mechanism should only download the provider once, save it, and re-use it. Provider Cache Server works like this.

Must haves

Terragrunt v0.95.0
OpenTofu 1.11.1
A substantial Terragrunt code base with lots of resources being created.

Nice to haves

Terminal output
Screenshots

Versions

Terragrunt version: v0.95.0
OpenTofu/Terraform version: 1.11.1
Environment details (Ubuntu 20.04, Windows 10, etc.): macOS 15.6.1

FlorinAndrei (Author) · 2025-12-12T21:02:07Z

I cannot use this feature. Right now I have a whole forest of tofu processes, each waiting its turn to download the AWS provider, only to discard it when done because the new version is already in the cache. That provider has already been downloaded hundreds of times, for no good reason.

I think Automatic Provider Cache Dir right now is only acceptable with very small Terragrunt repos.

I'm going back to Provider Cache Server.

FlorinAndrei (Author) · 2025-12-12T21:21:57Z

Wow, I went back to Provider Cache Server, and it's orders of magnitude faster, while working in a much bigger environment.

Automatic Provider Cache Dir is not usable in the real world. I would be sitting here all day if I tried to enable it in any environment that's not tiny.

yhakbar (Maintainer) · 2025-12-12T21:53:50Z

Hey @FlorinAndrei,

Thanks for sharing that! This seems better served as a community discussion rather than a bug report, so I've converted it to one. Hope you don't mind.

We've kept the Provider Cache Server available in Terragrunt because of the scaling issues you've mentioned. Provider caching performance depends heavily on each user's setup: the number and kinds of providers they use, the number of units they manage, and whether or not .terraform.lock.hcl files are tracked in Git all affect how well OpenTofu provider caching performs.

If you can construct simple, reproducible examples where OpenTofu could be more efficient in its provider caching, we're open to sharing those findings with the OpenTofu team to see whether OpenTofu provider caching speeds can be improved. Optimizations are already planned, and you can expect it to be faster in the near future.

FlorinAndrei (Author) · 2025-12-12T22:44:41Z

Yeah, there's information I did not provide initially:

We freeze the versions of all infrastructure tools we use: Terragrunt, OpenTofu, the major providers such as AWS. This prevents variability that has bothered us in the past. This also means upgrades are deliberate, and happen across large chunks of infrastructure (whole environments). In root.hcl I have this:

generate "versions_override" {
  path      = "versions_override.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "${local.aws_provider_version}"
    }
    # other providers
  }
}
EOF
}

We do not track lock files in repos.

Today, I've updated Terragrunt and OpenTofu to the latest versions. That went well. Then I changed the versions for the AWS provider and another provider, and switched from Provider Cache Server to Automatic Provider Cache Dir. That's when things blew up in my face.

A recursive Terragrunt init -upgrade run in a rather small environment launched a bunch of tofu processes. Each tofu process independently tried to upgrade the provider in the cache, and the locking mechanism allows only one process at a time to do that. But literally every subsequent tofu process downloaded the providers from scratch, then waited its turn for the lock to be released, only to discover that nothing needed to be done because the cache was already up to date. The process was purely sequential.

I think they downloaded the providers many hundreds of times.
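A toy model may help show why the ordering matters. The sketch below is hypothetical Python, not OpenTofu's code: a set stands in for the shared cache directory, a threading.Lock stands in for the cache-dir file lock, and download() only counts calls. ProviderCacheModel, install_observed, install_check_first and run are invented names, and "pinned" is a placeholder for whatever version local.aws_provider_version sets. install_observed() follows the order reported above (download, wait for the lock, find the version already cached); install_check_first() checks the cache before doing any network work and re-checks under the lock.

```python
import threading


class ProviderCacheModel:
    """Hypothetical model of a shared provider cache dir guarded by one lock."""

    def __init__(self):
        self.cache = set()                    # (provider, version) pairs already cached
        self.lock = threading.Lock()          # stands in for the cache-dir file lock
        self.downloads = 0                    # how many network fetches happened
        self._counter_lock = threading.Lock()

    def download(self, key):
        # Stand-in for fetching the provider package; only counts calls.
        with self._counter_lock:
            self.downloads += 1

    def install_observed(self, key):
        # Order described in the report: download first, then wait for the
        # lock, then discover that the cache already holds this version.
        self.download(key)
        with self.lock:
            if key not in self.cache:
                self.cache.add(key)

    def install_check_first(self, key):
        # Consult the cache before any network work ...
        if key in self.cache:
            return
        with self.lock:
            # ... and re-check under the lock, so units that missed at the
            # same moment do not all download.
            if key in self.cache:
                return
            self.download(key)
            self.cache.add(key)


def run(method_name, units, key=("hashicorp/aws", "pinned")):
    # Start one thread per unit, all asking for the same provider version.
    model = ProviderCacheModel()
    threads = [threading.Thread(target=getattr(model, method_name), args=(key,))
               for _ in range(units)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.downloads


if __name__ == "__main__":
    print(run("install_observed", 200))     # 200: one download per unit
    print(run("install_check_first", 200))  # 1: only the first unit downloads
```

With 200 units pinned to the same provider version, the first ordering performs 200 downloads and the second performs one. The re-check inside the lock is what keeps two units that miss the cache at the same moment from both downloading.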

I stopped testing, and went back to Provider Cache Server. It was faster by orders of magnitude. I've completed the migration with Provider Cache Server enabled, and I will keep it enabled for the time being.

I have CICD runners that may need to run Terragrunt. They do not have caching enabled, so they have to download the providers, and they would likely run into the same problem with Automatic Provider Cache Dir. With Provider Cache Server, they work well.

Gruntwork has a document comparing the two caches. I think that document should make it very clear that you should not use the new cache if you're operating at scale and you do not allow provider versions to fluctuate freely. Regardless, CICD runners with no caching, or even just git-cloning the TG repo onto your laptop and running it from scratch, would be slow.

Something needs to change in the new cache to prevent every single tofu process from trying to download providers from scratch, when they already exist in the cache.

For all these reasons, I think the old cache is a better idea overall. The cache folder is shared, and the terragrunt process is also shared. It makes sense to manage the cache in the shared resource.
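The argument for managing the cache in the one shared Terragrunt process can be sketched the same way. This is again a hypothetical Python model, not the Provider Cache Server's implementation: CacheCoordinator and get() are invented names, "example/other" stands in for the second provider mentioned above, and the returned path is a placeholder. It keeps one lock per (provider, version) pair, so units asking for the same provider wait on a single download, while different providers are not serialized behind one global lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CacheCoordinator:
    """Hypothetical model of one shared process fetching providers for many units."""

    def __init__(self):
        self._lock = threading.Lock()   # guards _key_locks and the download counter
        self._key_locks = {}            # (provider, version) -> per-key lock
        self._paths = {}                # (provider, version) -> cached path
        self.downloads = 0

    def _download(self, key):
        # Stand-in for the network fetch; only counts calls.
        with self._lock:
            self.downloads += 1
        return "/".join(("cache",) + key)  # placeholder cache path

    def get(self, key):
        # Look up (or create) the lock for this provider/version only.
        with self._lock:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Units asking for the same key wait here; other keys proceed in parallel.
        with key_lock:
            if key not in self._paths:
                self._paths[key] = self._download(key)
            return self._paths[key]


if __name__ == "__main__":
    coordinator = CacheCoordinator()
    # Hypothetical keys: the AWS provider plus one other provider, 50 units each.
    keys = [("hashicorp/aws", "pinned"), ("example/other", "pinned")] * 50
    with ThreadPoolExecutor(max_workers=16) as pool:
        paths = list(pool.map(coordinator.get, keys))
    print(coordinator.downloads)  # 2
```

Here 100 requests for two distinct provider versions result in exactly two downloads, and every unit receives the same cached path for its provider.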
