Skip to content

PSS: safe provider download during init (interactive mode only)#38205

Open
SarahFrench wants to merge 13 commits intomainfrom
pss/safe-provider-download-during-init
Open

PSS: safe provider download during init (interactive mode only)#38205
SarahFrench wants to merge 13 commits intomainfrom
pss/safe-provider-download-during-init

Conversation

@SarahFrench
Copy link
Member

@SarahFrench SarahFrench commented Feb 24, 2026

Context & what the new security feature does

This PR fixes a security issue identified in the PSS project. Previously users had the opportunity to inspect providers downloaded in init before the providers binaries were executed and used during other downstream commands like plan/apply. With the introduction of PSS we created a scenario where a provider can be downloaded and immediately used without users having an opportunity to vet them.

Now the init command will intervene when it sees Terraform is going to download an 'unsafe' provider for pluggable state storage.

What is unsafe? Terraform regards any provider downloaded over HTTP to be unsafe until the user has approved it. Providers that are sourced from the local machine are treated as safe and pre-vetted, as the user has curated that collection of providers themselves and explicitly directed Terraform to use them via CLI configuration (see use of filesystem_mirror in the docs).

How does Terraform intervene? If Terraform detects that the state storage provider will be downloaded via HTTP and the user hasn't opted into a 'safe init' process then the CLI will return an error prompting the user to re-run the init command with a flag (-safe-init) that opts them into the new security features. The intention of this flag it to ensure users navigate the potential security issue in an engaged way.

When would Terraform intervene? If the provider is downloaded via HTTP then Terraform will intervene when the state storage provider is first installed or upgraded (via -upgrade flag).

When would Terraform not intervene? If the state storage provider is being downloaded for the first time or upgraded but the provider is sourced from a non-HTTP source then Terraform will not force the user to supply the -safe-init flag.

This PR

This work was created after doing some prototyping of different approaches in:

The approach in this PR is based on the last branch.

This PR implements security features by editing the getProvidersFromConfig function, that's only used in the experimental version of init in the PSS experiment. This function determines what providers to download (based on preexisting locks, config, and upgrade flag value) and returns new 'config locks' to the init command implementation. The function now also determines what action Terraform should take regarding PSS and the new security features, and returns a value to the calling code instructing it what to do. By implementing this logic inside the getProvidersFromConfig function we can access the location metadata for providers that are downloaded within a given init command and this location data helps to decide if the user needs to be prompted or not.

This PR only implements the features for when Terraform is run in an interactive mode. A follow-up PR will handle security features for TF in automation.

Related changes

This PR was stacked on top of:

#38206

#38212

Target Release

N/A

Rollback Plan

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@SarahFrench SarahFrench added the no-changelog-needed Add this to your PR if the change does not require a changelog entry label Feb 24, 2026
@SarahFrench
Copy link
Member Author

SarahFrench commented Feb 24, 2026

TODO:

  • Confirm flag name
  • Confirm other names in this code - e.g. ConfigReturningSource
  • Need some tests showing this feature being active during a provider upgrade step. This should be possible but downstream backend initialisation logic doesn't enable provider upgrades yet, so need to make that clear in the test.

@SarahFrench SarahFrench force-pushed the pss/safe-provider-download-during-init branch from 560e307 to b673d51 Compare February 25, 2026 18:12
@SarahFrench
Copy link
Member Author

Rebased this branch onto the branch in #38206

@SarahFrench SarahFrench force-pushed the pss/safe-provider-download-during-init branch from b673d51 to 2fcd5c4 Compare February 25, 2026 18:38
@SarahFrench SarahFrench force-pushed the pss/safe-provider-download-during-init branch 2 times, most recently from 24292a8 to f9e56cd Compare February 26, 2026 15:37
@SarahFrench
Copy link
Member Author

Rebased onto main after the 2 PRs dependencies were merged there

@SarahFrench SarahFrench changed the title PSS: safe provider download during init PSS: safe provider download during init (interactive mode only) Feb 26, 2026
@SarahFrench SarahFrench marked this pull request as ready for review February 27, 2026 15:30
@SarahFrench SarahFrench requested a review from a team as a code owner February 27, 2026 15:30
}
})

t.Run("upgrading the provider used for state storage via HTTP", func(t *testing.T) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer: this shows that the new flag is needed for upgrade but, as discussed internally, there is still the bug where the same provider is used to create the source and destination state stores during state migration.

Copy link
Member

@radeksimko radeksimko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR implements security features by editing the getProvidersFromConfig function, that's only used in the experimental version of init in the PSS experiment.

Note that the second half of the sentence is no longer true after #38227

Which is to say the PR may need rebasing.

Overall though - aside from mostly minor comments in-line, it looks pretty good to me!


One thing that puzzles me is how can we (user) test this outside of the PR since mirrors are excluded? 😅 I suppose we'd need a provider published to the Registry which implements this to play with?

The only "non-production" provider which also happens to be published to the Registry that comes to mind is https://github.com/hashicorp/terraform-provider-tfcoremock - perhaps this is a good enough reason to copy over some logic over there from https://github.com/hashicorp/terraform-provider-pss ? That should still maintain some low-risk/low-exposure and allow us to test the whole workflow more thoroughly.

cmdFlags.Var(&init.BackendConfig, "backend-config", "")
cmdFlags.Var(&init.PluginPath, "plugin-dir", "plugin directory")
cmdFlags.BoolVar(&init.CreateDefaultWorkspace, "create-default-workspace", true, "when -input=false, use this flag to block creation of the default workspace")
cmdFlags.BoolVar(&init.SafeInitWithPluggableStateStore, "safe-init", false, `Enable the "safe init" workflow when downloading a provider binary for use with pluggable state storage.`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's been really difficult to come up with a good name for this so I'll just leave a few thoughts here:

  1. I'd generally try to avoid the word "safe" as some people might misinterpret it as some sort of ultimate "turbo button" of safety. It also doesn't really communicate what it does.
  2. This may be too verbose but the direction I think we should be going is e.g. download-state-store-plugin, state-store-plugin, state-store, install-state-store-plugin, allow-state-store-plugin or something along these lines.
  3. This should ideally "work well" (naming wise) alongside some other flags we expect it to be combined with. e.g. -upgrade and the future flag for automation that implies approval of the interactive prompt.

I do recognise though we'll need to be a bit more creative with the naming inside the implementation if we want to avoid "state store". I don't think the two names (user-facing vs internal) need to align necessarily.

Copy link
Member Author

@SarahFrench SarahFrench Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a decision that I've been putting off a bit 😅 Writing this PR using -safe-init was a bit of an exercise in Cunningham's Law but also just unblocking myself.

I like the distinction of "safe" versus "trust" in another review comment you left, so a name that includes that might be good?

  • -prompt-provider-approval
  • -prompt-provider-trust
  • -prompt-state-provider
  • -trust-state-provider-after-approval
  • -interactive-provider-approval

👉🏻 Another option could be to revisit the idea of users needing to opt into the security features at all when using Terraform in interactive mode; when we first got that feedback it was in the context of the 'Exit Early' UX being used in both interactive mode and automation. Maybe the prompt for input is sufficient friction for people using Terraform interactively, and a flag is only necessary in the context of automation?

Comment on lines +210 to +216
if init.Lockfile == "readonly" {
diags = diags.Append(tfdiags.Sourceless(
tfdiags.Error,
`The -safe-init and -lockfile=readonly options are mutually-exclusive`,
"The -safe-init flag is intended to help when first downloading or upgrading a provider to use for state storage, and in those scenarios the lockfile cannot be treated as read-only.",
))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not valid if the user wants to check for an upgrade of the PSS provider but not necessarily perform the upgrade?

It's true that first-time installation will always modify the lockfile but upgrade may not necessarily do so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the -upgrade and -lockfile=readonly flags are mutually exclusive (this dates from when the flag was first added to the original getProviders method), so prior to this PR it's not valid to check for any available upgrades that way.


Rather than just taking the above as unquestionable gospel I dug around into the original motivations for adding -lockfile=readonly to figure out what the intended use case was.

Brief summary of what I learned

In the original issue #27264 the author experiences a problem where terraform init is editing dependency lock files despite no changes to the providers in use (no new downloaded, no upgrades). The root cause is due to the different types of hashes that are used by Terraform (zh and h1) to handle providers supplied as zip files versus unpacked directories. This comment from Martin has loads of useful info.

If a filesystem mirror is made via providers mirror and then a matching dependency lock file is made using providers lock that lock file will contain only h1 hashes. There are no zh hashes as those are used for zip files, and the providers lock command creates nested directories in the 'unpacked' structure, not a zip file. However when you run a subsequent init command Terraform will update the dep lock file to include zh hashes using data from the Registry.

tl;dr: the new flag was a way to navigate around an issue that could have been solved by updating the Registry protocol, but understandably that wasn't an option in the past. The -lockfile=readonly flag is only intended to let people use lockfiles for filesystem mirrors that are sufficient but don't match what would be created by downloading from the Registry. It's not meant to enable something like a 'dry-run' provider download experience.

Copy link
Member Author

@SarahFrench SarahFrench Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With init -lockfile=readonly users would currently be able to 'test out' downloading a new provider because they'd hit this error after Terraform has already made tons of UI output describing which provider s were downloaded. Unfortunately I can see how users might have come to rely on that behaviour to get something resembling a 'dry-run' experience of downloading new providers, even though it wasn't intentional when the flag was implemented. But doing the same for upgrades is definitely not possible.

case SafeInitActionNotRelevant:
// do nothing; security features aren't relevant.
case SafeInitActionProceed:
// do nothing; provider is considered safe and there's no need for notifying the user.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// do nothing; provider is considered safe and there's no need for notifying the user.
// do nothing; provider is considered trusted and there's no need for notifying the user.

As mentioned - I'd rather avoid the word "safe" in this context. I think trust is more accurate as we cannot possibly guarantee any third party provider will be "safe" to execute.

We just facilitate a workflow for establishing trust between the user and the provider, nothing more nothing less.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed here, though I'm going to save changing this until the naming issue is solved, as I expect there will be lots of places where "safe" is used and should be replaced.

Comment on lines +209 to +215
// If the -safe-init flag isn't present we prompt the user to re-run init so they're opting into the security UX.
diags = diags.Append(&hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "State storage providers must be downloaded using -safe-init flag",
Detail: "The provider used for state storage needs to be installed safely. Please re-run the \"init\" command with the -safe-init flag.",
})
view.Diagnostics(diags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick (not blocking): Could/should we mention the human-friendly version of the provider address here?

I'm not sure if pointing to the state_store snippet is relevant - pointing to required_providers entry probably would be better but we don't really have that readily available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the human-friendly version of the provider addr in 5c553e9, and I liked the idea of providing relevant config in the error so made it available in 9e643d0.

…e with experiments and mutually-exclusive flags

In particular, note how -safe-init and -backend=false are not compatible.
…put is enabled

Terraform prompts users when installing state storage providers via HTTP. If a provider would be sourced via HTTP and the -safe-init flag is missing an error is returned. If a provider is downloaded via other sources then nothing has changed (no new output, no need for new flags).
…, update tests to assert hashes are in the prompt.
…n't provided, and passes when the flag is present.
@SarahFrench SarahFrench force-pushed the pss/safe-provider-download-during-init branch from bf0f17d to 388bcbd Compare March 9, 2026 12:28
…ed provider

Previously this asked users to refer to the dependency lock file, but this wouldn't have been updated during the ongoing init operation and would contain old information.
…be downloaded in.

For more discussion see this PR review thread: #38205 (comment)
…ons, add test showing provider being rejected
…ed in PSS. This is used during errors related to verifying the provider's installation
@SarahFrench
Copy link
Member Author

This PR implements security features by editing the getProvidersFromConfig function, that's only used in the experimental version of init in the PSS experiment.

Note that the second half of the sentence is no longer true after #38227

Which is to say the PR may need rebasing.

Overall though - aside from mostly minor comments in-line, it looks pretty good to me!

One thing that puzzles me is how can we (user) test this outside of the PR since mirrors are excluded? 😅 I suppose we'd need a provider published to the Registry which implements this to play with?

The only "non-production" provider which also happens to be published to the Registry that comes to mind is https://github.com/hashicorp/terraform-provider-tfcoremock - perhaps this is a good enough reason to copy over some logic over there from https://github.com/hashicorp/terraform-provider-pss ? That should still maintain some low-risk/low-exposure and allow us to test the whole workflow more thoroughly.

Thanks for the review!

I've rebased this PR's changes to include the un-forking of init 👍🏻

To user test this we could update the tfcoremock provider, yeah. Alternatively we can spin up a local registry using projects that implement the Registry protocol (e.g. https://github.com/MatthewJohn/terrareg) and serve whatever through that. The first option sounds easier to me, but interested to hear your opinion.

For the next round of review I'd like to draw attention to my final point in this message:

👉🏻 Another option could be to revisit the idea of users needing to opt into the security features at all when using Terraform in interactive mode; when we first got that feedback it was in the context of the 'Exit Early' UX being used in both interactive mode and automation. Maybe the prompt for input is sufficient friction for people using Terraform interactively, and a flag is only necessary in the context of automation?

I got some internal pushback on the UX to force users to add the flag (see my previous mention of Cunningham's Law, lol) , and I realised that when adding the flag was originally suggested to us by ProdSec it was in the context of a user needing to run a 'second init'. As that's now only relevant to when TF is run in automation we could consider removing its use in interactive mode?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog-needed Add this to your PR if the change does not require a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants