PSS: safe provider download during init (interactive mode only)#38205
PSS: safe provider download during init (interactive mode only)#38205SarahFrench wants to merge 13 commits intomainfrom
Conversation
|
TODO:
|
560e307 to
b673d51
Compare
|
Rebased this branch onto the branch in #38206 |
b673d51 to
2fcd5c4
Compare
24292a8 to
f9e56cd
Compare
|
Rebased onto main after the 2 PRs dependencies were merged there |
internal/command/init_test.go
Outdated
| } | ||
| }) | ||
|
|
||
| t.Run("upgrading the provider used for state storage via HTTP", func(t *testing.T) { |
There was a problem hiding this comment.
Note for reviewer: this shows that the new flag is needed for upgrade but, as discussed internally, there is still the bug where the same provider is used to create the source and destination state stores during state migration.
radeksimko
left a comment
There was a problem hiding this comment.
This PR implements security features by editing the getProvidersFromConfig function, that's only used in the experimental version of init in the PSS experiment.
Note that the second half of the sentence is no longer true after #38227
Which is to say the PR may need rebasing.
Overall though - aside from mostly minor comments in-line, it looks pretty good to me!
One thing that puzzles me is how can we (user) test this outside of the PR since mirrors are excluded? 😅 I suppose we'd need a provider published to the Registry which implements this to play with?
The only "non-production" provider which also happens to be published to the Registry that comes to mind is https://github.com/hashicorp/terraform-provider-tfcoremock - perhaps this is a good enough reason to copy over some logic over there from https://github.com/hashicorp/terraform-provider-pss ? That should still maintain some low-risk/low-exposure and allow us to test the whole workflow more thoroughly.
| cmdFlags.Var(&init.BackendConfig, "backend-config", "") | ||
| cmdFlags.Var(&init.PluginPath, "plugin-dir", "plugin directory") | ||
| cmdFlags.BoolVar(&init.CreateDefaultWorkspace, "create-default-workspace", true, "when -input=false, use this flag to block creation of the default workspace") | ||
| cmdFlags.BoolVar(&init.SafeInitWithPluggableStateStore, "safe-init", false, `Enable the "safe init" workflow when downloading a provider binary for use with pluggable state storage.`) |
There was a problem hiding this comment.
I know it's been really difficult to come up with a good name for this so I'll just leave a few thoughts here:
- I'd generally try to avoid the word "safe" as some people might misinterpret it as some sort of ultimate "turbo button" of safety. It also doesn't really communicate what it does.
- This may be too verbose but the direction I think we should be going is e.g.
download-state-store-plugin,state-store-plugin,state-store,install-state-store-plugin,allow-state-store-pluginor something along these lines. - This should ideally "work well" (naming wise) alongside some other flags we expect it to be combined with. e.g.
-upgradeand the future flag for automation that implies approval of the interactive prompt.
I do recognise though we'll need to be a bit more creative with the naming inside the implementation if we want to avoid "state store". I don't think the two names (user-facing vs internal) need to align necessarily.
There was a problem hiding this comment.
Yeah this is a decision that I've been putting off a bit 😅 Writing this PR using -safe-init was a bit of an exercise in Cunningham's Law but also just unblocking myself.
I like the distinction of "safe" versus "trust" in another review comment you left, so a name that includes that might be good?
-prompt-provider-approval-prompt-provider-trust-prompt-state-provider-trust-state-provider-after-approval-interactive-provider-approval
👉🏻 Another option could be to revisit the idea of users needing to opt into the security features at all when using Terraform in interactive mode; when we first got that feedback it was in the context of the 'Exit Early' UX being used in both interactive mode and automation. Maybe the prompt for input is sufficient friction for people using Terraform interactively, and a flag is only necessary in the context of automation?
| if init.Lockfile == "readonly" { | ||
| diags = diags.Append(tfdiags.Sourceless( | ||
| tfdiags.Error, | ||
| `The -safe-init and -lockfile=readonly options are mutually-exclusive`, | ||
| "The -safe-init flag is intended to help when first downloading or upgrading a provider to use for state storage, and in those scenarios the lockfile cannot be treated as read-only.", | ||
| )) | ||
| } |
There was a problem hiding this comment.
Is it not valid if the user wants to check for an upgrade of the PSS provider but not necessarily perform the upgrade?
It's true that first-time installation will always modify the lockfile but upgrade may not necessarily do so.
There was a problem hiding this comment.
Currently the -upgrade and -lockfile=readonly flags are mutually exclusive (this dates from when the flag was first added to the original getProviders method), so prior to this PR it's not valid to check for any available upgrades that way.
Rather than just taking the above as unquestionable gospel I dug around into the original motivations for adding -lockfile=readonly to figure out what the intended use case was.
Brief summary of what I learned
In the original issue #27264 the author experiences a problem where terraform init is editing dependency lock files despite no changes to the providers in use (no new downloaded, no upgrades). The root cause is due to the different types of hashes that are used by Terraform (zh and h1) to handle providers supplied as zip files versus unpacked directories. This comment from Martin has loads of useful info.
If a filesystem mirror is made via providers mirror and then a matching dependency lock file is made using providers lock that lock file will contain only h1 hashes. There are no zh hashes as those are used for zip files, and the providers lock command creates nested directories in the 'unpacked' structure, not a zip file. However when you run a subsequent init command Terraform will update the dep lock file to include zh hashes using data from the Registry.
tl;dr: the new flag was a way to navigate around an issue that could have been solved by updating the Registry protocol, but understandably that wasn't an option in the past. The -lockfile=readonly flag is only intended to let people use lockfiles for filesystem mirrors that are sufficient but don't match what would be created by downloading from the Registry. It's not meant to enable something like a 'dry-run' provider download experience.
There was a problem hiding this comment.
With init -lockfile=readonly users would currently be able to 'test out' downloading a new provider because they'd hit this error after Terraform has already made tons of UI output describing which provider s were downloaded. Unfortunately I can see how users might have come to rely on that behaviour to get something resembling a 'dry-run' experience of downloading new providers, even though it wasn't intentional when the flag was implemented. But doing the same for upgrades is definitely not possible.
| case SafeInitActionNotRelevant: | ||
| // do nothing; security features aren't relevant. | ||
| case SafeInitActionProceed: | ||
| // do nothing; provider is considered safe and there's no need for notifying the user. |
There was a problem hiding this comment.
| // do nothing; provider is considered safe and there's no need for notifying the user. | |
| // do nothing; provider is considered trusted and there's no need for notifying the user. |
As mentioned - I'd rather avoid the word "safe" in this context. I think trust is more accurate as we cannot possibly guarantee any third party provider will be "safe" to execute.
We just facilitate a workflow for establishing trust between the user and the provider, nothing more nothing less.
There was a problem hiding this comment.
Agreed here, though I'm going to save changing this until the naming issue is solved, as I expect there will be lots of places where "safe" is used and should be replaced.
| // If the -safe-init flag isn't present we prompt the user to re-run init so they're opting into the security UX. | ||
| diags = diags.Append(&hcl.Diagnostic{ | ||
| Severity: hcl.DiagError, | ||
| Summary: "State storage providers must be downloaded using -safe-init flag", | ||
| Detail: "The provider used for state storage needs to be installed safely. Please re-run the \"init\" command with the -safe-init flag.", | ||
| }) | ||
| view.Diagnostics(diags) |
There was a problem hiding this comment.
Nitpick (not blocking): Could/should we mention the human-friendly version of the provider address here?
I'm not sure if pointing to the state_store snippet is relevant - pointing to required_providers entry probably would be better but we don't really have that readily available.
…e with experiments and mutually-exclusive flags In particular, note how -safe-init and -backend=false are not compatible.
… in an uninitialised directory
…put is enabled Terraform prompts users when installing state storage providers via HTTP. If a provider would be sourced via HTTP and the -safe-init flag is missing an error is returned. If a provider is downloaded via other sources then nothing has changed (no new output, no need for new flags).
…, update tests to assert hashes are in the prompt.
…e new security features
…n't provided, and passes when the flag is present.
bf0f17d to
388bcbd
Compare
…ith, to help when interpreting hashes
…ed provider Previously this asked users to refer to the dependency lock file, but this wouldn't have been updated during the ongoing init operation and would contain old information.
…be downloaded in. For more discussion see this PR review thread: #38205 (comment)
…ons, add test showing provider being rejected
…facing errors, and in test names
…downloading a state store provider
…ed in PSS. This is used during errors related to verifying the provider's installation
Thanks for the review! I've rebased this PR's changes to include the un-forking of init 👍🏻 To user test this we could update the tfcoremock provider, yeah. Alternatively we can spin up a local registry using projects that implement the Registry protocol (e.g. https://github.com/MatthewJohn/terrareg) and serve whatever through that. The first option sounds easier to me, but interested to hear your opinion. For the next round of review I'd like to draw attention to my final point in this message:
I got some internal pushback on the UX to force users to add the flag (see my previous mention of Cunningham's Law, lol) , and I realised that when adding the flag was originally suggested to us by ProdSec it was in the context of a user needing to run a 'second init'. As that's now only relevant to when TF is run in automation we could consider removing its use in interactive mode? |
Context & what the new security feature does
This PR fixes a security issue identified in the PSS project. Previously users had the opportunity to inspect providers downloaded in
initbefore the providers binaries were executed and used during other downstream commands likeplan/apply. With the introduction of PSS we created a scenario where a provider can be downloaded and immediately used without users having an opportunity to vet them.Now the
initcommand will intervene when it sees Terraform is going to download an 'unsafe' provider for pluggable state storage.What is unsafe? Terraform regards any provider downloaded over HTTP to be unsafe until the user has approved it. Providers that are sourced from the local machine are treated as safe and pre-vetted, as the user has curated that collection of providers themselves and explicitly directed Terraform to use them via CLI configuration (see use of filesystem_mirror in the docs).
How does Terraform intervene? If Terraform detects that the state storage provider will be downloaded via HTTP and the user hasn't opted into a 'safe init' process then the CLI will return an error prompting the user to re-run the init command with a flag (
-safe-init) that opts them into the new security features. The intention of this flag it to ensure users navigate the potential security issue in an engaged way.When would Terraform intervene? If the provider is downloaded via HTTP then Terraform will intervene when the state storage provider is first installed or upgraded (via
-upgradeflag).When would Terraform not intervene? If the state storage provider is being downloaded for the first time or upgraded but the provider is sourced from a non-HTTP source then Terraform will not force the user to supply the
-safe-initflag.This PR
This work was created after doing some prototyping of different approaches in:
The approach in this PR is based on the last branch.
This PR implements security features by editing the
getProvidersFromConfigfunction, that's only used in the experimental version ofinitin the PSS experiment. This function determines what providers to download (based on preexisting locks, config, and upgrade flag value) and returns new 'config locks' to the init command implementation. The function now also determines what action Terraform should take regarding PSS and the new security features, and returns a value to the calling code instructing it what to do. By implementing this logic inside thegetProvidersFromConfigfunction we can access the location metadata for providers that are downloaded within a giveninitcommand and this location data helps to decide if the user needs to be prompted or not.This PR only implements the features for when Terraform is run in an interactive mode. A follow-up PR will handle security features for TF in automation.
Related changes
This PR was stacked on top of:
#38206
#38212
Target Release
N/A
Rollback Plan
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.
CHANGELOG entry