Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run terraform plan when setting --dry-run flag on experimental-deploy goal #20488

Merged
merged 16 commits into from
Feb 16, 2024

Conversation

agoblet
Copy link
Contributor

@agoblet agoblet commented Feb 5, 2024

Changed the following things to achieve this

  • Add a --dry-run flag to the experimental-deploy goal to handle dry runs such as terraform plan
  • Run plan rather than apply when setting the --dry-run flag while deploying a terraform_deployment
  • Changed Helm deployments to use the new --dry-run flag instead of a passthrough arg for dry-running

Tested

  • the terraform change via a unit test and also tried it out manually in my own project.
  • the helm change via a unit test.

Closes #18490

@agoblet agoblet marked this pull request as ready for review February 5, 2024 16:45
@tgolsson
Copy link
Contributor

tgolsson commented Feb 6, 2024

Hey,

Thanks for the PR! I'm not too familiar with the terraform backend, so I'll hold off reviewing it for another maintainer/contributor who's more well-versed in that.

With that said, I'd like to leave a few thoughts as someone "in the vincinity" -- I build a k8s plugin. There's a whole set of verbs that make sense in the infrastructure domain that don't necessarily map well to Pants actions. This was raised in the original topic as well. In a fully pants-managed world, what does the user-story look like there? --dry-run is benign because that's likely fairly mappable, but even there a k8s user might ask "on server or client"?

Going further, what about terraform show? terraform destroy? Are those "actions" on deploy? Other goals? It doesn't matter if we get it wrong at this stage since it's an experiental goal -- but I read your comment about how helm works, and figured I'd add more thoughts. I'm not sure if experimental-deploy can run on a glob/multiple targets, but what if you have heterogenous targets? That probably means we have to do flags, not passthrough args.

And taking that one step further... deploy --dry-run is sensible. deploy --action=destroy? deploy --action=scale, but what if terraform, etc.

What I do right now without adopting experimental-deploy yet, is target generation. So if I have a k8s_object(name="k8s"), I generate k8s#get, k8s#apply, and so on. These are all run targets however, so single invocation. To work around that I do a k8s_objects wrapper that handles multiplexing. Terrible hacks all the way down.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 6, 2024

Good points @tgolsson! The infra and Pants domains do not map really well indeed. I went with the --dry-run idea because that was proposed in the ticket. There's various edge cases popping up now though now I am working on it and read your comments.

What to do with other operations such as destroy is still an open question I guess. For the way we use terraform, plan and apply are the only operations we really use in an automated way. Destroy is more of a one-off thing that's only necessary every now and then. Could be supported eventually but might not be necessary yet in this experimental stage.

Let's see what other maintainers have to say about it.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 6, 2024

I'm not sure if experimental-deploy can run on a glob/multiple targets, but what if you have heterogenous targets?

you can run it for multiple targets I think, pants experimental-deploy :: works. In that case, adding —helm-args might not work indeed when terraform targets are deployed as well. Not sure though, as I’m still a Pants noob 😁

@agoblet
Copy link
Contributor Author

agoblet commented Feb 6, 2024

What I do right now without adopting experimental-deploy yet, is target generation. So if I have a k8s_object(name="k8s"), I generate k8s#get, k8s#apply, and so on. These are all run targets however, so single invocation. To work around that I do a k8s_objects wrapper that handles multiplexing. Terrible hacks all the way down.

I have to wrap my head around what that means. We are not yet using Pants. I am investigating how our codebase with lots of terraform, lots of python, and some docker can be managed using Pants. terraform plan is an important part of that puzzle and I thought I could easily implement this —dry-run feature to support that. But maybe using run targets works as well without requiring new features in Pants

@lilatomic
Copy link
Contributor

Thanks for the contribution! I had a quick look and it looks good. I'll be able to do a full review tomorrow.
I agree with both of you that there are currently 2 "big picture" questions:

  1. what to do with backends that don't support dry-run
  2. what to do with all the extra "infra" verbs

I think it makes sense to add --dry-run at the goal level. It's common enough in infra that I'd want it standardised. I think we could add a field on DeployFieldSet which language-specific backends could set for whether they support --dry-run. We could then skip those in the run_deploy goal rule.

As for all the extra verbs, I think that at some point it's reasonable to eject from Pants with export and just use the tool. I think as a starting point it's good to never lose functionality by adopting Pants. But also, often these advanced actions don't really map to anything in the Pants domain or across tools, so we don't really add anything by adding them to Pants. For illustration, run just accepts an argument list, since Pants doesn't really have anything to add to that; even the run goal for Docker just has --docker-run-args.


I'm not sure how much discussion has to happen before we could merge this. That is, I'm hoping that we don't have to hash out exactly what's common enough to standardise and what should be left to backends, as I think --dry-run is uncontroversial for that. Implementing a per-backend opt-in/out with DeployFieldSet.supports_dry_run should be simple enough (could even be a fast-follow MR).

@agoblet
Copy link
Contributor Author

agoblet commented Feb 6, 2024

Thanks for the feedback @lilatomic !

I think it makes sense to add --dry-run at the goal level.

Yes I think so too. In that case, I’ll extend the PR tomorrow to make sure Helm also uses this flag rather than the passthrough arguments.

As for all the extra verbs, I think that at some point it's reasonable to eject from Pants with export and just use the tool.

This will be something for another PR / issue then.

I'm hoping that we don't have to hash out exactly what's common enough to standardise and what should be left to backends

Not sure how the decision making works here. For me the current idea of the dry-run flag is sufficient for our terraform needs right now.

Implementing a per-backend opt-in/out with DeployFieldSet.supports_dry_run should be simple enough (could even be a fast-follow MR).

From a quick search I could only find helm and terraform as targets for deploy, but please correct me if I’m wrong. If I’m right, then we can omit this field for now, since both helm and terraform support dry running.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 7, 2024

@lilatomic based on our discussion yesterday I changed helm to use the new goal-level flag as well.

Copy link
Contributor

@lilatomic lilatomic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the MR! It looks good to me. I appreciate the thorough tests.
I'll see about getting the CI running and a review from a Maintainer.

@@ -165,7 +164,7 @@ class HelmSubsystem(TemplatedExternalTool):
)

args = ArgsListOption(
example="--dry-run",
example="--force",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also mention the correct way to pass --dry-run, something like:

use the --experimental-deploy-dry-run flag to pass --dry-run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I added it in another place, but I think that was an error message. Added it to this description as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added information on dry-running into the helm and terraform docs.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 13, 2024

ping @alonsodomin @lilatomic

@alonsodomin
Copy link
Contributor

alonsodomin commented Feb 13, 2024

Sorry for the delay chiming on here. The addition of the --dry-run flag to deploy-like goal makes sense to me, as @lilatomic said, it's common enough in the deployment domains and it translates well into the different tools (helm has a flag, terraform can use the plan command and ansible has a check mode that does the same).

The implementation here looks good to me but leaves me wondering if it's enough. Meaning that when we normally want to run in a dry run mode we expect to get some output from the tool apart from just an OK or KO return code. As is right now I believe that output will be omitted (current implementation of the goal only prints if the action succeeded or not).

I think we should make the experimental-deploy goal print to stdout the output we receive from the given tool, at least in the case in which the execution has failed. A similar implementation to what the test goal does is what I'm thinking right now.


On the wider topic of the existence of the experimental-deploy goal, I agree that there are many kind of operations common in the infrastructure domain that do not map well to Pants. We initially added this goal though as we thought that there are some benefits here in which Pants can help a lot by leveraging the rule graph, i.e.: we can trace all the artifacts that depend on a given deployment and publish them just to make sure that they are available to the infrastructure when we run the deployment. I also see the action of publishing an artifact into a repository as an infrastructure-related action is so ubiquitous on build tools that it raises less eyebrows.

At the moment it's true that experimental-deploy can not handle multiple targets. I did start some work in the past to try to amend that in #19260 and the result is that it is possible although the implementation is really ugly. The main idea would be to use Pants' engine to traverse the dependency graph and perform the right action at each node (similar way as how dependencies are resolved). The limitation here is that, since deploying and publishing are side-effecting actions (and therefore an InteractiveProcess) we can only run them at a @goal_rule and not at a normal @rule, which leaves you with either having to build and traverse the dependency graph yourself or to hack it unwrapping the Process inside the InteractiveProcess and run it as it was not side-effecting (or both!). Either one very ugly options.

That PR is currently parked until I find the time to come up with a better idea (or we totally give up on experimental-deploy). If anyone has any suggestions I'd be happy to receive some input.

PS: Sorry for the wall of text.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 13, 2024

Thanks for the review @alonsodomin . If I understand you correctly the only thing we have to do for this PR is the printing of the output. I'm not sure what exactly is missing here, because for terraform_deployment I can already see the output. I did not test this for Helm.

Can you elaborate what's still needed? Do we perhaps need to disable something on the Terraform backend and move that to the Deploy goal?

❯ PANTS_SOURCE=../pants pants experimental-deploy --dry-run ::
Pantsd has been turned off via Env.
15:41:39.32 [INFO] Deploying targets...
module.lambda_function.data.aws_region.current: Reading...
module.lambda_function.data.aws_caller_identity.current: Reading...
module.lambda_function.data.aws_region.current: Read complete after 0s [id=eu-west-1]
module.lambda_function.data.aws_iam_policy_document.assume_role[0]: Reading...
module.some_cool_module.data.aws_iam_policy_document.test_assume_role: Reading...
module.lambda_function.data.aws_partition.current: Reading...
module.lambda_function.data.aws_iam_policy_document.assume_role[0]: Read complete after 0s [id=2690255455]
module.some_cool_module.data.aws_iam_policy_document.test_assume_role: Read complete after 0s [id=1229436035]
module.lambda_function.data.aws_partition.current: Read complete after 0s [id=aws]
module.lambda_function.data.aws_caller_identity.current: Read complete after 0s [id=740837425924]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # module.lambda_function.data.aws_iam_policy_document.logs[0] will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "logs" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions   = [
              + "logs:CreateLogGroup",
              + "logs:CreateLogStream",
              + "logs:PutLogEvents",
            ]
          + effect    = "Allow"
          + resources = (known after apply)
        }
    }

  # module.lambda_function.aws_cloudwatch_log_group.lambda[0] will be created
  + resource "aws_cloudwatch_log_group" "lambda" {
      + arn               = (known after apply)
      + id                = (known after apply)
      + log_group_class   = (known after apply)
      + name              = "/aws/lambda/my-lambda1"
      + name_prefix       = (known after apply)
      + retention_in_days = 0
      + skip_destroy      = false
      + tags_all          = (known after apply)
    }

  # module.lambda_function.aws_iam_policy.logs[0] will be created
  + resource "aws_iam_policy" "logs" {
      + arn         = (known after apply)
      + id          = (known after apply)
      + name        = "my-lambda1-logs"
      + name_prefix = (known after apply)
      + path        = "/"
      + policy      = (known after apply)
      + policy_id   = (known after apply)
      + tags_all    = (known after apply)
    }

  # module.lambda_function.aws_iam_role.lambda[0] will be created
  + resource "aws_iam_role" "lambda" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRole"
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "lambda.amazonaws.com"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = true
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "my-lambda1"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)
    }

  # module.lambda_function.aws_iam_role_policy_attachment.logs[0] will be created
  + resource "aws_iam_role_policy_attachment" "logs" {
      + id         = (known after apply)
      + policy_arn = (known after apply)
      + role       = "my-lambda1"
    }

  # module.lambda_function.aws_lambda_function.this[0] will be created
  + resource "aws_lambda_function" "this" {
      + architectures                  = (known after apply)
      + arn                            = (known after apply)
      + description                    = "My awesome lambda function"
      + filename                       = "../../../dist/src.python.some_lambda/lambda.zip"
      + function_name                  = "my-lambda1"
      + handler                        = "lambda_function.handler"
      + id                             = (known after apply)
      + invoke_arn                     = (known after apply)
      + last_modified                  = (known after apply)
      + memory_size                    = 128
      + package_type                   = "Zip"
      + publish                        = false
      + qualified_arn                  = (known after apply)
      + qualified_invoke_arn           = (known after apply)
      + reserved_concurrent_executions = -1
      + role                           = (known after apply)
      + runtime                        = "python3.9"
      + signing_job_arn                = (known after apply)
      + signing_profile_version_arn    = (known after apply)
      + skip_destroy                   = false
      + source_code_hash               = (known after apply)
      + source_code_size               = (known after apply)
      + tags_all                       = (known after apply)
      + timeout                        = 3
      + version                        = (known after apply)

      + ephemeral_storage {
          + size = 512
        }

      + logging_config {
          + log_format = "Text"
          + log_group  = (known after apply)
        }
    }

  # module.some_cool_module.aws_iam_role.test will be created
  + resource "aws_iam_role" "test" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRole"
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "codebuild.amazonaws.com"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "test-pants-role"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)
    }

Plan: 6 to add, 0 to change, 0 to destroy.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

✓ src/terraform/some_cool_root_module:some_cool_root_module_deployment deployed

@alonsodomin
Copy link
Contributor

agh, my bad, was checking the implementation of the deploy goal in core and assumed the output was not being emitted. It's ok as it is.

@agoblet
Copy link
Contributor Author

agoblet commented Feb 16, 2024

@alonsodomin ping. The CI passes. Do you need me to merge main into this? can you get PR merged?

@alonsodomin
Copy link
Contributor

just approved and merged, many thanks!

@alonsodomin alonsodomin merged commit 7c1e0b7 into pantsbuild:main Feb 16, 2024
24 checks passed
@agoblet agoblet deleted the feature/18490-terraform-plan branch February 16, 2024 11:56
@lilatomic
Copy link
Contributor

Thanks again for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run terraform plan on package
5 participants