Wrong logic in atlantis/apply state transition #5368

Open
@brontolinux

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

We are running Atlantis 0.31.0 on AWS Fargate. Our VCS is GitLab (on-premises). We use Terragrunt with Terraform, and in Atlantis we use a custom workflow to enable run-all. The Atlantis configuration is generated automatically via terragrunt-atlantis-config.

When we run atlantis apply via a comment in a PR, all apply pipelines are executed. Even after every one of them has reached the "succeeded" state, the main pipeline, atlantis/apply, remains in the running state.

The logs of the container will show something like this:

{"level":"info","ts":"2025-01-19T16:47:25.659Z","caller":"events/instrumented_project_command_runner.go:88","msg":"apply success. output available at: https://mygitserver.example.com/rikstv/sre/rikstv.terraform.infra.atlantistesting/-/merge_requests/15","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"info","ts":"2025-01-19T16:47:27.254Z","caller":"vcs/gitlab_client.go:408","msg":"Updating GitLab commit status for 'atlantis/apply' to 'running'","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"info","ts":"2025-01-19T16:47:27.343Z","caller":"vcs/gitlab_client.go:433","msg":"Pipeline found for commit 682ac035b55d8193a729b02edef6f8e71c8944ab, setting pipeline ID to 202381","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"warn","ts":"2025-01-19T16:47:27.438Z","caller":"events/apply_command_runner.go:223","msg":"unable to update commit status: POST https://mygitserver.example.com/api/v4/projects/rikstv/sre/rikstv.terraform.infra.atlantistesting/statuses/682ac035b55d8193a729b02edef6f8e71c8944ab: 400 {message: Cannot transition status via :run from :running (Reason(s): Status cannot transition via \"run\")}","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*ApplyCommandRunner).updateCommitStatus\n\tgithub.com/runatlantis/atlantis/server/events/apply_command_runner.go:223\ngithub.com/runatlantis/atlantis/server/events.(*ApplyCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/apply_command_runner.go:181\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:383"}

I did some digging in the Go code at tag release-0.31. The error happens in atlantis/server/events/apply_command_runner.go, lines 214-224, in the call to applyCommandRunner.commitStatusUpdater.UpdateCombinedCount.

Unless I am mistaken, that calls the method UpdateCombinedCount in atlantis/server/events/commit_status_updater.go, lines 69-83. In there, we have this definition:

	src := fmt.Sprintf("%s/%s", d.StatusName, cmdName.String())

The string src is then passed to the CommitStatusUpdater's Client.UpdateStatus method, whose result is returned to UpdateCombinedCount.
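To show what that definition produces, here is a small standalone sketch. The helper statusSource is hypothetical, and the values "atlantis" and "apply" are assumptions inferred from the status name in the log output, not read from the Atlantis configuration.

```go
package main

import "fmt"

// statusSource is a hypothetical helper that repeats the fmt.Sprintf call
// quoted above: it joins the status name and the command name with a slash.
func statusSource(statusName, cmdName string) string {
	return fmt.Sprintf("%s/%s", statusName, cmdName)
}

func main() {
	// Assumed inputs, chosen to match the name seen in the logs.
	fmt.Println(statusSource("atlantis", "apply")) // prints "atlantis/apply"
}
```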

There are several Clients defined, one for each VCS supported by Atlantis. Since we are using GitLab, I looked at atlantis/server/events/vcs/gitlab_client.go. At line 411 we have:

	logger.Info("Updating GitLab commit status for '%s' to '%s'", src, gitlabState)

I therefore expected to find a matching info-level line in the logs, and it is indeed the second line of the log above. That line ties all the pieces together:

{"level":"info","ts":"2025-01-19T16:47:27.254Z","caller":"vcs/gitlab_client.go:408","msg":"Updating GitLab commit status for 'atlantis/apply' to 'running'","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}

So, basically, Atlantis is trying to transition a pipeline whose current status is running to running again, which is wrong and is rejected by GitLab. In fact, the first line of the log I posted reports a success, so Atlantis should transition the atlantis/apply pipeline to success, not to running:

{"level":"info","ts":"2025-01-19T16:47:25.659Z","caller":"events/instrumented_project_command_runner.go:88","msg":"apply success. output available at: https://mygitserver.example.com/rikstv/sre/rikstv.terraform.infra.atlantistesting/-/merge_requests/15","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}

So something is wrong in the Atlantis logic that selects the destination state: from running, it should go to success or failed, definitely not to running again.
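The behaviour I would expect can be sketched as follows. This is not the Atlantis implementation: ProjectResult, its constants and combinedState are hypothetical names, and the rule (any error gives failed, everything applied gives success, anything else stays running) is only my reading of how the combined status should be chosen.

```go
package main

import "fmt"

// ProjectResult is a hypothetical per-project outcome for this sketch.
type ProjectResult int

const (
	Applied ProjectResult = iota
	Errored
	NotApplied
)

// combinedState picks the state for the overall atlantis/apply status:
// any errored project gives "failed", all projects applied gives "success",
// and "running" is kept only while some project is still not applied.
// An empty slice yields "success", since nothing is left to apply.
func combinedState(results []ProjectResult) string {
	applied := 0
	for _, r := range results {
		switch r {
		case Errored:
			return "failed"
		case Applied:
			applied++
		}
	}
	if applied == len(results) {
		return "success"
	}
	return "running"
}

func main() {
	fmt.Println(combinedState([]ProjectResult{Applied, Applied}))    // success
	fmt.Println(combinedState([]ProjectResult{Applied, Errored}))    // failed
	fmt.Println(combinedState([]ProjectResult{Applied, NotApplied})) // running
}
```

If the real logic resembles this, one guess is that "running" is still selected because the number of projects counted as applied is lower than the total Atlantis expects. The logs above do not confirm this.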

I don't know where that logic lives or what should be changed; you know the code better than I do.

Reproduction Steps

  • As a simple test, create a Terraform stack that only emits outputs, without creating any infrastructure.
  • Create a few Terragrunt units, each with its own state, that use the stack above.

In our case, the stack is located at the path apps/sre/__selftest__: the Terraform code is in the _stack subdirectory, and the units are in environment-specific directories (e.g. dev, prod), each with a terragrunt.hcl file that refers to the _stack directory. E.g.:

# terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "..//_stack"
}

# _stack/main.tf
output "foo" {
  value = "foo"
}

output "bar" {
  value = "bar"
}

You will find our custom workflow in the "Environment details" section. Keep in mind that the workflow is tightly bound to the repository's directory structure: if you use a different directory scheme than the one suggested here, you'll have to update the workflow, too.

Logs

{"level":"info","ts":"2025-01-19T16:47:25.659Z","caller":"events/instrumented_project_command_runner.go:88","msg":"apply success. output available at: https://mygitserver.example.com/rikstv/sre/rikstv.terraform.infra.atlantistesting/-/merge_requests/15","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"info","ts":"2025-01-19T16:47:27.254Z","caller":"vcs/gitlab_client.go:408","msg":"Updating GitLab commit status for 'atlantis/apply' to 'running'","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"info","ts":"2025-01-19T16:47:27.343Z","caller":"vcs/gitlab_client.go:433","msg":"Pipeline found for commit 682ac035b55d8193a729b02edef6f8e71c8944ab, setting pipeline ID to 202381","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"}}
{"level":"warn","ts":"2025-01-19T16:47:27.438Z","caller":"events/apply_command_runner.go:223","msg":"unable to update commit status: POST https://mygitserver.example.com/api/v4/projects/rikstv/sre/rikstv.terraform.infra.atlantistesting/statuses/682ac035b55d8193a729b02edef6f8e71c8944ab: 400 {message: Cannot transition status via :run from :running (Reason(s): Status cannot transition via \"run\")}","json":{"repo":"rikstv/sre/rikstv.terraform.infra.atlantistesting","pull":"15"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*ApplyCommandRunner).updateCommitStatus\n\tgithub.com/runatlantis/atlantis/server/events/apply_command_runner.go:223\ngithub.com/runatlantis/atlantis/server/events.(*ApplyCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/apply_command_runner.go:181\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:383"}

Environment details

  • Atlantis version: 0.31.0
  • Deployment method: Atlantis on AWS Fargate Terraform Module, with an EFS filesystem for persistent storage
  • Have you tried to reproduce this issue on the latest version: no, as there appear to be bugs in 0.32 that would impact us
  • Atlantis flags: N/A

Repo atlantis.yaml file (snippet, as the file is autogenerated through terragrunt-atlantis-config and includes one project per stack in the repo):

automerge: false
parallel_apply: true
parallel_plan: false
projects:
  - autoplan:
      enabled: true
      when_modified:
        - '*.hcl'
        - '*.tf*'
        - '**/*.hcl'
        - '**/*.tf*'
        - ../../../terragrunt.hcl
    dir: apps/sre/__selftest__
    name: apps_sre___selftest__
    workspace: apps_sre___selftest__

Atlantis server-side repo config file:

repos:
  - id: git.rikstv.no/rikstv/sre/rikstv.terraform.infra
    workflow: terragrunt
    apply_requirements: [mergeable, approved]
    pre_workflow_hooks:
      - run: |
          terragrunt-atlantis-config generate --project-hcl-files context.hcl --output atlantis.yaml --autoplan --create-project-name --create-workspace --parallel="false"
          yq -i '.parallel_apply = true' atlantis.yaml
          yq eval 'del(.projects[] | select(.name | contains("atlantis")))' atlantis.yaml -i

    # This should be identical to the real one (rikstv.terraform.infra), but is used for testing atlantis
    # The webhook in the 'atlantistesting' repo is not configured to trigger atlantis in prod
  - id: git.rikstv.no/rikstv/sre/rikstv.terraform.infra.atlantistesting
    workflow: terragrunt
    apply_requirements: [mergeable, approved]
    pre_workflow_hooks:
      - run: |
          terragrunt-atlantis-config generate --project-hcl-files context.hcl --output atlantis.yaml --autoplan --create-project-name --create-workspace --parallel="false"
          yq -i '.parallel_apply = true' atlantis.yaml
          yq eval 'del(.projects[] | select(.name | contains("atlantis")))' atlantis.yaml -i

# Workflow adapted from:
# https://www.runatlantis.io/docs/custom-workflows.html#terragrunt
workflows:
  terragrunt:
    plan:
      steps:
        - env:
            # Reduce Terraform suggestion output
            name: TF_IN_AUTOMATION
            value: "true"
        - env:
            name: TERRAGRUNT_NON_INTERACTIVE
            value: "true"
        - run:
            # Allow for targeted plans/applies as not supported for Terraform wrappers by default
            command: terragrunt run-all plan -input=false $(printf '%s' $COMMENT_ARGS | sed 's/,/ /g' | tr -d '\\') -no-color -out ../../../../atlantis.tfplan
            output: hide
        - run: |
            terragrunt run-all show --terragrunt-parallelism=1 ../../../../atlantis.tfplan
    apply:
      steps:
        - env:
            # Reduce Terraform suggestion output
            name: TF_IN_AUTOMATION
            value: "true"
        - env:
            name: TERRAGRUNT_NON_INTERACTIVE
            value: "true"
        - run: terragrunt run-all apply -input=false ../../../../atlantis.tfplan
    import:
      steps:
        - env:
            name: TF_VAR_author
            command: 'git show -s --format="%ae" $HEAD_COMMIT'
        # Allow for imports as not supported for Terraform wrappers by default
        - run: terragrunt run-all import -input=false $(printf '%s' $COMMENT_ARGS | sed 's/,/ /' | tr -d '\\')
    state_rm:
      steps:
        - run: terragrunt run-all state rm $(printf '%s' $COMMENT_ARGS | sed 's/,/ /' | tr -d '\\')

Additional Context
