
feat: Plan store object storage#6312

Open
daanvinken wants to merge 9 commits intorunatlantis:mainfrom
daanvinken:plan-store-object-storage

Conversation

@daanvinken

what

  • Fixes Proposal: Write plans to S3 #265
  • Add S3-compatible external plan store so Terraform plan files survive pod restarts (plan on pod A, pod dies, apply on pod B)
  • Stale plan detection via S3 object metadata: rejects apply if PR head commit changed since plan
  • PR close cleanup deletes all plan objects from S3 via DeleteForPull
  • Fail-fast startup validation via HeadBucket
  • Existing LocalPlanStore behavior unchanged
  • CLI flags: --plan-store, --plan-store-s3-bucket, --plan-store-s3-region, --plan-store-s3-prefix, --plan-store-s3-endpoint, --plan-store-s3-force-path-style, --plan-store-s3-profile
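From the object keys visible in the test logs below (`<org>/<repo>/<pull>/<workspace>/<dir>/<planfile>`, with an optional configured prefix), the key layout can be sketched as a small helper. The function name and exact layout are assumptions inferred from the logs, not code from this PR:

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// buildPlanKey derives the S3 object key for a plan file.
// The layout is inferred from the log output in this PR
// (<prefix>/<owner>/<repo>/<pull>/<workspace>/<relative plan path>);
// the real implementation may differ.
func buildPlanKey(prefix, owner, repo string, pullNum int, workspace, relPlanPath string) string {
	key := path.Join(owner, repo, strconv.Itoa(pullNum), workspace, relPlanPath)
	if prefix != "" {
		key = path.Join(prefix, key)
	}
	return key
}

func main() {
	// → atlantis/plans/myorg/myrepo/17/default/infra/default.tfplan
	fmt.Println(buildPlanKey("atlantis/plans", "myorg", "myrepo", 17, "default", "infra/default.tfplan"))
}
```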

why

  • Atlantis plan files live on emptyDir and don't survive pod restarts
  • Without this, a pod restart between plan and apply requires re-planning
  • With multiple replicas, the pod that receives the apply webhook may not be the one that ran plan

tests

  • plan + pod restart + apply succeeds (testing cluster, S3-compatible object storage)
  • Stale plan rejected after new commit pushed
  • PR close removes plan objects from S3
  • LocalPlanStore behavior unchanged

Plan + pod restart + apply:

{"level":"info","ts":"...","caller":"server/server.go:666","msg":"initializing S3 plan store (bucket=<bucket>, region=us-east-1)","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:124","msg":"uploaded plan to s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"events/instrumented_project_command_runner.go:91","msg":"plan success. output available at: https://<github>/<org>/<repo>/pull/17","json":{"repo":"<org>/<repo>","pull":"17"}}

Pod restarted, apply triggered:

{"level":"info","ts":"...","caller":"events/project_command_builder.go:815","msg":"pull directory missing, re-cloning repo for apply","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:247","msg":"restored plan from s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile> to /atlantis-data/repos/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:256","msg":"restored 1 plan(s) from S3 for <org>/<repo>#17","json":{}}
{"level":"info","ts":"...","caller":"runtime/apply_step_runner.go:75","msg":"apply successful, deleting planfile","json":{"repo":"<org>/<repo>","pull":"17"}}

Stale plan rejection:

{"level":"error","ts":"...","caller":"events/instrumented_project_command_runner.go:81","msg":"Error running apply operation: loading plan: plan in S3 has no head-commit metadata (key=<org>/<repo>/17/default/<dir>/<planfile>) — run plan again\n","json":{"repo":"<org>/<repo>","pull":"17"}}

PR close cleanup:

{"level":"info","ts":"...","caller":"events/events_controller.go:605","msg":"Pull request closed, cleaning up...","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:299","msg":"deleted 1 plan(s) from S3 for <org>/<repo>#17","json":{}}

references

Copilot AI review requested due to automatic review settings March 13, 2026 15:41
@github-actions github-actions bot added dependencies PRs that update a dependency file go Pull requests that update Go code labels Mar 13, 2026
return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s) — run plan again", key)
}
if ctx.Pull.HeadCommit != "" && planCommit != ctx.Pull.HeadCommit {
return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s — run plan again", planCommit, ctx.Pull.HeadCommit)
@daanvinken (Author)

S3PlanStore plans survive restarts by design. Without a check, a plan from commit A gets applied after commit B is pushed. Save stores ctx.Pull.HeadCommit as S3 object metadata; Load rejects if it differs from the current PR head.

LocalPlanStore's Save/Load are no-ops (Terraform does direct disk I/O), so there's no interception point for metadata. With emptyDir this is fine; pod restart wipes plans, forcing re-plan. With a PersistentVolume the same stale plan risk exists, but that's a pre-existing upstream Atlantis gap we didn't introduce.

I'm also OK leaving this check out, as perhaps it should be fixed in LocalPlanStore as well?
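The check quoted above can be exercised as a standalone function. This is a sketch mirroring the quoted fragment; the PR wires the equivalent logic into Load, and the function name here is hypothetical:

```go
package main

import "fmt"

// checkPlanFreshness mirrors the stale-plan check quoted above: a plan
// with no stored head commit, or one stored at a different head than
// the PR's current one, is rejected so the user must re-plan.
func checkPlanFreshness(planCommit, headCommit, key string) error {
	if planCommit == "" {
		return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s), run plan again", key)
	}
	if headCommit != "" && planCommit != headCommit {
		// %.8s truncates the SHA to its first 8 characters for readability.
		return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s, run plan again", planCommit, headCommit)
	}
	return nil
}

func main() {
	fmt.Println(checkPlanFreshness("aaaabbbbcccc", "ddddeeeeffff", "org/repo/17/default/p.tfplan"))
}
```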

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dosubot dosubot bot added the feature New functionality/enhancement label Mar 13, 2026
@daanvinken daanvinken requested a review from Copilot March 13, 2026 16:15
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.

@adkafka (Contributor) left a comment

Looks pretty good to me!


@daanvinken daanvinken force-pushed the plan-store-object-storage branch from a7a123d to 9c0b6ed Compare March 16, 2026 10:43
@daanvinken daanvinken changed the title Plan store object storage feat: Plan store object storage Mar 16, 2026
@daanvinken (Author)

@lukemassa @pseudomorph @GenPage @chenrui333 @nitrocode are you available to have a look here, and perhaps at #6295? 🙏

@daanvinken daanvinken force-pushed the plan-store-object-storage branch from 9c0b6ed to 47de395 Compare March 16, 2026 12:10
@github-actions github-actions bot added docs Documentation github-actions labels Mar 16, 2026
daanvinken and others added 7 commits March 16, 2026 13:11
Introduces PlanStore abstraction for plan file persistence. LocalPlanStore
wraps current filesystem behavior with no functional change. Phase 2 will
add S3PlanStore as a drop-in replacement to remove the PV dependency.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Plan files are uploaded to S3 after plan and downloaded before apply,
allowing pods to restart without losing plans. On apply, if the working
directory is missing (e.g. emptyDir wiped), the repo is re-cloned and
plans are restored from S3 via prefix scan.

LocalPlanStore behavior is unchanged — the re-clone and restore logic
only activates when an external plan store is configured.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Call HeadBucket during NewS3PlanStore to fail fast on misconfigured
bucket or credentials instead of silently failing on first plan.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Add DeleteForPull to PlanStore interface and implement in S3PlanStore
using ListObjectsV2 + DeleteObject per key. Hook into
PullClosedExecutor.CleanUpPull() to prevent plan accumulation in S3.
Failures are logged as warnings to avoid blocking local cleanup.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Store the PR head commit SHA as S3 object metadata on Save. On Load,
reject the plan if the stored commit differs from the current PR head.
Plans without metadata are also rejected — forces re-plan after upgrade.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
@daanvinken daanvinken force-pushed the plan-store-object-storage branch from 47de395 to 8562509 Compare March 16, 2026 12:15
@pseudomorph (Contributor)

I'll try to give this a look in the next day or so.

}
localPath := filepath.Join(pullDir, relPath)

if err := os.MkdirAll(filepath.Dir(localPath), 0o700); err != nil {

Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression (High). This path depends on a user-provided value.
return fmt.Errorf("downloading plan from S3 (key=%s): %w", key, err)
}

f, err := os.Create(localPath)

Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression (High). This path depends on a user-provided value.
@jamengual (Contributor)

I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that

@jamengual (Contributor)

Also, there is a lot of mention of Pods and K8s, but I would like to aim this at containers in general (just a language change), so people do not think this is only for k8s workloads.

@pseudomorph (Contributor)

> I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that

@jamengual - I thought the same thing, though I was unsure what the best structure for this would be. A structured config just for backend stores? Something that mirrors the structured config elements in the repo config?

@jamengual (Contributor)

> I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that
>
> @jamengual - I thought the same thing, though was unsure of what the best structure for this would be. A structured config just for backend stores? Something which mirrors the structured config elements in the repo config?

We can do something similar to what we did for autoplan, where autoplan settings can be set in the config.json if it is passed. So if enable-external-stores is enabled, it looks in the config.json for the rest of the settings; if they are not present, it fails to start.

@daanvinken (Author)

@jamengual @pseudomorph thanks, agreed on both points. Will update pod/k8s language to be platform agnostic.

For the config consolidation, I'd favor a single --enable-external-stores flag, with the store configuration in the server-side config:

external_stores:
  plan_store:
    type: s3
    s3:
      bucket: my-bucket
      region: us-east-1
      # optional (via AWS S3 SDK)
      prefix: atlantis/plans
      endpoint: ""
      force_path_style: false
      profile: ""

Using external_stores as the top-level key leaves room for other store types in the future (e.g. state, logs as discussed at #121) without adding more CLI flags. Startup fails if --enable-external-stores is set but the config block is missing or incomplete.

Does this structure work, or would we e.g. prefer plan_store directly at the top level?
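For illustration, the fail-fast rule proposed here (startup fails if the flag is set but the config block is missing or incomplete) might be validated along these lines. Every type and field name below is hypothetical, derived only from the YAML sketch in this thread:

```go
package main

import (
	"errors"
	"fmt"
)

// These structs model the proposed external_stores block.
type S3StoreConfig struct {
	Bucket         string
	Region         string
	Prefix         string
	Endpoint       string
	ForcePathStyle bool
	Profile        string
}

type PlanStoreConfig struct {
	Type string
	S3   S3StoreConfig
}

type ExternalStores struct {
	PlanStore *PlanStoreConfig
}

// validateExternalStores enforces the fail-fast rule: if the flag is
// set but the config block is missing or incomplete, startup fails.
func validateExternalStores(enabled bool, cfg *ExternalStores) error {
	if !enabled {
		return nil
	}
	if cfg == nil || cfg.PlanStore == nil {
		return errors.New("--enable-external-stores is set but external_stores.plan_store is missing")
	}
	switch cfg.PlanStore.Type {
	case "s3":
		if cfg.PlanStore.S3.Bucket == "" || cfg.PlanStore.S3.Region == "" {
			return errors.New("external_stores.plan_store.s3 requires bucket and region")
		}
		return nil
	default:
		return fmt.Errorf("unsupported plan store type %q", cfg.PlanStore.Type)
	}
}

func main() {
	cfg := &ExternalStores{PlanStore: &PlanStoreConfig{
		Type: "s3",
		S3:   S3StoreConfig{Bucket: "my-bucket", Region: "us-east-1"},
	}}
	fmt.Println(validateExternalStores(true, cfg)) // → <nil>
}
```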

@jamengual (Contributor)

jamengual commented Mar 25, 2026 via email

@pseudomorph (Contributor)

Seems reasonable +1 from me.

Move 7 CLI flags (--plan-store, --plan-store-s3-*) into a single
--enable-external-stores flag plus an external_stores block in the
server-side repo config YAML. This keeps S3 backend details out of
CLI args and alongside the rest of the repo-level configuration.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
@daanvinken (Author)

daanvinken commented Mar 25, 2026

Done! Let me know what you think.

For context, I've cherry-picked #6295 (please review!) onto this branch and deployed Atlantis on steroids with a clustered Redis and an S3 plan backend, so it can run HA, surviving full downtime/restarts without a persistent disk. I've run the same tests mentioned in the PR description.

@daanvinken (Author)

/test
