
feat: Plan store object storage#6312

Open
daanvinken wants to merge 9 commits intorunatlantis:mainfrom
daanvinken:plan-store-object-storage

Conversation

@daanvinken

what

  • Fixes Proposal: Write plans to S3 #265
  • Add S3-compatible external plan store so Terraform plan files survive pod restarts (plan on pod A, pod dies, apply on pod B)
  • Stale plan detection via S3 object metadata: rejects apply if PR head commit changed since plan
  • PR close cleanup deletes all plan objects from S3 via DeleteForPull
  • Fail-fast startup validation via HeadBucket
  • Existing LocalPlanStore behavior unchanged
  • CLI flags: --plan-store, --plan-store-s3-bucket, --plan-store-s3-region, --plan-store-s3-prefix, --plan-store-s3-endpoint, --plan-store-s3-force-path-style, --plan-store-s3-profile
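From the object keys visible in the test logs below (`<org>/<repo>/<pull>/<workspace>/<dir>/<planfile>`, with an optional configured prefix), the key layout can be sketched as a small helper. The function name and exact layout are assumptions inferred from the logs, not code from this PR:

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// buildPlanKey derives the S3 object key for a plan file.
// The layout is inferred from the log output in this PR
// (<prefix>/<owner>/<repo>/<pull>/<workspace>/<relative plan path>);
// the real implementation may differ.
func buildPlanKey(prefix, owner, repo string, pullNum int, workspace, relPlanPath string) string {
	key := path.Join(owner, repo, strconv.Itoa(pullNum), workspace, relPlanPath)
	if prefix != "" {
		key = path.Join(prefix, key)
	}
	return key
}

func main() {
	// → atlantis/plans/myorg/myrepo/17/default/infra/default.tfplan
	fmt.Println(buildPlanKey("atlantis/plans", "myorg", "myrepo", 17, "default", "infra/default.tfplan"))
}
```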

why

  • Atlantis plan files live on emptyDir and don't survive pod restarts
  • Without this, a pod restart between plan and apply requires re-planning
  • With multiple replicas, the pod that receives the apply webhook may not be the one that ran plan

tests

  • plan + pod restart + apply succeeds (testing cluster, S3-compatible object storage)
  • Stale plan rejected after new commit pushed
  • PR close removes plan objects from S3
  • LocalPlanStore behavior unchanged

Plan + pod restart + apply:

{"level":"info","ts":"...","caller":"server/server.go:666","msg":"initializing S3 plan store (bucket=<bucket>, region=us-east-1)","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:124","msg":"uploaded plan to s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"events/instrumented_project_command_runner.go:91","msg":"plan success. output available at: https://<github>/<org>/<repo>/pull/17","json":{"repo":"<org>/<repo>","pull":"17"}}

Pod restarted, apply triggered:

{"level":"info","ts":"...","caller":"events/project_command_builder.go:815","msg":"pull directory missing, re-cloning repo for apply","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:247","msg":"restored plan from s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile> to /atlantis-data/repos/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:256","msg":"restored 1 plan(s) from S3 for <org>/<repo>#17","json":{}}
{"level":"info","ts":"...","caller":"runtime/apply_step_runner.go:75","msg":"apply successful, deleting planfile","json":{"repo":"<org>/<repo>","pull":"17"}}

Stale plan rejection:

{"level":"error","ts":"...","caller":"events/instrumented_project_command_runner.go:81","msg":"Error running apply operation: loading plan: plan in S3 has no head-commit metadata (key=<org>/<repo>/17/default/<dir>/<planfile>) — run plan again\n","json":{"repo":"<org>/<repo>","pull":"17"}}

PR close cleanup:

{"level":"info","ts":"...","caller":"events/events_controller.go:605","msg":"Pull request closed, cleaning up...","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:299","msg":"deleted 1 plan(s) from S3 for <org>/<repo>#17","json":{}}

references

Copilot AI review requested due to automatic review settings March 13, 2026 15:41
@github-actions github-actions bot added dependencies PRs that update a dependency file go Pull requests that update Go code labels Mar 13, 2026
return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s) — run plan again", key)
}
if ctx.Pull.HeadCommit != "" && planCommit != ctx.Pull.HeadCommit {
return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s — run plan again", planCommit, ctx.Pull.HeadCommit)
@daanvinken (Author)

S3PlanStore plans survive restarts by design. Without a check, a plan from commit A gets applied after commit B is pushed. Save stores ctx.Pull.HeadCommit as S3 object metadata; Load rejects if it differs from the current PR head.

LocalPlanStore's Save/Load are no-ops (Terraform does direct disk I/O), so there's no interception point for metadata. With emptyDir this is fine; pod restart wipes plans, forcing re-plan. With a PersistentVolume the same stale plan risk exists, but that's a pre-existing upstream Atlantis gap we didn't introduce.

I'm also OK leaving this check out, as perhaps it should be fixed in LocalPlanStore as well?
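The check quoted above can be exercised as a standalone function. This is a sketch mirroring the quoted fragment; the PR wires the equivalent logic into Load, and the function name here is hypothetical:

```go
package main

import "fmt"

// checkPlanFreshness mirrors the stale-plan check quoted above: a plan
// with no stored head commit, or one stored at a different head than
// the PR's current one, is rejected so the user must re-plan.
func checkPlanFreshness(planCommit, headCommit, key string) error {
	if planCommit == "" {
		return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s), run plan again", key)
	}
	if headCommit != "" && planCommit != headCommit {
		// %.8s truncates the SHA to its first 8 characters for readability.
		return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s, run plan again", planCommit, headCommit)
	}
	return nil
}

func main() {
	fmt.Println(checkPlanFreshness("aaaabbbbcccc", "ddddeeeeffff", "org/repo/17/default/p.tfplan"))
}
```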

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dosubot dosubot bot added the feature New functionality/enhancement label Mar 13, 2026
@daanvinken daanvinken requested a review from Copilot March 13, 2026 16:15
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.

@adkafka (Contributor) left a comment

Looks pretty good to me!


@daanvinken daanvinken force-pushed the plan-store-object-storage branch from a7a123d to 9c0b6ed Compare March 16, 2026 10:43
@daanvinken daanvinken changed the title Plan store object storage feat: Plan store object storage Mar 16, 2026
@daanvinken (Author)

@lukemassa @pseudomorph @GenPage @chenrui333 @nitrocode are you available to have a look here, and perhaps at #6295? 🙏

@daanvinken daanvinken force-pushed the plan-store-object-storage branch from 9c0b6ed to 47de395 Compare March 16, 2026 12:10
@github-actions github-actions bot added docs Documentation github-actions labels Mar 16, 2026
daanvinken and others added 7 commits March 16, 2026 13:11
Introduces PlanStore abstraction for plan file persistence. LocalPlanStore
wraps current filesystem behavior with no functional change. Phase 2 will
add S3PlanStore as a drop-in replacement to remove the PV dependency.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Plan files are uploaded to S3 after plan and downloaded before apply,
allowing pods to restart without losing plans. On apply, if the working
directory is missing (e.g. emptyDir wiped), the repo is re-cloned and
plans are restored from S3 via prefix scan.

LocalPlanStore behavior is unchanged — the re-clone and restore logic
only activates when an external plan store is configured.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Call HeadBucket during NewS3PlanStore to fail fast on misconfigured
bucket or credentials instead of silently failing on first plan.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Add DeleteForPull to PlanStore interface and implement in S3PlanStore
using ListObjectsV2 + DeleteObject per key. Hook into
PullClosedExecutor.CleanUpPull() to prevent plan accumulation in S3.
Failures are logged as warnings to avoid blocking local cleanup.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Store the PR head commit SHA as S3 object metadata on Save. On Load,
reject the plan if the stored commit differs from the current PR head.
Plans without metadata are also rejected — forces re-plan after upgrade.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
@daanvinken daanvinken force-pushed the plan-store-object-storage branch from 47de395 to 8562509 Compare March 16, 2026 12:15
@pseudomorph (Contributor)

I'll try to give this a look in the next day or so.

}
localPath := filepath.Join(pullDir, relPath)

if err := os.MkdirAll(filepath.Dir(localPath), 0o700); err != nil {

Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression (High). This path depends on a user-provided value.
return fmt.Errorf("downloading plan from S3 (key=%s): %w", key, err)
}

f, err := os.Create(localPath)

Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression (High). This path depends on a user-provided value.
@jamengual (Contributor)

I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that

@jamengual (Contributor)

Also, there is a lot of mention of Pods and K8s, but I would like to aim this at containers in general (just a language change), so people do not think this is only for k8s workloads.

@pseudomorph (Contributor)

> I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that

@jamengual - I thought the same thing, though I was unsure what the best structure for this would be. A structured config just for backend stores? Something that mirrors the structured config elements in the repo config?

@jamengual (Contributor)

> I see all the new flags and I wonder if we should move to the server side config instead and have only one flag like --enable-s3-store or something like that
>
> @jamengual - I thought the same thing, though was unsure of what the best structure for this would be. A structured config just for backend stores? Something which mirrors the structured config elements in the repo config?

We can do something similar to what we did for autoplan, where autoplan settings can be set in the config.json if it is passed. So if enable-external-stores is enabled, it looks in the config.json for the rest of the settings; if they are not present, it fails to start.

@daanvinken (Author)

@jamengual @pseudomorph thanks, agreed on both points. Will update pod/k8s language to be platform agnostic.

For the config consolidation, I'd favor a single --enable-external-stores flag, with the store configuration in the server-side config:

external_stores:
  plan_store:
    type: s3
    s3:
      bucket: my-bucket
      region: us-east-1
      # optional (via AWS S3 SDK)
      prefix: atlantis/plans
      endpoint: ""
      force_path_style: false
      profile: ""

Using external_stores as the top-level key leaves room for other store types in the future (e.g. state, logs as discussed at #121) without adding more CLI flags. Startup fails if --enable-external-stores is set but the config block is missing or incomplete.

Does this structure work, or would we e.g. prefer plan_store directly at the top level?
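For illustration, the fail-fast rule proposed here (startup fails if the flag is set but the config block is missing or incomplete) might be validated along these lines. Every type and field name below is hypothetical, derived only from the YAML sketch in this thread:

```go
package main

import (
	"errors"
	"fmt"
)

// These structs model the proposed external_stores block.
type S3StoreConfig struct {
	Bucket         string
	Region         string
	Prefix         string
	Endpoint       string
	ForcePathStyle bool
	Profile        string
}

type PlanStoreConfig struct {
	Type string
	S3   S3StoreConfig
}

type ExternalStores struct {
	PlanStore *PlanStoreConfig
}

// validateExternalStores enforces the fail-fast rule: if the flag is
// set but the config block is missing or incomplete, startup fails.
func validateExternalStores(enabled bool, cfg *ExternalStores) error {
	if !enabled {
		return nil
	}
	if cfg == nil || cfg.PlanStore == nil {
		return errors.New("--enable-external-stores is set but external_stores.plan_store is missing")
	}
	switch cfg.PlanStore.Type {
	case "s3":
		if cfg.PlanStore.S3.Bucket == "" || cfg.PlanStore.S3.Region == "" {
			return errors.New("external_stores.plan_store.s3 requires bucket and region")
		}
		return nil
	default:
		return fmt.Errorf("unsupported plan store type %q", cfg.PlanStore.Type)
	}
}

func main() {
	cfg := &ExternalStores{PlanStore: &PlanStoreConfig{
		Type: "s3",
		S3:   S3StoreConfig{Bucket: "my-bucket", Region: "us-east-1"},
	}}
	fmt.Println(validateExternalStores(true, cfg)) // → <nil>
}
```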

@jamengual (Contributor)

jamengual commented Mar 25, 2026 via email

@pseudomorph (Contributor)

Seems reasonable +1 from me.

Move 7 CLI flags (--plan-store, --plan-store-s3-*) into a single
--enable-external-stores flag plus an external_stores block in the
server-side repo config YAML. This keeps S3 backend details out of
CLI args and alongside the rest of the repo-level configuration.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <dvinken@tesla.com>
@daanvinken (Author)

daanvinken commented Mar 25, 2026

Done! Let me know what you think.

For context, I've cherry-picked #6295 (please review!) onto this branch and deployed Atlantis on steroids with a clustered Redis and an S3 plan backend, so it can run HA, surviving full downtime/restarts without a persistent disk. I've run the same tests mentioned in the PR description.

@daanvinken (Author)

/test
