feat: Plan store object storage #6312
Conversation
```go
	return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s) — run plan again", key)
}
if ctx.Pull.HeadCommit != "" && planCommit != ctx.Pull.HeadCommit {
	return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s — run plan again", planCommit, ctx.Pull.HeadCommit)
```
S3PlanStore plans survive restarts by design. Without a check, a plan from commit A gets applied after commit B is pushed. Save stores ctx.Pull.HeadCommit as S3 object metadata; Load rejects if it differs from the current PR head.
LocalPlanStore's Save/Load are no-ops (Terraform does direct disk I/O), so there's no interception point for metadata. With emptyDir this is fine; pod restart wipes plans, forcing re-plan. With a PersistentVolume the same stale plan risk exists, but that's a pre-existing upstream Atlantis gap we didn't introduce.
I'm also OK leaving this out, since it should perhaps be fixed in LocalPlanStore as well?
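The Load-side validation described above boils down to a small pure check. A minimal sketch (the helper name and standalone shape are mine; the PR does this inline in S3PlanStore.Load):

```go
package main

import "fmt"

// checkPlanCommit rejects a plan whose recorded head commit is missing or
// differs from the PR's current head, mirroring the behavior discussed above.
// (Hypothetical helper, not the PR's actual code.)
func checkPlanCommit(key, planCommit, headCommit string) error {
	if planCommit == "" {
		return fmt.Errorf("plan in S3 has no head-commit metadata (key=%s) — run plan again", key)
	}
	if headCommit != "" && planCommit != headCommit {
		return fmt.Errorf("plan was created at commit %.8s but PR is now at %.8s — run plan again", planCommit, headCommit)
	}
	return nil
}

func main() {
	// A plan saved at one commit must be rejected once the PR head moves on.
	err := checkPlanCommit("org/repo/17/default/dir/plan", "aaaaaaaa1111", "bbbbbbbb2222")
	fmt.Println(err != nil)
}
```

Note the `headCommit != ""` guard: when the current head is unknown the comparison is skipped, so only a positively stale plan is rejected.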
adkafka
left a comment
Looks pretty good to me!
Force-pushed from a7a123d to 9c0b6ed
@lukemassa @pseudomorph @GenPage @chenrui333 @nitrocode are you available to have a look here and perhaps at #6295? 🙏
Force-pushed from 9c0b6ed to 47de395
Introduces PlanStore abstraction for plan file persistence. LocalPlanStore wraps current filesystem behavior with no functional change. Phase 2 will add S3PlanStore as a drop-in replacement to remove the PV dependency. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
Plan files are uploaded to S3 after plan and downloaded before apply, allowing pods to restart without losing plans. On apply, if the working directory is missing (e.g. emptyDir wiped), the repo is re-cloned and plans are restored from S3 via prefix scan. LocalPlanStore behavior is unchanged — the re-clone and restore logic only activates when an external plan store is configured. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
Call HeadBucket during NewS3PlanStore to fail fast on misconfigured bucket or credentials instead of silently failing on first plan. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
Add DeleteForPull to PlanStore interface and implement in S3PlanStore using ListObjectsV2 + DeleteObject per key. Hook into PullClosedExecutor.CleanUpPull() to prevent plan accumulation in S3. Failures are logged as warnings to avoid blocking local cleanup. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
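The ListObjectsV2 + DeleteObject cleanup hinges on deriving one key prefix per pull request. A sketch of that derivation, with the key layout inferred from the upload logs in this PR (`<prefix>/<org>/<repo>/<pullNum>/...`); the real key scheme and function name may differ:

```go
package main

import "fmt"

// pullPrefix builds the S3 key prefix covering every plan of one pull
// request, so DeleteForPull can list and delete them in a single prefix scan.
// (Hypothetical helper; layout inferred from this PR's logs.)
func pullPrefix(storePrefix, repoFullName string, pullNum int) string {
	if storePrefix != "" {
		return fmt.Sprintf("%s/%s/%d/", storePrefix, repoFullName, pullNum)
	}
	return fmt.Sprintf("%s/%d/", repoFullName, pullNum)
}

func main() {
	// With the optional store prefix configured:
	fmt.Println(pullPrefix("atlantis/plans", "org/repo", 17)) // atlantis/plans/org/repo/17/
}
```

The trailing slash matters: it keeps a scan for pull 17 from also matching keys for pulls 170, 171, and so on.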
Store the PR head commit SHA as S3 object metadata on Save. On Load, reject the plan if the stored commit differs from the current PR head. Plans without metadata are also rejected — forces re-plan after upgrade. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
Force-pushed from 47de395 to 8562509
I'll try to give this a look in the next day or so.
```go
	}
	localPath := filepath.Join(pullDir, relPath)

	if err := os.MkdirAll(filepath.Dir(localPath), 0o700); err != nil {
```
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High
```go
		return fmt.Errorf("downloading plan from S3 (key=%s): %w", key, err)
	}

	f, err := os.Create(localPath)
```
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High
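The usual remedy for these CodeQL findings is to resolve the S3-derived relative path against the pull directory and reject anything that escapes it before touching the filesystem. A sketch under that assumption (the helper name is mine, not the PR's code):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an S3-derived relative path onto the pull directory and
// rejects any result that escapes it (e.g. via "../" segments). It also
// rejects an empty path that resolves to the directory itself.
// (Hypothetical mitigation for the CodeQL findings above.)
func safeJoin(pullDir, relPath string) (string, error) {
	base := filepath.Clean(pullDir)
	localPath := filepath.Join(base, relPath) // Join cleans the result
	if localPath == base || !strings.HasPrefix(localPath, base+string(filepath.Separator)) {
		return "", fmt.Errorf("plan key escapes pull directory: %q", relPath)
	}
	return localPath, nil
}

func main() {
	p, err := safeJoin("/atlantis-data/repos/org/repo/17", "default/dir/plan.tfplan")
	fmt.Println(p, err)
}
```

Because `filepath.Join` cleans its result, a key containing `../` resolves outside `base` and fails the prefix check, which is exactly the taint-to-path flow the scanner flags.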
I see all the new flags and I wonder if we should move to the server-side config instead and have only one flag, like --enable-s3-store or something like that.
Also, there is a lot of mention of Pods and K8s, but I would like to aim this at containers in general (just a language change), so people do not think this is only for k8s workloads.
@jamengual - I thought the same thing, though was unsure of what the best structure for this would be. A structured config just for backend stores? Something which mirrors the structured config elements in the repo config?
We can do something similar to what we did for autoplan, where the autoplan settings can be set in the config if one is passed. So if enable-external-stores is enabled, it looks in the config for the rest of the settings; if they're not present, it fails to start.
@jamengual @pseudomorph thanks, agreed on both points. Will update pod/k8s language to be platform agnostic. For the config consolidation, I vouch for a single `--enable-external-stores` flag, with the store configuration in the server-side config:

```yaml
external_stores:
  plan_store:
    type: s3
    s3:
      bucket: my-bucket
      region: us-east-1
      # optional (via AWS S3 SDK)
      prefix: atlantis/plans
      endpoint: ""
      force_path_style: false
      profile: ""
```

Using `external_stores` as the top-level key leaves room for other store types in the future (e.g. state, logs as discussed at #121) without adding more CLI flags. Startup fails if `--enable-external-stores` is set but the config block is missing or incomplete.

Does this structure work, or would we e.g. prefer `plan_store` directly at the top level?
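The fail-fast rule ("startup fails if the flag is set but the config block is missing or incomplete") could be sketched as below. The struct fields mirror the proposed YAML; all type and function names here are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// S3StoreConfig mirrors the proposed external_stores.plan_store.s3 block
// (optional fields like prefix, endpoint, force_path_style, profile omitted).
type S3StoreConfig struct {
	Bucket string
	Region string
}

type PlanStoreConfig struct {
	Type string
	S3   *S3StoreConfig
}

type ExternalStores struct {
	PlanStore *PlanStoreConfig
}

// validateExternalStores enforces the proposed behavior: if the enable flag
// is set, the config block must be present and complete, otherwise the
// server refuses to start. (Hypothetical sketch, not the PR's code.)
func validateExternalStores(enabled bool, cfg *ExternalStores) error {
	if !enabled {
		return nil
	}
	if cfg == nil || cfg.PlanStore == nil {
		return errors.New("--enable-external-stores is set but external_stores.plan_store is missing")
	}
	switch cfg.PlanStore.Type {
	case "s3":
		if cfg.PlanStore.S3 == nil || cfg.PlanStore.S3.Bucket == "" || cfg.PlanStore.S3.Region == "" {
			return errors.New("external_stores.plan_store.s3 requires bucket and region")
		}
		return nil
	default:
		return fmt.Errorf("unsupported plan_store type %q", cfg.PlanStore.Type)
	}
}

func main() {
	fmt.Println(validateExternalStores(true, nil))
}
```

Validating at startup rather than on first plan matches the HeadBucket fail-fast commit earlier in this PR: a misconfiguration surfaces immediately instead of as a mid-run surprise.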
I like it +1 from me
Seems reasonable +1 from me.
Move 7 CLI flags (--plan-store, --plan-store-s3-*) into a single --enable-external-stores flag plus an external_stores block in the server-side repo config YAML. This keeps S3 backend details out of CLI args and alongside the rest of the repo-level configuration. Signed-off-by: Daan Vinken <daanvinken@tythus.com> Signed-off-by: Daan Vinken <dvinken@tesla.com>
Done! Let me know what you think. For context, I've cherry-picked #6295 (please review!!) onto this branch and deployed Atlantis on steroids with a clustered Redis and S3 plan backend, so it can run HA, surviving full downtime/restarts without persistent disk. I've run the same tests mentioned in the PR description.
/test
what

- `DeleteForPull`: plans are removed from S3 when a PR closes
- `HeadBucket`: fail-fast check at startup
- `LocalPlanStore` behavior unchanged
- New flags: `--plan-store`, `--plan-store-s3-bucket`, `--plan-store-s3-region`, `--plan-store-s3-prefix`, `--plan-store-s3-endpoint`, `--plan-store-s3-force-path-style`, `--plan-store-s3-profile`

why

- Plan files live on `emptyDir` and don't survive pod restarts
- A restart between `plan` and `apply` requires re-planning

tests

- `plan` + pod restart + `apply` succeeds (testing cluster, S3-compatible object storage)
- `LocalPlanStore` behavior unchanged

Plan + pod restart + apply:

```
{"level":"info","ts":"...","caller":"server/server.go:666","msg":"initializing S3 plan store (bucket=<bucket>, region=us-east-1)","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:124","msg":"uploaded plan to s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"events/instrumented_project_command_runner.go:91","msg":"plan success. output available at: https://<github>/<org>/<repo>/pull/17","json":{"repo":"<org>/<repo>","pull":"17"}}
```

Pod restarted, apply triggered:

```
{"level":"info","ts":"...","caller":"events/project_command_builder.go:815","msg":"pull directory missing, re-cloning repo for apply","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:247","msg":"restored plan from s3://<bucket>/<org>/<repo>/17/default/<dir>/<planfile> to /atlantis-data/repos/<org>/<repo>/17/default/<dir>/<planfile>","json":{}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:256","msg":"restored 1 plan(s) from S3 for <org>/<repo>#17","json":{}}
{"level":"info","ts":"...","caller":"runtime/apply_step_runner.go:75","msg":"apply successful, deleting planfile","json":{"repo":"<org>/<repo>","pull":"17"}}
```

Stale plan rejection:

```
{"level":"error","ts":"...","caller":"events/instrumented_project_command_runner.go:81","msg":"Error running apply operation: loading plan: plan in S3 has no head-commit metadata (key=<org>/<repo>/17/default/<dir>/<planfile>) — run plan again\n","json":{"repo":"<org>/<repo>","pull":"17"}}
```

PR close cleanup:

```
{"level":"info","ts":"...","caller":"events/events_controller.go:605","msg":"Pull request closed, cleaning up...","json":{"repo":"<org>/<repo>","pull":"17"}}
{"level":"info","ts":"...","caller":"runtime/s3_plan_store.go:299","msg":"deleted 1 plan(s) from S3 for <org>/<repo>#17","json":{}}
```

references