fix: prevent JIT source TTL from wiping varfiles/backend mid-execution by Copilot · Pull Request #2253 · cloudposse/atmos

Copilot · 2026-03-26T00:46:25Z

AutoProvisionSource is called twice per command invocation — once directly from resolveAndProvisionComponentPath, and again via the before.terraform.init hook in prepareInitExecution. With ttl: "0s", the second call treats the workdir as always-expired, invokes os.RemoveAll(targetDir), and wipes the varfiles and backend configs written between the two calls. The subprocess then fails with "file does not exist".

Changes

- pkg/provisioner/source/provision_hook.go — adds an in-memory idempotency guard (invocationDoneKey = "_atmos_source_provisioned") to AutoProvisionSource. A named-return defer sets the marker in componentConfig on successful return. Any subsequent call with the same map (same in-memory invocation) short-circuits immediately. The guard is scoped to the per-invocation componentConfig; separate atmos runs are unaffected.
- pkg/provisioner/source/provision_hook_test.go — two regression tests:
  - TestAutoProvisionSource_InvocationGuard_PreventsDoubleProvisioning: asserts the guard short-circuits a second call even with ttl: "0s"
  - TestAutoProvisionSource_InvocationGuard_SetAfterProvisioning: asserts the marker is written to componentConfig after a skipped provision (TTL not expired), ensuring the hook path is a no-op

Original prompt

This section details the original issue you should resolve

<issue_title># Bug: JIT source provisioning TTL expiry deletes varfiles/backend, then runs tofu, causing error</issue_title>
<issue_description>### Describe the Bug

When using Just-In-Time (JIT) source provisioning, the source.ttl cleanup runs concurrently with — or before — the tofu subprocess, not after it completes. If the TTL expires at any point while tofu init, tofu plan, or any other tofu command is executing, Atmos deletes the varfiles and backend configuration out from under the running process.

The most reliable way to trigger this is ttl: "0s", which expires immediately and causes a deterministic failure every time. However, any positive TTL short enough to expire before the tofu subprocess finishes (e.g. "30s" on a slow network or large module download) will produce the same failure.

The result is a hard failure from tofu because the generated varfile (and/or backend file) no longer exists on disk:
Error: Failed to read variables file
│
│ Given variables file /tmp/atmos-workdir-*/component.tfvars.json does not exist.
Expected Behavior

The TTL cleanup should be scoped to between invocations, not during one. Provisioned files should never be deleted while the subprocess that depends on them is still running. Specifically:

TTL expiry should only be evaluated before provisioning (stale cache check), not during or after subprocess execution.

The provisioned workdir should be treated as a lock for the duration of the current command — held open until the subprocess exits, then subject to TTL-based cleanup on the next invocation.

A source.ttl: "0s" is the degenerate case that makes this deterministic, but the fix must cover all TTL values.

Actual Behavior

Atmos generates the varfiles and backend, the TTL of 0s immediately expires them, Atmos wipes them, and tofu fails:
│ Error: Failed to read variables file
│ 
│ Given variables file demo-null-label.terraform.tfvars.json does not exist.
Steps to Reproduce

The script below is fully self-contained. It requires only atmos and tofu on PATH and network access to GitHub. Save it as repro.sh and run it.
#!/usr/bin/env bash
# ============================================================
# REPRO: JIT ttl:"0s" deletes varfiles before tofu can read them
# ============================================================

set -euo pipefail

WORKDIR="$(mktemp -d -t atmos-repro-XXXXXX)"
echo "Working in: ${WORKDIR}"
cd "${WORKDIR}"

# --- 1) atmos.yaml ---
cat <<'EOF' > atmos.yaml
base_path: "."

components:
  terraform:
    base_path: "components/terraform"
    command: "tofu"
    workspaces_enabled: true
    apply_auto_approve: false
    deploy_run_init: true
    init_run_reconfigure: true
    auto_generate_backend_file: true

stacks:
  name_template: "{{ .vars.name }}"
  base_path: "stacks"
  included_paths:
    - "**/*"
EOF

# --- 2) Stack with ttl: "0s" on the JIT source ---
mkdir -p stacks
cat <<'EOF' > stacks/demo.yaml
vars:
  name: demo

terraform:
  backend_type: local

components:
  terraform:
    null-label:
      vars:
        # terraform-null-label variables
        namespace: "eg"
        stage: "test"
        name: "demo"
        enabled: true
      source:
        uri: "git::https://github.com/cloudposse/terraform-null-label.git"
        version: "0.25.0"
        ttl: "0s"    # <-- triggers the bug: files are wiped before tofu reads them
      provision:
        workdir:
          enabled: true
EOF

echo
echo "== tree =="
find . -maxdepth 4 -type f -print | sed 's|^\./||'

echo
echo "== discovered stacks =="
atmos describe stacks

echo
echo "== describe component =="
atmos describe component null-label -s demo

echo
echo "== init (this is where the failure occurs with ttl:0s) =="
atmos terraform init null-label -s demo

echo
echo "== plan =="
atmos terraform plan null-label -s demo

echo "Done. Workspace preserved at: ${WORKDIR}"
Run:
bash repro.sh 2>&1 | tee repro.log
Screenshots

No response

Environment

Atmos 1.212.0 on darwin/arm64

Additional Context

No response</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #2252 (Bug: JIT source provisioning TTL expiry deletes varfiles/backend, then runs tofu, causing error)


…process execution

When AutoProvisionSource is called twice within the same command invocation (once from resolveAndProvisionComponentPath and once via the before.terraform.init hook in prepareInitExecution), a zero TTL would cause VendorSource to call os.RemoveAll on the workdir, deleting varfiles and backend configs written between the two calls.

Fix: add an invocationDoneKey guard that marks componentConfig after the first call completes. Any subsequent call with the same componentConfig returns immediately without re-provisioning, preserving the workdir contents for the subprocess. The defer-based marker ensures the guard is set for both the "provisioned" and "skipped" paths without duplicating code.

Co-authored-by: osterman <52489+osterman@users.noreply.github.com>
Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/2c2dec41-5920-40f9-9420-137f7970ff42

Initial plan

a0d342f

Copilot AI assigned Copilot and osterman Mar 26, 2026

Copilot started work on behalf of osterman March 26, 2026 00:46 View session

mergify bot added the "triage" (Needs triage) and "wip" (Work in Progress: Not ready for final review or merge) labels Mar 26, 2026

Copilot AI changed the title ~~[WIP] Fix JIT source provisioning TTL expiry causing varfile deletion~~ fix: prevent JIT source TTL from wiping varfiles/backend mid-execution Mar 26, 2026

Copilot AI requested a review from osterman March 26, 2026 01:00

Copilot finished work on behalf of osterman March 26, 2026 01:00

mergify bot removed the "wip" (Work in Progress: Not ready for final review or merge) label Mar 26, 2026

osterman added the "patch" (A minor, backward compatible change) label Mar 26, 2026

Copilot wants to merge 2 commits into main from copilot/fix-jit-source-provisioning-ttl-expiry