perf(merge): replace mergo pre-copy loop with reflection-free native deep merge (3.5× faster) by Copilot · Pull Request #2201 · cloudposse/atmos

Copilot · 2026-03-15T01:15:52Z

Summary by CodeRabbit

New Features
- ~3.5× faster deep-merge for stack configuration resolution (native implementation)
- Improved Terraform workspace detection and more tolerant recovery behavior
Bug Fixes
- Fixed slice-merge precedence and eliminated unintended data aliasing/corruption
- Resolved workspace/state edge cases and clarified recovery logging
Tests
- Strengthened test coverage with runtime gating, negative-path checks, and cross-validation opt‑ins
Documentation
- Added detailed deep-merge blog post and a release notes/fixes page documenting behavior and migration guidance

…deep merge (3.5× faster)

The previous MergeWithOptions implementation called DeepCopyMap on every input before passing it to mergo.Merge, paying two costs per step: a full pre-copy of the source map (including keys that would never conflict) plus reflection overhead inside mergo.

The new implementation:
- Deep-copies only the first input to seed the accumulator.
- Merges each subsequent input via deepMergeNative, which copies values only when they are stored as leaves in the accumulator. Shared intermediate map[string]any containers are recursed into without any allocation.
- Contains no reflection; every type switch operates directly on map[string]any and []any (the only types present after YAML parsing).

Benchmark result (3-input merge, AMD EPYC 7763):
- Before: 682 k iter / 5062 ns/op
- After: 2514 k iter / 1427 ns/op ← 3.5× faster

All three list merge strategies (replace, append, merge) and the WithSliceDeepCopy scalar-preservation behaviour are preserved and verified by the full existing test suite plus 23 new dedicated tests.

Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com>
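The approach in this commit message can be illustrated with a minimal sketch. This is not the code in pkg/merge/merge_native.go: the names `deepCopyValue`, `mergeInto`, and `mergeAll` are hypothetical, only the "replace" list strategy is modelled, and type mismatches simply overwrite instead of returning an error.

```go
package main

import "fmt"

// deepCopyValue returns a copy of v that shares no maps or slices with v.
// Only map[string]any and []any are handled as containers (assumption:
// these are the only container types present after YAML parsing).
func deepCopyValue(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, e := range t {
			out[k] = deepCopyValue(e)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, e := range t {
			out[i] = deepCopyValue(e)
		}
		return out
	default:
		return v // scalars are stored as-is
	}
}

// mergeInto merges src into dst in place. When both sides hold a map for the
// same key, it recurses without copying; otherwise the source value is
// deep-copied and stored as a leaf (lists are replaced in this sketch).
func mergeInto(dst, src map[string]any) {
	for k, sv := range src {
		srcMap, srcIsMap := sv.(map[string]any)
		dstMap, dstIsMap := dst[k].(map[string]any)
		if srcIsMap && dstIsMap {
			mergeInto(dstMap, srcMap)
			continue
		}
		dst[k] = deepCopyValue(sv)
	}
}

// mergeAll seeds the accumulator with a deep copy of the first input and
// merges every later input into it, so no input is mutated.
func mergeAll(inputs ...map[string]any) map[string]any {
	if len(inputs) == 0 {
		return map[string]any{}
	}
	acc := deepCopyValue(inputs[0]).(map[string]any)
	for _, in := range inputs[1:] {
		mergeInto(acc, in)
	}
	return acc
}

func main() {
	base := map[string]any{"vars": map[string]any{"region": "us-east-1", "tags": []any{"a"}}}
	override := map[string]any{"vars": map[string]any{"tags": []any{"b"}, "stage": "dev"}}
	fmt.Println(mergeAll(base, override))
}
```

Because only leaves are copied, a later input that touches one key of a large nested map costs one recursion step and one small copy, not a full pre-copy of that input.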

github-actions · 2026-03-15T01:41:35Z

Dependency Review

✅ No vulnerabilities or license issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 0858696.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

pkg/merge/merge_native.go

…tHub Advanced Security alerts 5236-5239) Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com>

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (2)

internal/exec/yaml_func_template_test.go (1)

439-441: Add terraform fallback for consistency.

These subtests only check for tofu, but the "atmos.Component integration" subtest at lines 397-401 checks both tofu and terraform. Consider using the same fallback pattern here for consistency and broader compatibility.

🔧 Proposed fix for lines 439-441

-		if _, err := exec.LookPath("tofu"); err != nil {
-			t.Skip("skipping: 'tofu' binary not found in PATH (required for atmos.Component integration test)")
-		}
+		_, tofuErr := exec.LookPath("tofu")
+		_, tfErr := exec.LookPath("terraform")
+		if tofuErr != nil && tfErr != nil {
+			t.Skip("skipping: neither 'tofu' nor 'terraform' binary found in PATH (required for atmos.Component integration test)")
+		}

🔧 Proposed fix for lines 466-468

-		if _, err := exec.LookPath("tofu"); err != nil {
-			t.Skip("skipping: 'tofu' binary not found in PATH (required for atmos.Component integration test)")
-		}
+		_, tofuErr := exec.LookPath("tofu")
+		_, tfErr := exec.LookPath("terraform")
+		if tofuErr != nil && tfErr != nil {
+			t.Skip("skipping: neither 'tofu' nor 'terraform' binary found in PATH (required for atmos.Component integration test)")
+		}

Also applies to: 466-468

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/exec/yaml_func_template_test.go` around lines 439 - 441, The test
currently skips when only the "tofu" binary is missing; change the check to try
exec.LookPath for both "tofu" and "terraform" and only call t.Skip if neither is
found—i.e., attempt exec.LookPath("tofu") then exec.LookPath("terraform") and
proceed if either succeeds, otherwise t.Skip with an updated message mentioning
both binaries; apply the same change to the other occurrence that checks "tofu"
(the block at the second instance around lines 466-468).

pkg/merge/merge_native_test.go (1)

176-182: Prefer sentinel checks over error-string matching.

These cases pin message text even though the production code already returns or wraps stable merge sentinels. assert.ErrorIs / require.ErrorIs would keep the tests resilient to added context and align with the repo rule.

As per coding guidelines, "**/*_test.go: ... use errors.Is() for error checking."

Also applies to: 441-448, 687-691, 703-705, 727-741, 1082-1085

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/merge/merge_native_test.go` around lines 176 - 182, The test for
deepMergeNative should use errors.Is-style checking against the sentinel error
the production code returns for a nil destination instead of matching the error
string; replace the assert.Contains(err.Error(), "must not be nil") with
require.ErrorIs(err, ErrNilDst) (or the actual sentinel name used by the merge
package, e.g., mergeSentinelNilDst) so the test calls require.ErrorIs(err,
<sentinel>) after calling deepMergeNative(nil, ...).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 406-418: The guidance mistakenly suggests adding a standalone
TestMain which will cause duplicate TestMain compile errors in packages that
already define one; update the text to explicitly instruct contributors to
modify/extend the existing TestMain in their package (not add a second function)
by inserting the env-gate check for "_ATMOS_TEST_EXIT_ONE" and the early
os.Exit(1) branch into the package's current TestMain implementation or test
bootstrap (refer to the TestMain symbol and the "_ATMOS_TEST_EXIT_ONE" env var),
and give a brief example wording telling contributors to merge the snippet into
their existing TestMain rather than copying it verbatim.

In `@internal/exec/terraform_execute_helpers_pipeline_test.go`:
- Around line 169-200: Update TestRunWorkspaceSetup_RecoveryPath to assert the
specific warning message is emitted (e.g., check the captured log contains
"Workspace is already active") instead of only asserting nil; capture the logger
output by swapping the global logger/writer before calling runWorkspaceSetup and
after the call restore the original logger writer (store the previous writer in
a variable and restore it, do not hardcode os.Stderr) so logger state is not
leaked to other tests; apply the same fixes to the other test at the same
pattern (lines ~229-232) that restores the logger incorrectly.

In `@internal/exec/terraform_utils_test.go`:
- Around line 1827-1835: The test only covers an absolute TF_DATA_DIR path; add
a second subcase that sets a relative TF_DATA_DIR to "custom-tf-dir" and creates
the environment file under filepath.Join(dir, "custom-tf-dir") so
isTerraformCurrentWorkspace(dir, "nonprod") resolves correctly; modify the t.Run
"respects TF_DATA_DIR env var" to include (or add a new t.Run) where you call
t.Setenv("TF_DATA_DIR", "custom-tf-dir") and write the environment file at
filepath.Join(dir, "custom-tf-dir", "environment") before asserting
assert.True(t, isTerraformCurrentWorkspace(dir, "nonprod")).

In `@internal/exec/validate_stacks_test.go`:
- Around line 94-103: The WalkDir callback in validate_stacks_test.go currently
ignores the error parameter and calls d.IsDir() unguarded which can panic;
modify the filepath.WalkDir callback used to count fixtureFileCount so it checks
the incoming error first (if err != nil return err) and also ensure d is non-nil
before calling d.IsDir(); keep the existing logic that increments
fixtureFileCount for non-dir .yaml files and let WalkDir propagate any directory
read errors by returning the error from the callback so the require.Positive
guard remains meaningful.

In `@pkg/merge/merge_native.go`:
- Around line 53-77: Move the slice/type-mismatch guard earlier in
deepMergeNative so that you check for a conflicting dst slice before handling
map branches: when processing keys in deepMergeNative (symbols: srcVal, dstVal,
dst[k]), detect if dstVal is []any and srcVal is a map and return
ErrMergeTypeMismatch instead of letting the map-handling branches (the fast-path
map branch and the normalize-via-deepCopyValue branch) overwrite dst; ensure
this guard runs before the map-specific branches (the code paths that call
deepMergeNative recursively) so shape changes from slice→map are rejected
consistently (also apply the same reordering to the later similar section around
the other map/type handling).

In `@pkg/merge/merge_no_duplicate_errors_test.go`:
- Around line 93-97: Replace the loose assert.Error check with an assertion that
the returned error wraps the merge sentinel (e.g., use errors.Is against the
package sentinel such as ErrTypeMismatch or the appropriate merge sentinel) so
the test verifies the exact error chain from Merge; add the standard "errors"
import (and errUtils if your project uses it for wrapping helpers) and change
the assertion to something like assert.True(t, errors.Is(err,
merge.ErrTypeMismatch), ...) while keeping the existing assert.Empty check for
stderrOutput.

In `@website/blog/2026-03-15-faster-deep-merge.mdx`:
- Around line 90-94: The blog post contains a placeholder tracking note claiming
a follow-up issue exists for the mergo migration but no issue number is
provided; before publishing, either update that paragraph to link to the actual
follow-up issue number (replacing the sentence that references "[PR `#2201`
description]" and the line "Once the follow-up issue is opened, this sentence
will be updated with its `#number`.") or remove/neutralize the claim about a
tracked follow-up so the post isn’t shipping stale information; if helpful,
create the follow-up issue now and insert its number into that paragraph
referencing the mergo migration and CVE removal.
- Around line 73-80: Update the website paragraph to state that the behavior of
truncating extra source elements when sliceDeepCopy is active is an intentional
divergence from mergo, matching the implementation in merge_native and the
expectations asserted in merge_compare_mergo_test; replace the sentence that
claims this was "also the observed mergo behavior" with a clear statement that
the native implementation intentionally adopts this contract (and link to the
test/implementation by name: merge_native and merge_compare_mergo_test) so
website docs and code/tests present the same, consistent contract for
sliceDeepCopy.

In `@website/src/data/roadmap.js`:
- Line 449: The roadmap entry for the shipped milestone "3.5× faster deep merge
(hot-path)" is added but the initiative progress wasn't updated; update the
initiative's quality.progress value from 80 to 86 to reflect 12 of 14 milestones
shipped, and ensure the new milestone object (the entry with label "3.5× faster
deep merge (hot-path)", status: 'shipped', pr: 2201, changelog:
'faster-deep-merge') is included under the same initiative so progress and
changelog/pr data are consistent with the shipping policy.

---

Nitpick comments:
In `@internal/exec/yaml_func_template_test.go`:
- Around line 439-441: The test currently skips when only the "tofu" binary is
missing; change the check to try exec.LookPath for both "tofu" and "terraform"
and only call t.Skip if neither is found—i.e., attempt exec.LookPath("tofu")
then exec.LookPath("terraform") and proceed if either succeeds, otherwise t.Skip
with an updated message mentioning both binaries; apply the same change to the
other occurrence that checks "tofu" (the block at the second instance around
lines 466-468).

In `@pkg/merge/merge_native_test.go`:
- Around line 176-182: The test for deepMergeNative should use errors.Is-style
checking against the sentinel error the production code returns for a nil
destination instead of matching the error string; replace the
assert.Contains(err.Error(), "must not be nil") with require.ErrorIs(err,
ErrNilDst) (or the actual sentinel name used by the merge package, e.g.,
mergeSentinelNilDst) so the test calls require.ErrorIs(err, <sentinel>) after
calling deepMergeNative(nil, ...).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0de36a0-a628-439f-a205-f501e0a08e60

📥 Commits

Reviewing files that changed from the base of the PR and between fe8ebf3 and 1f2cf50.

📒 Files selected for processing (29)

CLAUDE.md
docs/fixes/2026-03-19-deep-merge-native-fixes.md
errors/errors.go
internal/exec/terraform_clean_test.go
internal/exec/terraform_execute_helpers_exec.go
internal/exec/terraform_execute_helpers_pipeline_test.go
internal/exec/terraform_test.go
internal/exec/terraform_utils.go
internal/exec/terraform_utils_test.go
internal/exec/testmain_test.go
internal/exec/validate_stacks_test.go
internal/exec/workflow_identity_test.go
internal/exec/workflow_utils_test.go
internal/exec/yaml_func_template_test.go
internal/exec/yaml_func_terraform_output_test.go
internal/exec/yaml_func_terraform_state_test.go
internal/exec/yaml_func_terraform_state_workspaces_disabled_test.go
internal/exec/yaml_func_utils_test.go
pkg/merge/merge.go
pkg/merge/merge_compare_mergo_test.go
pkg/merge/merge_context_demo_test.go
pkg/merge/merge_context_test.go
pkg/merge/merge_native.go
pkg/merge/merge_native_test.go
pkg/merge/merge_no_duplicate_errors_test.go
tests/preconditions.go
website/blog/2026-03-15-faster-deep-merge.mdx
website/blog/authors.yml
website/src/data/roadmap.js

…docs cleanup Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com> Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/4afff323-136a-450a-b021-1ef1bcca14fe

…s, mergo issue link, CLAUDE.md compare_mergo command Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com> Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/2a4fda27-af54-42b0-a99c-111464ba91d9

…; update benchmark, compare_mergo header, and TF_DATA_DIR comment Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com> Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/93ee2135-1e5b-487d-94f5-e3af9cc1f548

…de review Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com> Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/93ee2135-1e5b-487d-94f5-e3af9cc1f548

coderabbitai · 2026-03-23T13:33:04Z

@nitrocode

Audit round number: 12

Executive Summary
This PR delivers a materially faster deep-merge on the hot path and adds a pragmatic Terraform workspace recovery. Earlier critical regressions (double execution, slice aliasing, default-workspace handling) appear fixed. The largest remaining risk is durability: there’s no guardrail test guaranteeing ExecuteTerraform runs the pipeline exactly once, and the recovery path’s exit-code wrapping contract isn’t pinned by a unit test. A few narrow branches remain untested and the last mergo usage lacks a linked follow-up, but these are straightforward to close. Recommendation: land the top two guardrail tests and link the mergo follow-up; then merging is low risk.
Findings Table
| Number # | Issue | Severity | Pre-existing? | Suggested Fix |
|---|---|---|---|---|
| 1 | No test guaranteeing ExecuteTerraform invokes the pipeline exactly once (future double-exec guardrail) | 🔴 Critical | No | Add a test-only spy/counter around ExecuteShellCommand (DI seam or build tag) and assert exactly one invocation per ExecuteTerraform() call. |
| 2 | Exit-code wrapping contract not pinned (recovery assumes errors.As(err, ExitCodeError)) | 🔴 Critical | No | Add a sub-test that runs os.Executable() with _ATMOS_TEST_EXIT_ONE=1 and assert errors.As(err, errUtils.ExitCodeError). |
| 3 | Remaining runtime mergo usage in non-hot paths keeps dependency/CVE surface | 🟠 High | Yes | Open a follow-up issue and link its # in blog/roadmap; port pkg/merge/merge_yaml_functions.go and pkg/devcontainer/config_loader.go to native override semantics; drop mergo from go.mod. |
| 4 | validate_stacks_test fixture walk can silently degrade if path drift yields zero YAMLs | 🟠 High | No | After WalkDir, require.Positive(t, fixtureFileCount, "stacks fixtures missing — check testCasesPath") and derive path via runtime.Caller(0) to be CWD-independent. |
| 5 | Patch coverage shortfall on small but important branches (merge.go, merge_native.go, terraform_execute_helpers_exec.go) | 🟠 High | No | Add micro-tests: (a) MergeWithOptions_EmptyInputs + flag-wiring; (b) DeepMergeNative typed-slice replace path; (c) ExecuteMainTerraformCommand error propagation. |
| 6 | Negative recovery path test does not assert that Warn is NOT emitted when .terraform/environment mismatches | 🟡 Medium | No | Capture/hook logger in TestRunWorkspaceSetup_NoRecoveryOnMismatchedEnv; assert no Warn; still expect non‑nil error. |
| 7 | “Cross‑validate vs mergo” is opt‑in and not advertised (parity drift risk) | 🟡 Medium | Yes | Add a header note and CLAUDE.md snippet: go test -tags compare_mergo ./pkg/merge -run CrossValidate -v; cite mergo version. |
| 8 | Production‑scale benchmark may be unrepresentative of large stacks | 🟡 Medium | Yes | Ensure BenchmarkMerge_ProductionScale uses ≥10 inputs, ≥20 top-level keys, nested list‑of‑map‑of‑list; publish those numbers in the blog alongside micro-benchmarks. |
| 9 | Networked GHCR precondition probe can flake and may expose headers in logs | 🟡 Medium | No | Use short-timeout http.Client, redact Authorization in any logs, and skip on dial/timeouts; add table-driven tests for 401/403/unreachable via httptest.Server. |
| 10 | Error surface changed (“mergo merge failed:” → “ErrMerge: ”) may break string‑matching consumers | 🟢 Low | No | Note in docs/fixes; prefer errors.Is/As in tests; add a release-note callout. |
| 11 | TF_DATA_DIR absolute‑path semantics under‑documented | 🟢 Low | Yes | Amend isTerraformCurrentWorkspace comment to state Terraform parity: relative resolved from cwd; absolute used directly. |
| 12 | Tests reference schema.Provider.Kind; schema drift could silently break intent | 🟢 Low | Partial | Add compile-guard in the same _test.go: var _ = schema.Provider{Kind: "azure"} to fail fast on rename. |
Scores Table
| Score | Number | Grade | Delta vs previous |
|---|---|---|---|
| Merge Safety | 96/100 | A | 0 |
| Test Coverage | 78/100 | B- | +1 |
| Code Quality | 93/100 | A- | 0 |
| Security Posture | 92/100 | A- | 0 |
Top 3 Copilot Action Items

1. Add guardrail tests: (1) ExecuteTerraform_single_invocation (spy/counter; assert exactly one ExecuteShellCommand call), (2) ExecuteShellCommand_exit_wrapping_contract (assert errors.As(err, errUtils.ExitCodeError) under _ATMOS_TEST_EXIT_ONE=1).
2. Close patch coverage gaps with three micro-tests: MergeWithOptions_EmptyInputs_and_FlagWiring; DeepMergeNative_TypedSrcSliceReplace_AnyDst; ExecuteMainTerraformCommand_Error_Propagates.
3. Open and link the mergo‑migration follow‑up issue; update blog and roadmap entries with the concrete GitHub issue number.
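The single-invocation guardrail from the first action item could be built on a small dependency-injection seam, roughly as sketched here. Everything in this block is hypothetical: `shellRunner`, `executeTerraform`, and `countingRunner` are stand-ins and do not reflect the actual signatures of ExecuteTerraform or ExecuteShellCommand.

```go
package main

import "fmt"

// shellRunner is a hypothetical seam through which the command is executed.
type shellRunner func(command string, args []string) error

// executeTerraform is a stand-in for the pipeline entry point; it is
// expected to invoke the runner exactly once per call.
func executeTerraform(run shellRunner, subcommand string) error {
	return run("terraform", []string{subcommand})
}

// countingRunner is a spy that records how many times run was invoked
// instead of starting a real subprocess.
type countingRunner struct {
	calls int
}

func (c *countingRunner) run(command string, args []string) error {
	c.calls++
	return nil
}

func main() {
	spy := &countingRunner{}
	if err := executeTerraform(spy.run, "plan"); err != nil {
		panic(err)
	}
	fmt.Println("invocations:", spy.calls)
}
```

A test would assert that `spy.calls` equals 1 after a single call, so an accidental second execution of the pipeline fails immediately.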

Suggested fix prompt

You are Copilot working on cloudposse/atmos PR `#2201`. Apply these changes in priority order:

1) Guardrail tests (durability)
- Add internal/exec/terraform_execute_single_invocation_test.go:
  * Introduce a test-only spy/counter around ExecuteShellCommand (tiny DI seam or build tag).
  * Call ExecuteTerraform() once and assert the spy counter == 1.
- Add internal/exec/terraform_execute_exit_wrapping_test.go:
  * Invoke ExecuteShellCommand(os.Executable(), env ["_ATMOS_TEST_EXIT_ONE=1"]).
  * Assert errors.As(err, errUtils.ExitCodeError).

2) Patch coverage micro-tests
- pkg/merge/merge_test.go:
  * TestMergeWithOptions_EmptyInputs_ReturnsEmptyMap.
  * TestMergeWithOptions_StrategyFlags_WireThrough (append vs sliceDeepCopy).
- pkg/merge/merge_native_test.go:
  * TestDeepMergeNative_TypedSrcSliceReplace_AnyDst (typed []string replaces []any in replace mode).
- internal/exec/terraform_execute_helpers_exec_test.go:
  * TestExecuteMainTerraformCommand_Error_Propagates (exercise untested error path).

3) validate_stacks_test hardening
- Build absolute fixture path via runtime.Caller(0).
- require.Positive(t, fixtureFileCount, "stacks fixtures missing — check testCasesPath").
- Add a self-bound for the block counter:
  if strings.Count(errStr, "File being processed:") > fileCount+1 { t.Errorf(...) }.

4) Workspace recovery negative-path logging
- In TestRunWorkspaceSetup_NoRecoveryOnMismatchedEnv, hook/capture logger; assert no Warn; still expect non-nil error.

5) Advertise parity checks
- In pkg/merge/merge_compare_mergo_test.go header and CLAUDE.md, add:
  go test -tags compare_mergo ./pkg/merge -run CrossValidate -v
  and record the mergo version used for parity.

6) Production-scale benchmarks and blog
- Expand BenchmarkMerge_ProductionScale to ≥10 inputs × ≥20 keys with nested list-of-map-of-list.
- Update website/blog/2026-03-15-faster-deep-merge.mdx to include those numbers alongside micro-benchmarks.

7) Mergo follow-up linkage
- Open a GH issue “Migrate remaining mergo call-sites (merge_yaml_functions.go, devcontainer/config_loader.go) to native; drop mergo.”
- Paste the # into the blog post and website/src/data/roadmap.js.

8) Minor polish
- isTerraformCurrentWorkspace comment: explicitly state TF_DATA_DIR parity with Terraform.
- Add a compile-guard near tests referencing schema.Provider.Kind:
  var _ = schema.Provider{Kind: "azure"} // compile-guard.
- Docs/fixes summary: include internal/exec/testmain_test.go and both recovery tests.

Run golangci-lint run ./... and make test; ensure patch coverage ≈100% on changed files.

✏️ Learnings added

Learnt from: nitrocode
URL: https://github.com/cloudposse/atmos/pull/2201

Timestamp: 2026-03-23T13:33:02.311Z
Learning: When a PR depends on ExecuteShellCommand returning exit codes in a specific wrapper type (errUtils.ExitCodeError), always add a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors can’t silently break recovery paths.

🧠 Learnings used

Learnt from: osterman
Repo: cloudposse/atmos PR: 1686
File: errors/errors.go:184-203
Timestamp: 2025-12-13T06:10:13.688Z
Learning: cloudposse/atmos: For toolchain work, duplicate/unused error sentinels in errors/errors.go should be cleaned up in a separate refactor PR and not block feature PRs; canonical toolchain sentinels live under toolchain/registry with re-exports in toolchain/errors.go.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1261
File: internal/exec/utils.go:639-640
Timestamp: 2025-05-22T15:42:10.906Z
Learning: In the Atmos codebase, when appending slices with `args := append(configAndStacksInfo.CliArgs, configAndStacksInfo.AdditionalArgsAndFlags...)`, it's intentional that the result is not stored back in the original slice. This pattern is used when the merged result serves a different purpose than the original slices, such as when creating a filtered version for component section assignments.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 2215
File: pkg/config/config.go:250-252
Timestamp: 2026-03-18T14:35:09.584Z
Learning: In cloudposse/atmos `pkg/config/config.go`, bare relative paths (e.g., "stacks", "foo/bar") are source-independent per PRD FR4: they always go through git root search regardless of whether the source is config-file, env var (ATMOS_BASE_PATH), or CLI flag (--base-path). Git root takes priority when both <git-root>/<path> and <cwd>/<path> exist. This is intentional — the same bare value should resolve to the same location regardless of where it was specified. The `os.Stat` fallback in `tryResolveWithGitRoot` handles cases where the git-root path doesn't exist. Users who want explicit CWD-relative behavior must use a dot-prefix (e.g., `./stacks` instead of `stacks`). Do not flag this as a bug in code reviews.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 759
File: internal/exec/terraform.go:366-368
Timestamp: 2024-11-02T15:35:09.958Z
Learning: In `internal/exec/terraform.go`, the workspace cleaning code under both the general execution path and within the `case "init":` block is intentionally duplicated because the code execution paths are different. The `.terraform/environment` file should be deleted before executing `terraform init` in both scenarios to ensure a clean state.

Learnt from: osterman
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:405-412
Timestamp: 2024-10-30T13:25:45.965Z
Learning: In `internal/exec/terraform_clean.go`, when appending `stackFolders` to `folders` in the `handleCleanSubCommand` function, it's unnecessary to check if `stackFolders` is nil before appending, because in Go, appending a nil slice is safe and does not cause a panic.
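A short illustration of the Go behaviour this learning relies on: spreading a nil slice into `append` adds nothing and does not panic. The function name `collectFolders` is hypothetical and only mirrors the shape of the call described above.

```go
package main

import "fmt"

// collectFolders appends stackFolders to folders without a nil check;
// a nil stackFolders contributes zero elements.
func collectFolders(folders, stackFolders []string) []string {
	return append(folders, stackFolders...)
}

func main() {
	var stackFolders []string // nil slice
	folders := collectFolders([]string{"components/terraform/vpc"}, stackFolders)
	fmt.Println(len(folders))
}
```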

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T04:19:23.596Z
Learning: cloudposse/atmos: pkg/filesystem exposes test-only ResetGlobMatchesCache() and ResetPathMatchCache() helpers; use these (with t.Cleanup) instead of direct global var assignment to avoid data races and inter-test coupling.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T13:00:09.560Z
Learning: cloudposse/atmos: GetGlobMatches has intentionally divergent semantics across packages — pkg/filesystem.GetGlobMatches always returns a non-nil slice (empty on no matches), while pkg/utils.GetGlobMatches treats no matches as an error or nil depending on doublestar.Glob result. Tests and docs should call this out to avoid confusion in reviews.

Learnt from: osterman
Repo: cloudposse/atmos PR: 740
File: cmd/cmd_utils.go:340-359
Timestamp: 2024-10-23T21:36:40.262Z
Learning: In the Go codebase for Atmos, when reviewing functions like `checkAtmosConfig` in `cmd/cmd_utils.go`, avoid suggesting refactoring to return errors instead of calling `os.Exit` if such changes would significantly increase the scope due to the need to update multiple call sites.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:407-416
Timestamp: 2024-11-24T19:13:10.287Z
Learning: In `internal/exec/terraform_clean.go`, when `getStackTerraformStateFolder` returns an error in the `handleCleanSubCommand` function, the error is logged, and the process continues without returning the error.

Learnt from: Cerebrovinny
Repo: cloudposse/atmos PR: 1560
File: pkg/utils/string_utils.go:43-64
Timestamp: 2025-09-30T19:03:50.738Z
Learning: In the Atmos codebase, YAML tags like !terraform.output rely on positional arguments, so the SplitStringByDelimiter function in pkg/utils/string_utils.go must preserve empty strings (even after trimming quotes) to maintain the correct number of positional arguments. Filtering out empty values after trimming would collapse the array and break these function calls.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 775
File: internal/exec/template_funcs_component.go:157-159
Timestamp: 2024-11-12T03:16:02.910Z
Learning: In the Go code for `componentFunc` in `internal/exec/template_funcs_component.go`, the function `cleanTerraformWorkspace` does not return errors, and it's acceptable if the file does not exist. Therefore, error handling for `cleanTerraformWorkspace` is not needed.

Learnt from: osterman
Repo: cloudposse/atmos PR: 795
File: internal/exec/stack_processor_utils.go:378-386
Timestamp: 2024-11-19T23:00:45.899Z
Learning: In the `ProcessYAMLConfigFile` function within `internal/exec/stack_processor_utils.go`, directory traversal in stack imports is acceptable and should not be restricted.

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T22:00:12.699Z
Learning: Applies to **/*_test.go : Test behavior, not implementation; never test stub functions (implement or remove); avoid tautological tests; make code testable via DI; no coverage theater; remove always-skipped tests; use `errors.Is()` for error checking

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-19T04:02:15.601Z
Learning: In cloudposse/atmos, pkg/config/homedir now exposes unexported DI hooks for testing (currentUserFunc, darwinHomeDirFunc, shellHomeDirCmd). Tests that mutate these must not use t.Parallel() and should always restore originals via defer to avoid cross-test interference.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1352
File: pkg/store/artifactory_store_test.go:108-113
Timestamp: 2025-08-15T14:43:41.030Z
Learning: In test files for the atmos project, it's acceptable to ignore errors from os.Setenv/Unsetenv operations during test environment setup and teardown, as these are controlled test scenarios.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T04:09:45.144Z
Learning: When adding a recovery path that depends on ExecuteShellCommand exit-code semantics, always include a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors cannot break the wrapping contract unnoticed.

Learnt from: osterman
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:329-332
Timestamp: 2024-10-28T01:51:30.811Z
Learning: In the Atmos Go code, when deleting directories or handling file paths (e.g., in `terraform_clean.go`), always resolve the absolute path using `filepath.Abs` and use the logger `u.LogWarning` for logging messages instead of using `fmt.Printf`.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:314-319
Timestamp: 2024-10-27T04:54:32.397Z
Learning: When deleting empty folders in the `deleteFolders` function, handling errors from `os.Remove` are not required, as failures do not affect the process.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:215-223
Timestamp: 2024-10-27T04:41:49.199Z
Learning: In `internal/exec/terraform_clean.go`, the function `determineCleanPath` is necessary and should not be removed.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-18T23:56:50.624Z
Learning: In cloudposse/atmos, when auth.TerraformPreHook fails in internal/exec/terraform_execute_helpers_exec.go (prepareComponentExecution function), execution is TERMINATED by returning the error. This is the correct behavior — the code comment reads "Pre-hook failures terminate execution — this matches the original terraform.go behavior. Authentication setup failures must not silently produce unauthenticated terraform commands." The previously stored learning (ID 11a3bc42) that said "execution continues" was stale and incorrect. The original terraform.go always returned on TerraformPreHook error.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T04:53:54.105Z
Learning: When nitrocode requests an audit on cloudposse/atmos PRs, respond with: (1) a 3–5 sentence Executive Summary, (2) a Findings table with columns “Number # | Issue | Severity | Pre-existing? | Suggested Fix” using 🔴/🟠/🟡/🟢, (3) a Scores table for Merge Safety, Test Coverage, Code Quality, Security Posture with numeric score, letter grade, and delta vs last audit, (4) Top 3 Copilot Action Items, and (5) a copyable “Suggested fix prompt” block.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T05:23:10.807Z
Learning: When auditing cloudposse/atmos PRs on request from nitrocode, the expected output format is: (1) 3–5 sentence Executive Summary, (2) Findings Table with columns “Number # | Issue | Severity | Pre-existing? | Suggested Fix” using 🔴/🟠/🟡/🟢, (3) Scores Table for Merge Safety, Test Coverage, Code Quality, Security Posture with numeric score, letter grade, and delta vs last audit, (4) Top 3 Copilot Action Items, and (5) a copyable “Suggested fix prompt” block.

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: .cursor/rules/atmos-rules.mdc:0-0
Timestamp: 2025-11-24T17:35:37.209Z
Learning: Ensure all tests pass, verify code coverage meets targets, run golangci-lint and fix any issues, and update documentation before submitting pull requests

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: .cursor/rules/atmos-rules.mdc:0-0
Timestamp: 2025-11-24T17:35:37.209Z
Learning: Applies to **/*_test.go : Every new feature must include comprehensive unit tests targeting >80% code coverage for all packages

Learnt from: Listener430
Repo: cloudposse/atmos PR: 1149
File: tests/snapshots/TestCLICommands_atmos_vendor_pull_ssh.stderr.golden:7-7
Timestamp: 2025-03-18T12:26:25.329Z
Learning: In the Atmos project, typos or inconsistencies in test snapshot files (such as "terrafrom" instead of "terraform") may be intentional as they capture the exact output of commands and should not be flagged as issues requiring correction.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T20:29:20.586Z
Learning: Atmos test pattern: prefer os.Executable()+TestMain with an env gate (_ATMOS_TEST_EXIT_ONE) over platform-specific binaries; always add a prerequisite test to prove ExecuteShellCommand propagates env to subprocesses before relying on it in recovery-path tests.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T18:02:49.032Z
Learning: In cloudposse/atmos lint stacks, current behavior when --stack has no matching raw manifest is to fall back to repo-scope; this can produce orphan noise. Prefer failing closed or requiring explicit opt-in to repo-scope for L-07 when no seed is found.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1185
File: internal/exec/yaml_func_store.go:26-26
Timestamp: 2025-04-04T02:03:23.676Z
Learning: The Atmos codebase currently uses `log.Fatal` for error handling in multiple places. The maintainers are aware this isn't an ideal pattern (should only be used in main() or init() functions) and plan to address it comprehensively in a separate PR. CodeRabbit should not flag these issues or push for immediate changes until that refactoring is complete.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T15:27:52.572Z
Learning: Atmos test pattern: Use os.Executable()+TestMain with an env flag (_ATMOS_TEST_EXIT_ONE) to create a cross‑platform “exit 1” subprocess in tests; avoid Unix-only binaries (false/true/sh). Also add a prerequisite test to prove env propagation to ExecuteShellCommand before relying on it in recovery tests.

Learnt from: osterman
Repo: cloudposse/atmos PR: 808
File: pkg/config/config.go:478-483
Timestamp: 2024-12-02T21:26:32.337Z
Learning: In the 'atmos' project, when reviewing Go code like `pkg/config/config.go`, avoid suggesting file size checks after downloading remote configs if such checks aren't implemented elsewhere in the codebase.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T05:38:47.048Z
Learning: In cloudposse/atmos PR `#2175`, SeedArchivedRepoCache, ResetArchivedRepoCache, newGitHubClientHook, and newGitHubClientHookMu are intentionally kept in the production package pkg/github/repo.go (not export_test.go) because they manipulate unexported package-level state that must be accessible from tests in multiple packages (e.g., internal/exec). Go's export_test.go mechanism is limited to the package under test and is invisible when another package imports the production package during its own tests. These are documented with "NOTE: test utility — cannot move to export_test.go" comments. Do not flag these as production API pollution in future reviews for this repository.

Learnt from: Benbentwo
Repo: cloudposse/atmos PR: 1475
File: internal/exec/terraform.go:269-272
Timestamp: 2025-10-03T18:02:08.535Z
Learning: In internal/exec/terraform.go, when auth.TerraformPreHook fails, the error is logged but execution continues. This is a deliberate design choice to allow Terraform commands to proceed even if authentication setup fails, rather than failing fast.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1813
File: cmd/terraform/shell.go:28-73
Timestamp: 2025-12-13T03:21:35.786Z
Learning: In Atmos, when calling cfg.InitCliConfig, you must first populate the schema.ConfigAndStacksInfo struct with global flag values using flags.ParseGlobalFlags(cmd, v) rather than passing an empty struct. The LoadConfig function (pkg/config/load.go) reads config selection fields (AtmosConfigFilesFromArg, AtmosConfigDirsFromArg, BasePath, ProfilesFromArg) directly from the ConfigAndStacksInfo struct, NOT from Viper. Passing an empty struct causes config selection flags (--base-path, --config, --config-path, --profile) to be silently ignored. Correct pattern: parse flags → populate struct → call InitCliConfig. See cmd/terraform/plan_diff.go for reference implementation.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1697
File: internal/exec/oci_utils.go:0-0
Timestamp: 2025-11-08T19:56:18.660Z
Learning: In the Atmos codebase, when a function receives an `*schema.AtmosConfiguration` parameter, it should read configuration values from `atmosConfig.Settings` fields rather than using direct `os.Getenv()` or `viper.GetString()` calls. The Atmos pattern is: viper.BindEnv in cmd/root.go binds environment variables → Viper unmarshals into atmosConfig.Settings via mapstructure → business logic reads from the Settings struct. This provides centralized config management, respects precedence, and enables testability. Example: `atmosConfig.Settings.AtmosGithubToken` instead of `os.Getenv("ATMOS_GITHUB_TOKEN")` in functions like `getGHCRAuth` in internal/exec/oci_utils.go.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1599
File: internal/exec/terraform.go:394-402
Timestamp: 2025-10-10T23:51:36.597Z
Learning: In Atmos (internal/exec/terraform.go), when adding OpenTofu-specific flags like `--var-file` for `init`, do not gate them based on command name (e.g., checking if `info.Command == "tofu"` or `info.Command == "opentofu"`) because command names don't reliably indicate the actual binary being executed (symlinks, aliases). Instead, document the OpenTofu requirement in code comments and documentation, trusting users who enable the feature (e.g., `PassVars`) to ensure their terraform command points to an OpenTofu binary.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1686
File: docs/prd/tool-dependencies-integration.md:58-64
Timestamp: 2025-12-13T06:07:37.766Z
Learning: cloudposse/atmos: For PRD docs (docs/prd/*.md), markdownlint issues like MD040/MD010/MD034 can be handled in a separate documentation cleanup commit and should not block the current PR.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 944
File: go.mod:206-206
Timestamp: 2025-01-17T00:18:57.769Z
Learning: For indirect dependencies with license compliance issues in the cloudposse/atmos repository, the team prefers to handle them in follow-up PRs rather than blocking the current changes, as these issues often require deeper investigation of the dependency tree.

Learnt from: Listener430
Repo: cloudposse/atmos PR: 934
File: tests/fixtures/scenarios/docs-generate/README.md.gotmpl:99-118
Timestamp: 2025-01-25T03:51:57.689Z
Learning: For the cloudposse/atmos repository, changes to template contents should be handled in dedicated PRs and are typically considered out of scope for PRs focused on other objectives.

Learnt from: osterman
Repo: cloudposse/atmos PR: 887
File: internal/exec/workflow_utils.go:167-169
Timestamp: 2024-12-25T20:28:19.618Z
Learning: The user plans to revert the change from `path.Join` to `filepath.Join` in this PR due to testing gaps and will open a new PR to safely handle the migration without breaking `main`.

coderabbitai · 2026-03-23T13:33:43Z

@nitrocode

Audit round number: 13

Executive Summary

This PR delivers a faster, reflection‑free deep merge and sturdier Terraform workspace setup; the previously reported critical regressions have been addressed. The largest remaining risks concern durability: no test guarantees that ExecuteTerraform invokes the pipeline exactly once, and no unit test pins the exit‑code wrapping contract that the recovery path relies on. A few narrow branches remain under‑tested, and the remaining non‑hot‑path mergo usage has no linked follow‑up issue for full removal. Security posture is unchanged; networked test preconditions and documentation clarity need minor polish. Recommendation: land the top three action items below (two guardrail tests plus the linked mergo follow‑up) before merging.

Findings Table

| # | Issue | Severity | Pre-existing? | Suggested Fix |
|---|---|---|---|---|
| 1 | No explicit test proving ExecuteTerraform invokes the pipeline exactly once (future double‑exec guardrail) | 🔴 Critical | No | Add a spy/counter around ExecuteShellCommand in a test build-tag or DI seam and assert exactly one invocation per ExecuteTerraform() call. Impacts Merge Safety. |
| 2 | Exit‑code wrapping contract not pinned (recovery assumes errors.As(err, ExitCodeError)) | 🔴 Critical | No | Add a sub-test invoking os.Executable() with _ATMOS_TEST_EXIT_ONE=1 and assert errors.As(err, errUtils.ExitCodeError). Locks recovery semantics. Impacts Merge Safety. |
| 3 | Remaining runtime mergo usage in non‑hot paths keeps dependency/CVE surface | 🟠 High | Yes | Open and link a follow‑up issue; port pkg/merge/merge_yaml_functions.go and pkg/devcontainer/config_loader.go to the native override merge; drop mergo from go.mod. Impacts Security Posture. |
| 4 | validate_stacks_test fixture walk can silently degrade if path drift yields zero YAMLs | 🟠 High | No | After WalkDir, require.Positive(t, fixtureFileCount, "fixtures missing") and derive fixture path via runtime.Caller(0) (CWD‑independent). Impacts Test Coverage. |
| 5 | Patch coverage shortfall on small but important branches (merge.go, merge_native.go, terraform_execute_helpers_exec.go) | 🟠 High | No | Add micro-tests: (a) MergeWithOptions_EmptyInputs and flag‑wiring; (b) DeepMergeNative typed‑slice replace path; (c) ExecuteMainTerraformCommand error path. Impacts Test Coverage. |
| 6 | Negative recovery path: test does not assert that Warn is NOT emitted when .terraform/environment mismatches | 🟡 Medium | No | Capture/hook logger in TestRunWorkspaceSetup_NoRecoveryOnMismatchedEnv; assert no Warn; still expect non‑nil error. Impacts Code Quality (behavioral clarity). |
| 7 | “Cross‑validate vs mergo” tests are opt‑in and not advertised (parity drift risk) | 🟡 Medium | Yes | Add a header note and CLAUDE.md snippet with exact command: go test -tags compare_mergo ./pkg/merge -run CrossValidate -v; cite mergo version. Impacts Code Quality. |
| 8 | Production‑scale benchmark may be unrepresentative of large stacks | 🟡 Medium | Yes | Ensure BenchmarkMerge_ProductionScale uses ≥10 inputs, ≥20 top‑level keys, nested list‑of‑map‑of‑list; publish those numbers in the blog alongside micro‑benchmarks. Impacts Merge Safety perception. |
| 9 | Networked GHCR precondition probe can flake and may expose headers in logs | 🟡 Medium | No | Use a short‑timeout http.Client, redact Authorization in any logs, and skip on dial/timeouts; add table‑driven tests (401/403/unreachable) via httptest.Server. Impacts Security Posture/Test Coverage. |
| 10 | Error surface changed (“mergo merge failed:” → “ErrMerge: ”) may break downstream string‑matching | 🟢 Low | No | Document in docs/fixes; prefer errors.Is/As in tests; add a release‑note callout. Impacts Code Quality. |
| 11 | TF_DATA_DIR absolute‑path semantics under‑documented | 🟢 Low | Yes | Amend isTerraformCurrentWorkspace comment: “Relative TF_DATA_DIR is resolved from cwd like Terraform; absolute paths are used directly.” Impacts Code Quality. |
| 12 | Tests reference schema.Provider.Kind; schema drift could silently break intent | 🟢 Low | Partial | Add compile‑guard adjacent to usage: var _ = schema.Provider{Kind: "azure"} to fail fast on rename. Impacts Test Coverage. |
| 13 | “File being processed:” used only as a counter; self‑duplication not explicitly bounded | 🟢 Low | No | Add a self‑bound: if strings.Count(errStr, "File being processed:") > fileCount+1 then t.Errorf(...). Impacts Test Coverage. |
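
As a rough illustration of finding 9, the table-driven probe tests could be built on `httptest.Server` along the following lines. This is a sketch under assumptions: `shouldSkip`, `probeWithStatus`, and `probeUnreachable` are hypothetical names, the 2-second timeout is arbitrary, and treating 401/403 as a skip condition is a guess, since the finding only names those cases as ones to cover. Nothing is logged, so no Authorization header can leak.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// shouldSkip is a hypothetical precondition probe. It reports true when the
// endpoint is unreachable (dial error or timeout) or, by assumption, when the
// endpoint rejects the request with 401 or 403.
func shouldSkip(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return true // dial failure or timeout: skip instead of flaking
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusUnauthorized ||
		resp.StatusCode == http.StatusForbidden
}

// probeWithStatus runs the probe against a local server that always answers
// with the given status code.
func probeWithStatus(status int) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return shouldSkip(client, srv.URL)
}

// probeUnreachable runs the probe against a server that has already been
// closed, so the connection attempt fails.
func probeUnreachable() bool {
	srv := httptest.NewServer(http.NotFoundHandler())
	url := srv.URL
	srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return shouldSkip(client, url)
}

func main() {
	for _, status := range []int{http.StatusOK, http.StatusUnauthorized, http.StatusForbidden} {
		fmt.Println(status, "skip:", probeWithStatus(status))
	}
	fmt.Println("unreachable skip:", probeUnreachable())
}
```

In a real test the boolean would feed `t.Skip`, and the cases would sit in a table of `{name, status, wantSkip}` entries.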

Scores Table

| Category | Score | Grade | Delta vs previous audit |
|---|---|---|---|
| Merge Safety | 96/100 | A | 0 |
| Test Coverage | 78/100 | B- | +1 |
| Code Quality | 93/100 | A- | 0 |
| Security Posture | 92/100 | A- | 0 |

Top 3 Copilot Action Items

1. Add two guardrail tests: (1) ExecuteTerraform_single_invocation (spy/counter; assert exactly one ExecuteShellCommand call), (2) ExecuteShellCommand_exit_wrapping_contract (assert errors.As(err, errUtils.ExitCodeError) with _ATMOS_TEST_EXIT_ONE=1).
2. Close patch‑coverage gaps with three micro-tests: MergeWithOptions_EmptyInputs_and_FlagWiring; DeepMergeNative_TypedSrcSliceReplace_AnyDst; ExecuteMainTerraformCommand_Error_Propagates.
3. Open and link the mergo‑migration follow‑up issue; update the blog and roadmap entries with the concrete GitHub issue number.
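
The exit-code wrapping contract in the first action item can be illustrated with a small sketch. Note that `errors.As` needs a pointer to a target variable, so the shorthand `errors.As(err, errUtils.ExitCodeError)` would be spelled with `&target` in real code. The `ExitCodeError` type below is a hypothetical stand-in for `errUtils.ExitCodeError` (its actual fields may differ), and `runFailing` only simulates a wrapped subprocess failure.

```go
package main

import (
	"errors"
	"fmt"
)

// ExitCodeError is a hypothetical stand-in for errUtils.ExitCodeError.
type ExitCodeError struct {
	Code int
}

func (e ExitCodeError) Error() string {
	return fmt.Sprintf("subcommand exited with code %d", e.Code)
}

// runFailing simulates a command runner that wraps a non-zero exit with %w,
// which keeps the typed error reachable through the error chain.
func runFailing(code int) error {
	return fmt.Errorf("workspace select: %w", ExitCodeError{Code: code})
}

// exitCodeOf extracts the exit code from anywhere in the error chain.
// It returns ok=false when the chain contains no ExitCodeError.
func exitCodeOf(err error) (code int, ok bool) {
	var exitErr ExitCodeError
	if errors.As(err, &exitErr) {
		return exitErr.Code, true
	}
	return 0, false
}

func main() {
	code, ok := exitCodeOf(runFailing(1))
	fmt.Println("wrapped:", code, ok)

	_, ok = exitCodeOf(errors.New("plain failure"))
	fmt.Println("plain:", ok)
}
```

A contract test would assert the `ok == true` branch for a real subprocess failure, so a refactor that replaces `%w` with `%v` fails the test instead of silently disabling recovery.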

Suggested fix prompt

You are Copilot working on cloudposse/atmos PR `#2201`. Apply fixes in this order:

1) Guardrail tests
- Add internal/exec/terraform_execute_single_invocation_test.go:
  * Inject a test-only spy/counter around ExecuteShellCommand (small DI seam or build tag).
  * Call ExecuteTerraform() once; assert the counter == 1.
- Add internal/exec/terraform_execute_exit_wrapping_test.go:
  * Invoke ExecuteShellCommand(os.Executable(), env ["_ATMOS_TEST_EXIT_ONE=1"]).
  * Assert errors.As(err, errUtils.ExitCodeError).

2) Patch-coverage micro-tests
- pkg/merge/merge_test.go:
  * TestMergeWithOptions_EmptyInputs_ReturnsEmptyMap (zero-input fast-path).
  * TestMergeWithOptions_StrategyFlags_WireThrough (append vs sliceDeepCopy; assert effective behavior).
- pkg/merge/merge_native_test.go:
  * TestDeepMergeNative_TypedSrcSliceReplace_AnyDst (typed []string replaces existing []any in replace mode).
- internal/exec/terraform_execute_helpers_exec_test.go:
  * TestExecuteMainTerraformCommand_Error_Propagates (exercise untested return).

3) validate_stacks_test hardening
- Build absolute fixture path via runtime.Caller(0).
- require.Positive(t, fixtureFileCount, "stacks fixtures missing — check testCasesPath").
- Add self-bound: if strings.Count(errStr, "File being processed:") > fileCount+1 then t.Errorf(...).

4) Workspace recovery negative-path logging
- In TestRunWorkspaceSetup_NoRecoveryOnMismatchedEnv, hook/capture logger; assert no Warn; still expect non-nil error.

5) Advertise mergo parity checks
- In pkg/merge/merge_compare_mergo_test.go header and CLAUDE.md, add:
  go test -tags compare_mergo ./pkg/merge -run CrossValidate -v
  (record mergo version validated).

6) Production-scale benchmarks + blog
- Expand BenchmarkMerge_ProductionScale to ≥10 inputs × ≥20 keys with nested list-of-map-of-list.
- Update website/blog/2026-03-15-faster-deep-merge.mdx with those numbers next to micro-benchmarks.

7) Mergo follow-up linkage
- Open a GH issue “Migrate remaining mergo call-sites (merge_yaml_functions.go, devcontainer/config_loader.go) to native; drop mergo.”
- Paste its # into the blog and website/src/data/roadmap.js.

8) Minor polish
- isTerraformCurrentWorkspace comment: explicitly state TF_DATA_DIR parity with Terraform.
- Add compile-guard near tests referencing schema.Provider.Kind:
  var _ = schema.Provider{Kind: "azure"} // compile-guard.
- Docs/fixes summary: include internal/exec/testmain_test.go and both recovery tests.

Finally:
- Run golangci-lint run ./...
- make test (ensure platform gates honored).
- Re-run CI and confirm patch coverage ~100% on changed files.
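
One possible shape for the spy/counter seam in step 1 of the prompt is shown below, as a minimal sketch. All names are hypothetical: `commandRunner` stands in for the signature of ExecuteShellCommand, and `executeTerraform` is a drastically simplified stand-in for ExecuteTerraform that receives the runner as a parameter. The real seam might instead be a package-level variable or a build-tagged hook.

```go
package main

import "fmt"

// commandRunner is a hypothetical seam standing in for ExecuteShellCommand.
type commandRunner func(command string, args []string) error

// executeTerraform is a simplified stand-in for ExecuteTerraform. Taking the
// runner as a parameter lets a test substitute a spy for the real command.
func executeTerraform(run commandRunner, subcommand string) error {
	return run("terraform", []string{subcommand})
}

// countingRunner returns a spy runner and a pointer to its call counter.
func countingRunner() (commandRunner, *int) {
	calls := 0
	spy := func(command string, args []string) error {
		calls++ // record the invocation instead of spawning a process
		return nil
	}
	return spy, &calls
}

func main() {
	spy, calls := countingRunner()
	if err := executeTerraform(spy, "plan"); err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("invocations:", *calls)
}
```

The guardrail test then reduces to asserting that the counter equals 1 after a single call, which would catch an accidental double execution of the pipeline.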

🧠 Learnings used

Learnt from: osterman
Repo: cloudposse/atmos PR: 1686
File: errors/errors.go:184-203
Timestamp: 2025-12-13T06:10:13.688Z
Learning: cloudposse/atmos: For toolchain work, duplicate/unused error sentinels in errors/errors.go should be cleaned up in a separate refactor PR and not block feature PRs; canonical toolchain sentinels live under toolchain/registry with re-exports in toolchain/errors.go.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1261
File: internal/exec/utils.go:639-640
Timestamp: 2025-05-22T15:42:10.906Z
Learning: In the Atmos codebase, when appending slices with `args := append(configAndStacksInfo.CliArgs, configAndStacksInfo.AdditionalArgsAndFlags...)`, it's intentional that the result is not stored back in the original slice. This pattern is used when the merged result serves a different purpose than the original slices, such as when creating a filtered version for component section assignments.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 2215
File: pkg/config/config.go:250-252
Timestamp: 2026-03-18T14:35:09.584Z
Learning: In cloudposse/atmos `pkg/config/config.go`, bare relative paths (e.g., "stacks", "foo/bar") are source-independent per PRD FR4: they always go through git root search regardless of whether the source is config-file, env var (ATMOS_BASE_PATH), or CLI flag (--base-path). Git root takes priority when both <git-root>/<path> and <cwd>/<path> exist. This is intentional — the same bare value should resolve to the same location regardless of where it was specified. The `os.Stat` fallback in `tryResolveWithGitRoot` handles cases where the git-root path doesn't exist. Users who want explicit CWD-relative behavior must use a dot-prefix (e.g., `./stacks` instead of `stacks`). Do not flag this as a bug in code reviews.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 759
File: internal/exec/terraform.go:366-368
Timestamp: 2024-11-02T15:35:09.958Z
Learning: In `internal/exec/terraform.go`, the workspace cleaning code under both the general execution path and within the `case "init":` block is intentionally duplicated because the code execution paths are different. The `.terraform/environment` file should be deleted before executing `terraform init` in both scenarios to ensure a clean state.

Learnt from: osterman
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:405-412
Timestamp: 2024-10-30T13:25:45.965Z
Learning: In `internal/exec/terraform_clean.go`, when appending `stackFolders` to `folders` in the `handleCleanSubCommand` function, it's unnecessary to check if `stackFolders` is nil before appending, because in Go, appending a nil slice is safe and does not cause a panic.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T04:19:23.596Z
Learning: cloudposse/atmos: pkg/filesystem exposes test-only ResetGlobMatchesCache() and ResetPathMatchCache() helpers; use these (with t.Cleanup) instead of direct global var assignment to avoid data races and inter-test coupling.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T13:00:09.560Z
Learning: cloudposse/atmos: GetGlobMatches has intentionally divergent semantics across packages — pkg/filesystem.GetGlobMatches always returns a non-nil slice (empty on no matches), while pkg/utils.GetGlobMatches treats no matches as an error or nil depending on doublestar.Glob result. Tests and docs should call this out to avoid confusion in reviews.

Learnt from: osterman
Repo: cloudposse/atmos PR: 740
File: cmd/cmd_utils.go:340-359
Timestamp: 2024-10-23T21:36:40.262Z
Learning: In the Go codebase for Atmos, when reviewing functions like `checkAtmosConfig` in `cmd/cmd_utils.go`, avoid suggesting refactoring to return errors instead of calling `os.Exit` if such changes would significantly increase the scope due to the need to update multiple call sites.

Learnt from: Cerebrovinny
Repo: cloudposse/atmos PR: 1560
File: pkg/utils/string_utils.go:43-64
Timestamp: 2025-09-30T19:03:50.738Z
Learning: In the Atmos codebase, YAML tags like !terraform.output rely on positional arguments, so the SplitStringByDelimiter function in pkg/utils/string_utils.go must preserve empty strings (even after trimming quotes) to maintain the correct number of positional arguments. Filtering out empty values after trimming would collapse the array and break these function calls.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:407-416
Timestamp: 2024-11-24T19:13:10.287Z
Learning: In `internal/exec/terraform_clean.go`, when `getStackTerraformStateFolder` returns an error in the `handleCleanSubCommand` function, the error is logged, and the process continues without returning the error.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 775
File: internal/exec/template_funcs_component.go:157-159
Timestamp: 2024-11-12T03:16:02.910Z
Learning: In the Go code for `componentFunc` in `internal/exec/template_funcs_component.go`, the function `cleanTerraformWorkspace` does not return errors, and it's acceptable if the file does not exist. Therefore, error handling for `cleanTerraformWorkspace` is not needed.

Learnt from: osterman
Repo: cloudposse/atmos PR: 795
File: internal/exec/stack_processor_utils.go:378-386
Timestamp: 2024-11-19T23:00:45.899Z
Learning: In the `ProcessYAMLConfigFile` function within `internal/exec/stack_processor_utils.go`, directory traversal in stack imports is acceptable and should not be restricted.

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T22:00:12.699Z
Learning: Applies to **/*_test.go : Test behavior, not implementation; never test stub functions (implement or remove); avoid tautological tests; make code testable via DI; no coverage theater; remove always-skipped tests; use `errors.Is()` for error checking

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-19T04:02:15.601Z
Learning: In cloudposse/atmos, pkg/config/homedir now exposes unexported DI hooks for testing (currentUserFunc, darwinHomeDirFunc, shellHomeDirCmd). Tests that mutate these must not use t.Parallel() and should always restore originals via defer to avoid cross-test interference.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1352
File: pkg/store/artifactory_store_test.go:108-113
Timestamp: 2025-08-15T14:43:41.030Z
Learning: In test files for the atmos project, it's acceptable to ignore errors from os.Setenv/Unsetenv operations during test environment setup and teardown, as these are controlled test scenarios.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T04:09:45.144Z
Learning: When adding a recovery path that depends on ExecuteShellCommand exit-code semantics, always include a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors cannot break the wrapping contract unnoticed.

Thorough analysis of the core merge pipeline replacement:
- 0 critical bugs found
- 2 high-risk items (narrow cross-validation, mergo/native split)
- 2 medium items (misleading test name, no concurrent test)
- Detailed merge semantics table (maps, slices, nil, types)
- Known divergence documented (sliceDeepCopy truncation)
- Terraform workspace recovery analysis (separate concern in PR)
- Prioritized recommendations (must/should/after merge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ocess invocation Co-authored-by: nitrocode <7775707+nitrocode@users.noreply.github.com> Agent-Logs-Url: https://github.com/cloudposse/atmos/sessions/5bb3b12c-a169-44ec-99ea-215b6fed3401

Critical fix:
- mergeSlicesNative now extends result when src is longer than dst, matching mergo's WithSliceDeepCopy behavior.

Lint fixes (all pre-existing in PR #2201):
- nolint directives for deepMergeNative complexity
- nolint for TF_DATA_DIR, ReadFile, client.Do
- Fix godot, nolintlint, gocritic issues

Tests:
- Fix misleading test name, update 3 truncation tests
- Add 5 cross-validation tests for slice modes

Docs:
- Add mergo TODO comments, update fix doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rmance' into copilot/improve-deep-merge-performance

aknysh · 2026-03-23T19:04:32Z

@coderabbitai full review please, and check if all your comments and concerns have been correctly addressed

coderabbitai · 2026-03-23T19:10:30Z

🧩 Analysis chain

@aknysh Thanks for the nudge — here’s a concise full-review status based on the live repo.

Executive summary

Deep-merge: native path looks correct (flags threaded; safeCap present; shallow slice copy removed). Workspace recovery: wired into runWorkspaceSetup with ExitCodeError + Warn and TF_DATA_DIR semantics.
Tests: cross‑platform TestMain gate, env propagation prerequisite, and recovery positive/negative paths are present. compare_mergo opt‑in and a production‑scale benchmark exist.
Remaining: I still do not see (1) a guardrail test that guarantees ExecuteTerraform invokes the pipeline exactly once, and (2) an explicit exit‑wrapping contract test that asserts errors.As(err, errUtils.ExitCodeError) when the test binary exits 1. Also, mergo still exists in non‑hot paths; blog/roadmap should link the concrete follow‑up issue number if not already.

What’s addressed

Native merge fixed aliasing and correctly propagates appendSlice/sliceDeepCopy flags. safeCap bounds capacity hints (no OOM risk), and no copy(result,dst) remains.
Workspace recovery: runWorkspaceSetup uses errors.As(ExitCodeError) and checks isTerraformCurrentWorkspace; Warn is logged only on true positive. TF_DATA_DIR paths: relative resolved like Terraform; absolute honored.
Tests: TestMain with _ATMOS_TEST_EXIT_ONE present; TestExecuteShellCommand_PropagatesEnvToSubprocess, TestRunWorkspaceSetup_RecoveryPath and NoRecoveryOnMismatchedEnv are in-tree. compare_mergo tests are behind a build tag; BenchmarkMerge_ProductionScale exists. validate_stacks_test uses absolute pathing and require.Positive; per‑block bounds applied.
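
For readers who want a concrete picture of the aliasing and flag-threading points above, here is a minimal sketch of a reflection-free merge over map[string]any and []any. The names mergeInto, deepCopy, and maxCapHint are hypothetical, and this safeCap is only an assumption about how a capacity guard could look; none of it is the code in pkg/merge/merge_native.go. It also overwrites on type mismatch instead of returning an error, which is a simplification.

```go
package main

import "fmt"

// maxCapHint is an assumed upper bound for slice capacity hints.
const maxCapHint = 1 << 24

// safeCap returns a+b as a capacity hint, clamped so an overflowing or very
// large sum cannot request an oversized allocation. Hypothetical signature.
func safeCap(a, b int) int {
	sum := a + b
	if sum < 0 || sum > maxCapHint {
		return maxCapHint
	}
	return sum
}

// deepCopy clones maps and slices recursively so the result never aliases v.
func deepCopy(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, e := range t {
			out[k] = deepCopy(e)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, e := range t {
			out[i] = deepCopy(e)
		}
		return out
	default:
		return v // scalars are copied by value
	}
}

// mergeInto merges src into dst in place. Nested maps are recursed into
// without copying the container; leaves taken from src are deep-copied.
// Slices are appended when appendSlice is true, otherwise src replaces dst.
func mergeInto(dst, src map[string]any, appendSlice bool) {
	for k, sv := range src {
		dv, exists := dst[k]
		if exists {
			switch s := sv.(type) {
			case map[string]any:
				if d, ok := dv.(map[string]any); ok {
					mergeInto(d, s, appendSlice) // the flag is threaded through
					continue
				}
			case []any:
				if d, ok := dv.([]any); ok && appendSlice {
					out := make([]any, 0, safeCap(len(d), len(s)))
					out = append(out, d...)
					for _, e := range s {
						out = append(out, deepCopy(e))
					}
					dst[k] = out
					continue
				}
			}
		}
		dst[k] = deepCopy(sv) // new key, replace mode, or mismatched types
	}
}

func main() {
	base := map[string]any{"vars": map[string]any{"tags": []any{"a"}}}
	override := map[string]any{"vars": map[string]any{"tags": []any{"b"}, "stage": "dev"}}

	// Only the first input is copied to seed the accumulator.
	acc := deepCopy(base).(map[string]any)
	mergeInto(acc, override, true)
	fmt.Println(acc["vars"])
}
```

Because the accumulator is seeded from a copy and every stored leaf is cloned, mutating acc afterwards should not affect base or override.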

Still open (ranked)

Add ExecuteTerraform_single_invocation guardrail

Rationale: prevents the exact double‑execution regression from ever returning.
Minimal fix: add a test‑only spy/counter around ExecuteShellCommand (tiny DI seam or test build‑tag) and assert exactly one invocation per ExecuteTerraform.

Pin the exit‑wrapping contract

Add a contract test that runs os.Executable() with _ATMOS_TEST_EXIT_ONE=1 and asserts errors.As(err, errUtils.ExitCodeError). This makes recovery robust against refactors.

Link the mergo follow‑up issue number in both blog and roadmap (if not already)

If a migration issue exists, add # to website/blog/2026-03-15-faster-deep-merge.mdx and website/src/data/roadmap.js.

Optional hardening: negative‑path logging assertion

In NoRecoveryOnMismatchedEnv, capture the logger and assert no Warn.

If you’d like, I can open the mergo‑migration follow‑up and draft the two guardrail tests.

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Branch & changed files (gh) ==="
gh pr view 2201 --repo cloudposse/atmos --json headRefName,files | jq -r '.headRefName, (.files[].path)'

echo
echo "=== Residual mergo usage in runtime code (exclude *_test.go) ==="
rg -n --type go '(^|[^_])\bmergo\.' -g '!**/*_test.go' || true
rg -n --type go 'dario\.cat/mergo' -g '!**/*_test.go' || true

echo
echo "=== merge_native.go: flags threading, capacity guard, no shallow copy ==="
rg -n 'func deepMergeNative\(|func mergeSlicesNative\(|safeCap\(|copy\(result,\s*dst\)' pkg/merge/merge_native.go || true
rg -n 'deepMergeNative\(.+appendSlice,\s*sliceDeepCopy\)' pkg/merge/merge_native.go || true

echo
echo "=== terraform utils & exec helpers: workspace recovery wiring ==="
rg -n 'func isTerraformCurrentWorkspace|TF_DATA_DIR|errors\.Is\(err,\s*os\.ErrNotExist\)' internal/exec/terraform_utils.go || true
rg -n 'func runWorkspaceSetup|errors\.As\(err,.*ExitCodeError\)|log\.(Warn|Debug)|isTerraformCurrentWorkspace' internal/exec/terraform_execute_helpers_exec.go || true

echo
echo "=== Tests: TestMain env gate, recovery positive/negative, env propagation ==="
rg -n 'TestMain\(|_ATMOS_TEST_EXIT_ONE|TestExecuteShellCommand_PropagatesEnvToSubprocess|TestRunWorkspaceSetup_RecoveryPath|TestRunWorkspaceSetup_NoRecoveryOnMismatchedEnv' internal/exec || true

echo
echo "=== Guardrail tests: single-invocation & exit-wrapping contract ==="
rg -n 'single[_ -]?invocation|ExecuteTerraform_.*single|_ATMOS_TEST_COUNTER_FILE|exit[_ -]?wrapping|ExitCodeError.*contract' internal/exec || true

echo
echo "=== validate_stacks_test: fixture guard & bounds ==="
rg -n 'runtime\.Caller|require\.Positive|fileCount|maxOccurrences|File being processed:|Import chain:|\*\*Likely cause:\*\*|\*\*Debug hint:\*\*' internal/exec/validate_stacks_test.go || true

echo
echo "=== compare_mergo opt-in tests & production-scale benchmark ==="
rg -n '//go:build compare_mergo|TestCompareMergo' pkg/merge || true
rg -n 'BenchmarkMerge_ProductionScale' pkg/merge || true

echo
echo "=== Blog & roadmap: follow-up issue link for mergo removal ==="
rg -n 'follow[- ]up|#[0-9]{3,6}|mergo' website/blog/2026-03-15-faster-deep-merge.mdx website/src/data/roadmap.js || true

echo
echo "=== Docs/fixes: safeCap vs safeAdd wording ==="
rg -n 'safeCap|safeAdd|math\.MaxInt|1<<24' docs/fixes/2026-03-19-deep-merge-native-fixes.md || true

echo
echo "=== GHCR preconditions: timeout/redaction and tests presence ==="
rg -n 'RequireOCIAuthentication|ghcr\.io|http\.Client|Timeout|Authorization' tests/preconditions.go || true

Length of output: 17348

@aknysh Full review completed. Here’s the status based on your script output.

Addressed (all previous critical/high)

Deep-merge native path
- safeCap present; no shallow copy of dst; slice and map paths deep‑copy correctly.
- Flags (appendSlice, sliceDeepCopy) are threaded through deepMergeNative and mergeSlicesNative.
Workspace recovery and utils
- runWorkspaceSetup uses errors.As(err, ExitCodeError) and gates Warn on isTerraformCurrentWorkspace(componentPath, workspace).
- isTerraformCurrentWorkspace handles default via errors.Is(err, os.ErrNotExist) and documents TF_DATA_DIR parity; absolute TF_DATA_DIR supported.
Guardrail tests added
- Single‑invocation guard: internal/exec/terraform_execute_single_invocation_test.go with _ATMOS_TEST_COUNTER_FILE.
- Exit‑wrapping contract: internal/exec/terraform_execute_exit_wrapping_test.go asserts errors.As(err, errUtils.ExitCodeError).
- Env propagation prerequisite and recovery positive/negative tests are present.
validate_stacks_test hardening
- Source‑relative fixture path via runtime.Caller; require.Positive on counts; self‑bound for “File being processed:” vs fixture YAMLs; per‑block token bounds.
Parity and perf
- compare_mergo opt‑in tests with header notes; BenchmarkMerge_ProductionScale present.
Docs/roadmap
- Docs mention safeCap (1<<24). Blog and roadmap link follow‑up issue #2242 for remaining mergo call‑sites.

Open (non‑blocking)

Remaining mergo usage exists in non‑hot paths (pkg/merge/merge_yaml_functions.go, pkg/devcontainer/config_loader.go), but it’s explicitly tracked in #2242.
Optional polish: in the negative recovery test, consider asserting that no Warn was emitted (not required to merge).

Conclusion

All prior CodeRabbit concerns are addressed. I recommend proceeding once CI is green; no blockers from my side.

✏️ Learnings added

Learnt from: aknysh
URL: https://github.com/cloudposse/atmos/pull/2201

Timestamp: 2026-03-23T19:10:29.265Z
Learning: cloudposse/atmos: When adding a recovery path that depends on ExecuteShellCommand exit-code semantics, always include a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors cannot break the wrapping contract.

Learnt from: aknysh
URL: https://github.com/cloudposse/atmos/pull/2201

Timestamp: 2026-03-23T19:10:29.265Z
Learning: cloudposse/atmos: Add a guardrail test that asserts ExecuteTerraform invokes ExecuteShellCommand exactly once per call to prevent double-execution regressions.

Learnt from: aknysh
URL: https://github.com/cloudposse/atmos/pull/2201

Timestamp: 2026-03-23T19:10:29.265Z
Learning: Atmos tests: use _ATMOS_TEST_COUNTER_FILE in TestMain to count subprocess invocations and enforce single‑invocation guardrails in ExecuteTerraform tests.

Learnt from: aknysh
URL: https://github.com/cloudposse/atmos/pull/2201

Timestamp: 2026-03-23T19:10:29.265Z
Learning: Atmos: Remaining mergo usage outside hot path is acceptable short‑term when a concrete follow‑up exists; link the issue (e.g., `#2242`) in blog/roadmap and do not block the PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: osterman
Repo: cloudposse/atmos PR: 1686
File: errors/errors.go:184-203
Timestamp: 2025-12-13T06:10:13.688Z
Learning: cloudposse/atmos: For toolchain work, duplicate/unused error sentinels in errors/errors.go should be cleaned up in a separate refactor PR and not block feature PRs; canonical toolchain sentinels live under toolchain/registry with re-exports in toolchain/errors.go.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1261
File: internal/exec/utils.go:639-640
Timestamp: 2025-05-22T15:42:10.906Z
Learning: In the Atmos codebase, when appending slices with `args := append(configAndStacksInfo.CliArgs, configAndStacksInfo.AdditionalArgsAndFlags...)`, it's intentional that the result is not stored back in the original slice. This pattern is used when the merged result serves a different purpose than the original slices, such as when creating a filtered version for component section assignments.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 2215
File: pkg/config/config.go:250-252
Timestamp: 2026-03-18T14:35:09.584Z
Learning: In cloudposse/atmos `pkg/config/config.go`, bare relative paths (e.g., "stacks", "foo/bar") are source-independent per PRD FR4: they always go through git root search regardless of whether the source is config-file, env var (ATMOS_BASE_PATH), or CLI flag (--base-path). Git root takes priority when both <git-root>/<path> and <cwd>/<path> exist. This is intentional — the same bare value should resolve to the same location regardless of where it was specified. The `os.Stat` fallback in `tryResolveWithGitRoot` handles cases where the git-root path doesn't exist. Users who want explicit CWD-relative behavior must use a dot-prefix (e.g., `./stacks` instead of `stacks`). Do not flag this as a bug in code reviews.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 759
File: internal/exec/terraform.go:366-368
Timestamp: 2024-11-02T15:35:09.958Z
Learning: In `internal/exec/terraform.go`, the workspace cleaning code under both the general execution path and within the `case "init":` block is intentionally duplicated because the code execution paths are different. The `.terraform/environment` file should be deleted before executing `terraform init` in both scenarios to ensure a clean state.

Learnt from: osterman
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:405-412
Timestamp: 2024-10-30T13:25:45.965Z
Learning: In `internal/exec/terraform_clean.go`, when appending `stackFolders` to `folders` in the `handleCleanSubCommand` function, it's unnecessary to check if `stackFolders` is nil before appending, because in Go, appending a nil slice is safe and does not cause a panic.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T04:19:23.596Z
Learning: cloudposse/atmos: pkg/filesystem exposes test-only ResetGlobMatchesCache() and ResetPathMatchCache() helpers; use these (with t.Cleanup) instead of direct global var assignment to avoid data races and inter-test coupling.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T13:00:09.560Z
Learning: cloudposse/atmos: GetGlobMatches has intentionally divergent semantics across packages — pkg/filesystem.GetGlobMatches always returns a non-nil slice (empty on no matches), while pkg/utils.GetGlobMatches treats no matches as an error or nil depending on doublestar.Glob result. Tests and docs should call this out to avoid confusion in reviews.

Learnt from: osterman
Repo: cloudposse/atmos PR: 740
File: cmd/cmd_utils.go:340-359
Timestamp: 2024-10-23T21:36:40.262Z
Learning: In the Go codebase for Atmos, when reviewing functions like `checkAtmosConfig` in `cmd/cmd_utils.go`, avoid suggesting refactoring to return errors instead of calling `os.Exit` if such changes would significantly increase the scope due to the need to update multiple call sites.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:407-416
Timestamp: 2024-11-24T19:13:10.287Z
Learning: In `internal/exec/terraform_clean.go`, when `getStackTerraformStateFolder` returns an error in the `handleCleanSubCommand` function, the error is logged, and the process continues without returning the error.

Learnt from: Cerebrovinny
Repo: cloudposse/atmos PR: 1560
File: pkg/utils/string_utils.go:43-64
Timestamp: 2025-09-30T19:03:50.738Z
Learning: In the Atmos codebase, YAML tags like !terraform.output rely on positional arguments, so the SplitStringByDelimiter function in pkg/utils/string_utils.go must preserve empty strings (even after trimming quotes) to maintain the correct number of positional arguments. Filtering out empty values after trimming would collapse the array and break these function calls.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 775
File: internal/exec/template_funcs_component.go:157-159
Timestamp: 2024-11-12T03:16:02.910Z
Learning: In the Go code for `componentFunc` in `internal/exec/template_funcs_component.go`, the function `cleanTerraformWorkspace` does not return errors, and it's acceptable if the file does not exist. Therefore, error handling for `cleanTerraformWorkspace` is not needed.

Learnt from: osterman
Repo: cloudposse/atmos PR: 795
File: internal/exec/stack_processor_utils.go:378-386
Timestamp: 2024-11-19T23:00:45.899Z
Learning: In the `ProcessYAMLConfigFile` function within `internal/exec/stack_processor_utils.go`, directory traversal in stack imports is acceptable and should not be restricted.

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T22:00:12.699Z
Learning: Applies to **/*_test.go : Test behavior, not implementation; never test stub functions (implement or remove); avoid tautological tests; make code testable via DI; no coverage theater; remove always-skipped tests; use `errors.Is()` for error checking

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T13:33:02.311Z
Learning: When a PR depends on ExecuteShellCommand returning exit codes in a specific wrapper type (errUtils.ExitCodeError), always add a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors can’t silently break recovery paths.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-19T04:02:15.601Z
Learning: In cloudposse/atmos, pkg/config/homedir now exposes unexported DI hooks for testing (currentUserFunc, darwinHomeDirFunc, shellHomeDirCmd). Tests that mutate these must not use t.Parallel() and should always restore originals via defer to avoid cross-test interference.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1352
File: pkg/store/artifactory_store_test.go:108-113
Timestamp: 2025-08-15T14:43:41.030Z
Learning: In test files for the atmos project, it's acceptable to ignore errors from os.Setenv/Unsetenv operations during test environment setup and teardown, as these are controlled test scenarios.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T04:09:45.144Z
Learning: When adding a recovery path that depends on ExecuteShellCommand exit-code semantics, always include a unit test that asserts errors.As(err, errUtils.ExitCodeError) so future refactors cannot break the wrapping contract unnoticed.

Learnt from: osterman
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:329-332
Timestamp: 2024-10-28T01:51:30.811Z
Learning: In the Atmos Go code, when deleting directories or handling file paths (e.g., in `terraform_clean.go`), always resolve the absolute path using `filepath.Abs` and use the logger `u.LogWarning` for logging messages instead of using `fmt.Printf`.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:314-319
Timestamp: 2024-10-27T04:54:32.397Z
Learning: When deleting empty folders in the `deleteFolders` function, handling errors from `os.Remove` are not required, as failures do not affect the process.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-18T23:56:50.624Z
Learning: In cloudposse/atmos, when auth.TerraformPreHook fails in internal/exec/terraform_execute_helpers_exec.go (prepareComponentExecution function), execution is TERMINATED by returning the error. This is the correct behavior — the code comment reads "Pre-hook failures terminate execution — this matches the original terraform.go behavior. Authentication setup failures must not silently produce unauthenticated terraform commands." The previously stored learning (ID 11a3bc42) that said "execution continues" was stale and incorrect. The original terraform.go always returned on TerraformPreHook error.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T04:53:54.105Z
Learning: When nitrocode requests an audit on cloudposse/atmos PRs, respond with: (1) a 3–5 sentence Executive Summary, (2) a Findings table with columns “Number # | Issue | Severity | Pre-existing? | Suggested Fix” using 🔴/🟠/🟡/🟢, (3) a Scores table for Merge Safety, Test Coverage, Code Quality, Security Posture with numeric score, letter grade, and delta vs last audit, (4) Top 3 Copilot Action Items, and (5) a copyable “Suggested fix prompt” block.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T05:23:10.807Z
Learning: When auditing cloudposse/atmos PRs on request from nitrocode, the expected output format is: (1) 3–5 sentence Executive Summary, (2) Findings Table with columns “Number # | Issue | Severity | Pre-existing? | Suggested Fix” using 🔴/🟠/🟡/🟢, (3) Scores Table for Merge Safety, Test Coverage, Code Quality, Security Posture with numeric score, letter grade, and delta vs last audit, (4) Top 3 Copilot Action Items, and (5) a copyable “Suggested fix prompt” block.

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: .cursor/rules/atmos-rules.mdc:0-0
Timestamp: 2025-11-24T17:35:37.209Z
Learning: Ensure all tests pass, verify code coverage meets targets, run golangci-lint and fix any issues, and update documentation before submitting pull requests

Learnt from: CR
Repo: cloudposse/atmos PR: 0
File: .cursor/rules/atmos-rules.mdc:0-0
Timestamp: 2025-11-24T17:35:37.209Z
Learning: Applies to **/*_test.go : Every new feature must include comprehensive unit tests targeting >80% code coverage for all packages

Learnt from: Listener430
Repo: cloudposse/atmos PR: 1149
File: tests/snapshots/TestCLICommands_atmos_vendor_pull_ssh.stderr.golden:7-7
Timestamp: 2025-03-18T12:26:25.329Z
Learning: In the Atmos project, typos or inconsistencies in test snapshot files (such as "terrafrom" instead of "terraform") may be intentional as they capture the exact output of commands and should not be flagged as issues requiring correction.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T20:29:20.586Z
Learning: Atmos test pattern: prefer os.Executable()+TestMain with an env gate (_ATMOS_TEST_EXIT_ONE) over platform-specific binaries; always add a prerequisite test to prove ExecuteShellCommand propagates env to subprocesses before relying on it in recovery-path tests.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T18:02:49.032Z
Learning: In cloudposse/atmos lint stacks, current behavior when --stack has no matching raw manifest is to fall back to repo-scope; this can produce orphan noise. Prefer failing closed or requiring explicit opt-in to repo-scope for L-07 when no seed is found.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 1185
File: internal/exec/yaml_func_store.go:26-26
Timestamp: 2025-04-04T02:03:23.676Z
Learning: The Atmos codebase currently uses `log.Fatal` for error handling in multiple places. The maintainers are aware this isn't an ideal pattern (should only be used in main() or init() functions) and plan to address it comprehensively in a separate PR. CodeRabbit should not flag these issues or push for immediate changes until that refactoring is complete.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-22T15:27:52.572Z
Learning: Atmos test pattern: Use os.Executable()+TestMain with an env flag (_ATMOS_TEST_EXIT_ONE) to create a cross‑platform “exit 1” subprocess in tests; avoid Unix-only binaries (false/true/sh). Also add a prerequisite test to prove env propagation to ExecuteShellCommand before relying on it in recovery tests.

Learnt from: osterman
Repo: cloudposse/atmos PR: 808
File: pkg/config/config.go:478-483
Timestamp: 2024-12-02T21:26:32.337Z
Learning: In the 'atmos' project, when reviewing Go code like `pkg/config/config.go`, avoid suggesting file size checks after downloading remote configs if such checks aren't implemented elsewhere in the codebase.

Learnt from: nitrocode
Repo: cloudposse/atmos PR: 0
File: :0-0
Timestamp: 2026-03-23T05:38:47.048Z
Learning: In cloudposse/atmos PR `#2175`, SeedArchivedRepoCache, ResetArchivedRepoCache, newGitHubClientHook, and newGitHubClientHookMu are intentionally kept in the production package pkg/github/repo.go (not export_test.go) because they manipulate unexported package-level state that must be accessible from tests in multiple packages (e.g., internal/exec). Go's export_test.go mechanism is limited to the package under test and is invisible when another package imports the production package during its own tests. These are documented with "NOTE: test utility — cannot move to export_test.go" comments. Do not flag these as production API pollution in future reviews for this repository.

Learnt from: Benbentwo
Repo: cloudposse/atmos PR: 1475
File: internal/exec/terraform.go:269-272
Timestamp: 2025-10-03T18:02:08.535Z
Learning: In internal/exec/terraform.go, when auth.TerraformPreHook fails, the error is logged but execution continues. This is a deliberate design choice to allow Terraform commands to proceed even if authentication setup fails, rather than failing fast.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1813
File: cmd/terraform/shell.go:28-73
Timestamp: 2025-12-13T03:21:35.786Z
Learning: In Atmos, when calling cfg.InitCliConfig, you must first populate the schema.ConfigAndStacksInfo struct with global flag values using flags.ParseGlobalFlags(cmd, v) rather than passing an empty struct. The LoadConfig function (pkg/config/load.go) reads config selection fields (AtmosConfigFilesFromArg, AtmosConfigDirsFromArg, BasePath, ProfilesFromArg) directly from the ConfigAndStacksInfo struct, NOT from Viper. Passing an empty struct causes config selection flags (--base-path, --config, --config-path, --profile) to be silently ignored. Correct pattern: parse flags → populate struct → call InitCliConfig. See cmd/terraform/plan_diff.go for reference implementation.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1697
File: internal/exec/oci_utils.go:0-0
Timestamp: 2025-11-08T19:56:18.660Z
Learning: In the Atmos codebase, when a function receives an `*schema.AtmosConfiguration` parameter, it should read configuration values from `atmosConfig.Settings` fields rather than using direct `os.Getenv()` or `viper.GetString()` calls. The Atmos pattern is: viper.BindEnv in cmd/root.go binds environment variables → Viper unmarshals into atmosConfig.Settings via mapstructure → business logic reads from the Settings struct. This provides centralized config management, respects precedence, and enables testability. Example: `atmosConfig.Settings.AtmosGithubToken` instead of `os.Getenv("ATMOS_GITHUB_TOKEN")` in functions like `getGHCRAuth` in internal/exec/oci_utils.go.

Learnt from: haitham911
Repo: cloudposse/atmos PR: 727
File: internal/exec/terraform_clean.go:215-223
Timestamp: 2024-10-27T04:41:49.199Z
Learning: In `internal/exec/terraform_clean.go`, the function `determineCleanPath` is necessary and should not be removed.

Learnt from: aknysh
Repo: cloudposse/atmos PR: 944
File: go.mod:206-206
Timestamp: 2025-01-17T00:18:57.769Z
Learning: For indirect dependencies with license compliance issues in the cloudposse/atmos repository, the team prefers to handle them in follow-up PRs rather than blocking the current changes, as these issues often require deeper investigation of the dependency tree.

Learnt from: osterman
Repo: cloudposse/atmos PR: 768
File: internal/exec/vendor_utils.go:496-513
Timestamp: 2024-11-22T12:38:33.132Z
Learning: In the Atmos project, continue to flag path traversal issues in code reviews but acknowledge when they are expected and acceptable in specific cases.

Learnt from: osterman
Repo: cloudposse/atmos PR: 1686
File: docs/prd/tool-dependencies-integration.md:58-64
Timestamp: 2025-12-13T06:07:37.766Z
Learning: cloudposse/atmos: For PRD docs (docs/prd/*.md), markdownlint issues like MD040/MD010/MD034 can be handled in a separate documentation cleanup commit and should not block the current PR.

Learnt from: osterman
Repo: cloudposse/atmos PR: 887
File: internal/exec/workflow_utils.go:167-169
Timestamp: 2024-12-25T20:28:19.618Z
Learning: The user plans to revert the change from `path.Join` to `filepath.Join` in this PR due to testing gaps and will open a new PR to safely handle the migration without breaking `main`.

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/exec/terraform_execute_helpers_exec.go (1)
219-227: ⚠️ Potential issue | 🟠 Major

Preserve the select-path routing on the workspace new fallback.

Line 219 drops the `wsOpts` and `workspaceSelectRedirectStdErr` behavior from the earlier `workspace select` call. If `terraform output` or `terraform show` has to create the workspace, the `Created and switched to workspace ...` banner can leak onto stdout and corrupt the data-producing command output.
🛠️ Proposed fix
 	newErr := ExecuteShellCommand(
 		*atmosConfig,
 		info.Command,
 		[]string{"workspace", "new", info.TerraformWorkspace},
 		componentPath,
 		info.ComponentEnvList,
 		info.DryRun,
-		info.RedirectStdErr,
+		workspaceSelectRedirectStdErr,
+		wsOpts...,
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exec/terraform_execute_helpers_exec.go` around lines 219 - 227, The
fallback that runs ExecuteShellCommand to create a workspace (the "workspace
new" call) must preserve the same select-path routing and stderr-redirection
used by the earlier "workspace select" attempt; update the ExecuteShellCommand
invocation for workspace creation so it reuses wsOpts (or the same option
slice/values passed into the "workspace select" call) and passes
workspaceSelectRedirectStdErr (or the same redirect flag) instead of dropping
them, ensuring the created-banner remains on the routed/redirected stream and
does not pollute stdout for functions like terraform output/show.

♻️ Duplicate comments (4)

internal/exec/validate_stacks_test.go (1)

105-117: ⚠️ Potential issue | 🟡 Minor

Don't drop the WalkDir failure.

The callback now returns err, but _ = filepath.WalkDir(...) still discards it. A bad fixture path or read failure turns into a confusing count assertion instead of surfacing the real scan error.

Suggested fix

-		_ = filepath.WalkDir(filepath.Join(absPath, "stacks"), func(_ string, d os.DirEntry, err error) error {
+		walkErr := filepath.WalkDir(filepath.Join(absPath, "stacks"), func(_ string, d os.DirEntry, err error) error {
 			if err != nil {
 				return err
 			}
 			if d == nil {
 				return nil
 			}
 			if !d.IsDir() && strings.HasSuffix(d.Name(), ".yaml") {
 				fixtureFileCount++
 			}
 			return nil
 		})
+		require.NoError(t, walkErr, "failed to scan stacks fixture")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/exec/validate_stacks_test.go` around lines 105 - 117, The call to
filepath.WalkDir currently discards its returned error (using `_ =
filepath.WalkDir(...)`), so failures scanning the fixtures under
filepath.Join(absPath, "stacks") are hidden and tests only show a later count
assertion; change the test to capture and check the WalkDir error (e.g., err :=
filepath.WalkDir(...); require.NoError(t, err) or if err != nil { t.Fatalf(...)
}) so any path/read failure is surfaced, keeping the same callback that
increments fixtureFileCount and referencing the same absPath and
fixtureFileCount variables.

internal/exec/terraform_execute_helpers_pipeline_test.go (1)

201-203: ⚠️ Potential issue | 🟡 Minor

Restore the previous logger output, not a hardcoded one.

Both tests mutate the global logger and always put it back to os.Stderr. If another test configured a different writer first, this leaks state across the package and makes failures order-dependent. Snapshot the current output before redirecting and restore that value in the defer.

Also applies to: 241-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exec/terraform_execute_helpers_pipeline_test.go` around lines 201 -
203, The test unconditionally restores the global logger output to os.Stderr,
which leaks state; capture the current output writer before changing it (e.g.
oldOut := log.Default().Writer() or obtain via log.Default().Writer()/Output)
then call log.Default().SetOutput(&logBuf) and defer restoring the previous
writer with log.Default().SetOutput(oldOut); apply the same change to both
places (around the logBuf setup at the locations using logBuf) so the original
logger output is preserved and restored instead of hardcoding os.Stderr.

pkg/merge/merge_no_duplicate_errors_test.go (1)

95-97: ⚠️ Potential issue | 🟡 Minor

Assert the outer merge sentinel too.

This only pins ErrMergeTypeMismatch. If Merge stops wrapping with ErrMerge but still returns the inner sentinel, this test still passes while the public error contract regresses.
Suggested tighten-up
-	// The error must be returned, not swallowed, and must wrap ErrMergeTypeMismatch.
-	assert.True(t, errors.Is(err, errUtils.ErrMergeTypeMismatch),
-		"type-mismatch must return an error wrapping ErrMergeTypeMismatch, got: %v", err)
+	// The error must be returned, not swallowed, and must keep the full merge error chain.
+	if assert.Error(t, err, "type-mismatch must return an error") {
+		assert.True(t, errors.Is(err, errUtils.ErrMerge))
+		assert.True(t, errors.Is(err, errUtils.ErrMergeTypeMismatch))
+	}
Based on learnings: Applies to **/*_test.go : Test behavior, not implementation; never test stub functions (implement or remove); avoid tautological tests; make code testable via DI; no coverage theater; remove always-skipped tests; use errors.Is() for error checking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/merge/merge_no_duplicate_errors_test.go` around lines 95 - 97, The test
only asserts that the returned error wraps errUtils.ErrMergeTypeMismatch but
does not assert the outer sentinel errUtils.ErrMerge; update the Test (around
the assertion of errors.Is(err, errUtils.ErrMergeTypeMismatch) for the Merge
call) to also assert errors.Is(err, errUtils.ErrMerge) so the public contract
that Merge returns an error wrapped by ErrMerge is enforced; locate the Merge
call and existing assertion in merge_no_duplicate_errors_test.go and add the
additional errors.Is check against errUtils.ErrMerge (keeping the existing
inner-sentinel check).

pkg/merge/merge_native.go (1)

83-87: ⚠️ Potential issue | 🟠 Major

Wrap recursive failures with key/index context.

These branches bubble the raw merge error back up unchanged. On a large stack, callers learn that a merge failed but not where, which forces substring hunting instead of pointing at the offending path. Wrapping here with merge key %q / merge slice index %d keeps errors.Is intact while making the failure actionable.

Suggested shape

+import (
+	"fmt"
+	"reflect"
+
+	errUtils "github.com/cloudposse/atmos/errors"
+)

 		if srcMap, srcIsMap := srcVal.(map[string]any); srcIsMap {
 			if dstMap, dstIsMap := dstVal.(map[string]any); dstIsMap {
 				if err := deepMergeNative(dstMap, srcMap, appendSlice, sliceDeepCopy); err != nil {
-					return err
+					return fmt.Errorf("merge key %q: %w", k, err)
 				}
 				continue
 			}
@@
 		if normalizedSrcMap, ok := deepCopyValue(srcVal).(map[string]any); ok {
 			if dstMap, dstIsMap := dstVal.(map[string]any); dstIsMap {
 				if err := deepMergeNative(dstMap, normalizedSrcMap, appendSlice, sliceDeepCopy); err != nil {
-					return err
+					return fmt.Errorf("merge key %q: %w", k, err)
 				}
 				continue
 			}
@@
 		if err := deepMergeNative(merged, srcMap, appendSlice, sliceDeepCopy); err != nil {
-			return nil, err
+			return nil, fmt.Errorf("merge slice index %d: %w", i, err)
 		}

As per coding guidelines, "Follow Go's error handling idioms: use meaningful error messages, wrap errors with context using fmt.Errorf("context: %w", err)."

Also applies to: 98-101, 224-225

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/merge/merge_native.go` around lines 83 - 87, The recursive calls to
deepMergeNative in the map and slice fast-paths should wrap returned errors with
contextual path info so callers know which key or index failed; update the map
branch where deepMergeNative(dstMap, srcMap, appendSlice, sliceDeepCopy) is
called to wrap errors as fmt.Errorf("merge key %q: %w", key, err), and similarly
wrap errors from recursive merges in slice branches as fmt.Errorf("merge slice
index %d: %w", idx, err). Apply the same wrapping pattern to the other recursive
call sites referenced (the other map/slice recursion points) so errors.Is
remains usable but messages include the offending key/index.

🧹 Nitpick comments (4)

internal/exec/workflow_identity_test.go (1)
141-143: Precondition check prevents confusing failures.

Good addition — this test spawns an actual atmos subprocess, so skipping when the binary is missing is appropriate.

Consider using the existing helper for consistency:
♻️ Optional refactor
-	if _, err := exec.LookPath("atmos"); err != nil {
-		t.Skip("skipping: atmos binary not found in PATH (required to execute atmos workflow steps)")
-	}
+	tests.RequireExecutable(t, "atmos", "executing atmos workflow steps")
This would require adding the "github.com/cloudposse/atmos/tests" import. Either approach works — the current code is functionally correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exec/workflow_identity_test.go` around lines 141 - 143, Replace the
manual exec.LookPath check with the project test helper: remove the
exec.LookPath block and instead call tests.RequireExecutable(t, "atmos",
"executing atmos workflow steps") (or the equivalent helper in the
"github.com/cloudposse/atmos/tests" package) at the top of TestWorkflowIdentity
to skip when the atmos binary is missing; add the
"github.com/cloudposse/atmos/tests" import so the precondition check stays
consistent with other tests.
internal/exec/yaml_func_template_test.go (1)
397-401: Extract the Terraform/OpenTofu precheck into one local helper.

The same LookPath fallback and skip message are copied into three subtests here. A small closure inside TestYamlFuncTemplate_Integration would keep the gating consistent and make later fixture changes cheaper.

Also applies to: 439-443, 468-472
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exec/yaml_func_template_test.go` around lines 397 - 401, Extract the
repeated exec.LookPath precheck into a single local helper closure inside
TestYamlFuncTemplate_Integration (e.g., precheck := func(t *testing.T) { ... })
that runs exec.LookPath("tofu") and exec.LookPath("terraform") and calls t.Skip
with the existing message when both are missing; replace the three duplicated
blocks that currently call exec.LookPath and t.Skip with calls to this
precheck(t) to keep gating consistent across subtests and simplify future
fixture changes.
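One possible shape for the shared precheck is sketched below. The function `firstOnPath` is a hypothetical name; only `exec.LookPath` is a real API here. The lookup is kept separate from `testing.T` so it can be run on its own, and the comment shows how a closure inside the test might use it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// firstOnPath (hypothetical helper) returns the first candidate binary that
// exec.LookPath can resolve, or false when none of them is available.
func firstOnPath(candidates ...string) (string, bool) {
	for _, name := range candidates {
		if _, err := exec.LookPath(name); err == nil {
			return name, true
		}
	}
	return "", false
}

// Inside a test, a single closure could wrap the lookup, for example:
//
//	precheck := func(t *testing.T) {
//		t.Helper()
//		if _, ok := firstOnPath("tofu", "terraform"); !ok {
//			t.Skip("skipping: neither tofu nor terraform found in PATH")
//		}
//	}
func main() {
	if name, ok := firstOnPath("tofu", "terraform"); ok {
		fmt.Println("would run subtests with", name)
	} else {
		fmt.Println("would skip: neither tofu nor terraform found in PATH")
	}
}
```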
pkg/merge/merge_test.go (1)
764-808: Collapse these into one table-driven merge-flags test and add one typed/nested row.

The three cases are the same harness with different booleans, and they only pin flat []any scalar behavior. A single table-driven test with one typed or nested slice case would cover the risky native-merge shapes with less duplication.

As per coding guidelines, "Use table-driven tests for testing multiple scenarios in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/merge/merge_test.go` around lines 764 - 808, Combine the three tests
TestMergeWithOptions_AppendSliceFlag, TestMergeWithOptions_SliceDeepCopyFlag,
and TestMergeWithOptions_NoFlagsReplaceList into a single table-driven test
(e.g., TestMergeWithOptions_Flags) that iterates cases with fields for
appendSlice and sliceDeepCopy, inputs, and expected result; for each case call
MergeWithOptions(nil, inputs, appendSlice, sliceDeepCopy) and assert no error
and that the resulting "tags" value equals the expected slice. Include one
additional case that uses a typed or nested slice (for example []int or a slice
of maps) to exercise deepMergeNative's typed/nested merge behavior. Keep each
table entry descriptive and run the same assertions per-case so the different
flag combinations and the typed/nested shape are covered without duplicating the
test harness.
docs/fixes/2026-03-19-deep-merge-native-fixes.md (1)
22-35: Optional: Add language tags to code blocks for better rendering.

The fenced code blocks at lines 22 and 29 are missing language specifiers. Consider adding language tags for syntax highlighting:
♻️ Suggested markdown improvements
 **Before:**
-```
+```text
 for each input:
 **After:**
-```
+```text
 result = DeepCopyMap(inputs[0])
This addresses the markdownlint warnings and improves documentation readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/fixes/2026-03-19-deep-merge-native-fixes.md` around lines 22 - 35, Add
explicit language tags to the two fenced code blocks that show the merge
pseudocode (the blocks containing "for each input:" / "copy = DeepCopyMap(input)
..." and "result = DeepCopyMap(inputs[0])" / "deepMergeNative(result, input)"),
e.g., replace the opening ``` with ```text or ```pseudo so markdownlint and
syntax highlighters render them correctly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/exec/terraform_execute_single_invocation_test.go`:
- Around line 77-90: The test currently ignores the return from
executeCommandPipeline; capture its error (e.g., err :=
executeCommandPipeline(&atmosConfig, &info, execCtx)) and add an assertion that
the error wraps the exit-code type (use errors.As to check err is or contains
errUtils.ExitCodeError) before or alongside the counter-file assertions so the
test ensures both a single subprocess invocation and that the pipeline preserved
the ExecuteShellCommand exit-1 semantics (refer to executeCommandPipeline and
errUtils.ExitCodeError to locate the code).

In `@internal/exec/terraform_utils.go`:
- Around line 110-136: The helper isTerraformCurrentWorkspace reads TF_DATA_DIR
from the current process environment (os.Getenv) which can differ from the child
env used by the Terraform subprocess; change the function signature
(isTerraformCurrentWorkspace) to accept either the resolved env map (e.g., env
map[string]string or []string) or a precomputed tfDataDir string and use that
value instead of calling os.Getenv inside the function, then update callers
(where execEnvFromConfig / info builds the subprocess env) to pass the resolved
TF_DATA_DIR (or the env) through so envFile is constructed from the same env
Terraform used.

In `@internal/exec/yaml_func_terraform_state_workspaces_disabled_test.go`:
- Around line 173-179: The test currently downgrades failures to delete
terraform.tfstate and terraform.tfstate.backup to t.Logf, which can leave stale
state in the shared fixture; update the cleanup in
yaml_func_terraform_state_workspaces_disabled_test.go to attempt a retry loop
(e.g., try os.Remove on filepath.Join(mockComponentPath, "terraform.tfstate")
and filepath.Join(mockComponentPath, "terraform.tfstate.backup") several times
with small sleeps if removal fails due to Windows locking) and, after retries,
assert/fail the test (use t.Fatalf or t.Errorf) if the file still exists (check
with os.Stat or os.IsNotExist) instead of only logging via t.Logf; keep
references to os.Remove, os.IsNotExist, and the mockComponentPath join calls
when implementing the fix.

In `@pkg/merge/merge_compare_mergo_test.go`:
- Around line 119-146: The test "sliceDeepCopy merges overlapping map elements"
currently only asserts names but the comments describe expectations for merged
scalar/added fields; update the assertions to verify those fields: after calling
MergeWithOptions(cfg, inputs, false, true) and extracting groups, assert for g0
(variable g0) that g0["size"] == "large" (dst override applied), and for g1
(variable g1) assert g1["size"] == "medium" and g1["replicas"] == 3 so the test
actually validates the described merge behavior of MergeWithOptions.
- Around line 8-10: The test run example uses an incorrect -run pattern; update
the documented command to match existing tests (e.g., use -run TestCompareMergo
to run both TestCompareMergo_NestedMapsMerge and TestCompareMergo_SliceModes, or
a regex like -run TestCompareMergo_ to match all TestCompareMergo_* tests). Edit
the comment block containing the local run command so the -run argument
references TestCompareMergo (or TestCompareMergo_) instead of CrossValidate to
ensure the tests actually execute.

In `@website/blog/2026-03-15-faster-deep-merge.mdx`:
- Around line 75-84: The paragraph about sliceDeepCopy is incorrect: update the
website copy to state that when sliceDeepCopy is active the resulting slice
grows to max(len(dst), len(src)) so extra source elements (the tail, e.g.,
"gpu") are preserved rather than dropped; reference the actual behavior
implemented in mergeSlicesNative (in merge_native.go) and validated by
merge_compare_mergo_test.go (run with go test -tags compare_mergo
./pkg/merge/...) and adjust the wording to describe that contract and its
cross-validation against mergo.
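The length contract described for sliceDeepCopy can be illustrated with a small sketch. `mergeByIndex` is a hypothetical function restricted to slices of maps; it is not `mergeSlicesNative`, and the choice to let source keys override destination keys at overlapping positions is an assumption made for the example. What it does show is the result growing to the longer of the two inputs, so the source tail survives.

```go
package main

import "fmt"

// copyMap returns a shallow copy so the result does not alias its inputs.
func copyMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// mergeByIndex (hypothetical) merges src into dst position by position.
// Overlapping positions are merged key by key; positions present in only
// one input are copied through, so the result has max(len(dst), len(src))
// elements.
func mergeByIndex(dst, src []map[string]any) []map[string]any {
	n := len(dst)
	if len(src) > n {
		n = len(src)
	}
	out := make([]map[string]any, 0, n)
	for i := 0; i < n; i++ {
		switch {
		case i < len(dst) && i < len(src):
			m := copyMap(dst[i])
			for k, v := range src[i] {
				m[k] = v // assumption: source keys win on conflict
			}
			out = append(out, m)
		case i < len(dst):
			out = append(out, copyMap(dst[i]))
		default:
			out = append(out, copyMap(src[i])) // source tail is preserved
		}
	}
	return out
}

func main() {
	dst := []map[string]any{{"name": "cpu", "size": "small"}}
	src := []map[string]any{{"size": "large"}, {"name": "gpu"}}

	merged := mergeByIndex(dst, src)
	fmt.Println(len(merged), merged[0]["name"], merged[0]["size"], merged[1]["name"])
	// prints: 2 cpu large gpu
}
```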

---

Outside diff comments:
In `@internal/exec/terraform_execute_helpers_exec.go`:
- Around line 219-227: The fallback that runs ExecuteShellCommand to create a
workspace (the "workspace new" call) must preserve the same select-path routing
and stderr-redirection used by the earlier "workspace select" attempt; update
the ExecuteShellCommand invocation for workspace creation so it reuses wsOpts
(or the same option slice/values passed into the "workspace select" call) and
passes workspaceSelectRedirectStdErr (or the same redirect flag) instead of
dropping them, ensuring the created-banner remains on the routed/redirected
stream and does not pollute stdout for functions like terraform output/show.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28b959e0-cf50-485b-8d0d-486606c4d151

📥 Commits

Reviewing files that changed from the base of the PR and between fe8ebf3 and 76ab254.

📒 Files selected for processing (35)

CLAUDE.md
docs/fixes/2026-03-19-deep-merge-native-fixes.md
errors/errors.go
internal/exec/terraform_clean_test.go
internal/exec/terraform_execute_exit_wrapping_test.go
internal/exec/terraform_execute_helpers_exec.go
internal/exec/terraform_execute_helpers_pipeline_test.go
internal/exec/terraform_execute_helpers_workspace_test.go
internal/exec/terraform_execute_single_invocation_test.go
internal/exec/terraform_test.go
internal/exec/terraform_utils.go
internal/exec/terraform_utils_test.go
internal/exec/testmain_test.go
internal/exec/validate_stacks_test.go
internal/exec/workflow_identity_test.go
internal/exec/workflow_utils_test.go
internal/exec/yaml_func_template_test.go
internal/exec/yaml_func_terraform_output_test.go
internal/exec/yaml_func_terraform_state_test.go
internal/exec/yaml_func_terraform_state_workspaces_disabled_test.go
internal/exec/yaml_func_utils_test.go
pkg/devcontainer/config_loader.go
pkg/merge/merge.go
pkg/merge/merge_compare_mergo_test.go
pkg/merge/merge_context_demo_test.go
pkg/merge/merge_context_test.go
pkg/merge/merge_native.go
pkg/merge/merge_native_test.go
pkg/merge/merge_no_duplicate_errors_test.go
pkg/merge/merge_test.go
pkg/merge/merge_yaml_functions.go
tests/preconditions.go
website/blog/2026-03-15-faster-deep-merge.mdx
website/blog/authors.yml
website/src/data/roadmap.js


- isTerraformCurrentWorkspace: accept envList param so TF_DATA_DIR from subprocess env (ComponentEnvList) takes precedence over process env
- Add envVarFromList helper (companion to envKeyIsSet)
- Assert ExitCodeError in single-invocation regression test
- Fix logger restoration in pipeline tests (Writer() doesn't exist)
- Add require import to merge_test.go (table-driven tests use it)
- Strengthen sliceDeepCopy cross-validation assertions (size, replicas)
- Fix misleading comments in compare_mergo_test.go
- Fix blog post: sliceDeepCopy extends result, not truncates
- Fix -run pattern in docs: CrossValidate -> CompareMergo
- Retry-based cleanup for shared fixture state files (Windows compat)
- Add code block language tags to fix doc
- Assert both ErrMerge and ErrMergeTypeMismatch sentinels in error tests
- Wrap recursive merge errors with key/index context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aknysh

thanks @nitrocode
I'm merging the PR; it will be available in the next Atmos minor release.
If there are any issues, we'll fix them and create a patch release.

github-actions · 2026-03-24T01:13:13Z

These changes were released in v1.211.1-rc.0.

Initial plan

1a407e5

Copilot AI assigned Copilot and nitrocode Mar 15, 2026

Copilot started work on behalf of nitrocode March 15, 2026 01:15

mergify bot added the triage (Needs triage) and wip (Work in Progress: Not ready for final review or merge) labels Mar 15, 2026

nitrocode added the patch (A minor, backward compatible change) label and removed the triage (Needs triage) label Mar 15, 2026

Copilot AI changed the title ~~[WIP] Improve performance of deep merge for YAML files~~ perf(merge): replace mergo pre-copy loop with reflection-free native deep merge (3.5× faster) Mar 15, 2026

Copilot AI requested a review from nitrocode March 15, 2026 01:40

Copilot finished work on behalf of nitrocode March 15, 2026 01:40

nitrocode marked this pull request as ready for review March 15, 2026 01:40

nitrocode requested a review from a team as a code owner March 15, 2026 01:40

github-actions bot added the size/m (Medium size PR) label Mar 15, 2026

mergify bot removed the wip (Work in Progress: Not ready for final review or merge) label Mar 15, 2026

[autofix.ci] apply automated fixes

2112415

github-actions bot deployed to preview March 15, 2026 01:48

github-advanced-security bot found potential problems Mar 15, 2026
