Skip to content
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
1a407e5
Initial plan
Copilot Mar 15, 2026
9c5d6bd
perf(merge): replace mergo pre-copy loop with reflection-free native …
Copilot Mar 15, 2026
2112415
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 15, 2026
a8ed4c5
fix(merge): use overflow-safe safeAdd for size computations (fixes Gi…
Copilot Mar 15, 2026
006e148
fix(merge): restore type-check behavior, handle typed maps, fix tests
Copilot Mar 15, 2026
e660fa3
Merge branch 'main' into copilot/improve-deep-merge-performance
nitrocode Mar 15, 2026
c6631c3
test(merge): add 7 coverage tests for uncovered branches in deepMerge…
Copilot Mar 15, 2026
8d9d7e1
fix: resolve Windows test failures for TestProcessCustomYamlTags
Copilot Mar 15, 2026
c1b1136
fix: address CodeRabbit feedback on deep merge native and workspace h…
Copilot Mar 19, 2026
b865d17
Merge branch 'main' into copilot/improve-deep-merge-performance
nitrocode Mar 19, 2026
34dc8a3
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 19, 2026
31848ab
Changes before error encountered
Copilot Mar 19, 2026
0b8c1e2
fix: address remaining CodeRabbit feedback on deep-merge native imple…
Copilot Mar 20, 2026
8597476
Merge branch 'main' into copilot/improve-deep-merge-performance
nitrocode Mar 22, 2026
fe9b2d2
fix: resolve all 12 CodeRabbit action items for deep-merge native imp…
Copilot Mar 22, 2026
29c3b83
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 22, 2026
47089f5
fix: remove duplicate terraform execution pipeline and wire workspace…
Copilot Mar 22, 2026
4f7f8a9
fix: resolve all CodeRabbit v8 audit items — false test failure, doub…
Copilot Mar 22, 2026
880b2f7
fix: resolve all CodeRabbit v9 audit items — cap removal, workspace r…
Copilot Mar 22, 2026
cf9515e
fix: resolve all CodeRabbit v10 audit items — cross-platform tests, s…
Copilot Mar 22, 2026
745a67a
fix: resolve all CodeRabbit v11 audit items — loud guards, env proof,…
Copilot Mar 22, 2026
778ad17
fix: resolve all CodeRabbit v12 audit items — mismatch test, compare_…
Copilot Mar 22, 2026
b2977b4
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 22, 2026
9f0853b
fix: resolve all CodeRabbit v13 audit items — coverage gap tests, war…
Copilot Mar 22, 2026
af47dbd
fix: resolve all PR review 3988411457 and CR v13/v14 actionable items
Copilot Mar 22, 2026
d750572
Merge branch 'main' into copilot/improve-deep-merge-performance
nitrocode Mar 22, 2026
7bb1195
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 22, 2026
1f2cf50
fix: address CR v15 items — remove dead code in safeCap, fix spelling…
Copilot Mar 22, 2026
8a71e8d
fix: address inline review comments — slice→map guard, test quality, …
Copilot Mar 23, 2026
9460cec
fix: address CR v9 — CWD-independent fixture path, strategy-flag test…
Copilot Mar 23, 2026
ca85936
test: add executeMainTerraformCommand error-propagation contract test…
Copilot Mar 23, 2026
13a6781
test: clarify comments in error-propagation test and benchmark per co…
Copilot Mar 23, 2026
facb222
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 23, 2026
07dc883
docs: add deep analysis of native merge PR #2201
aknysh Mar 23, 2026
7d0625b
test: add guardrail tests for ExitCodeError wrapping and single subpr…
Copilot Mar 23, 2026
2d8da85
fix: sliceDeepCopy truncation, lint issues, cross-validation, mergo docs
aknysh Mar 23, 2026
ccf4d4f
Merge remote-tracking branch 'origin/copilot/improve-deep-merge-perfo…
aknysh Mar 23, 2026
76ab254
[autofix.ci] apply automated fixes
autofix-ci[bot] Mar 23, 2026
247c864
Merge branch 'main' into copilot/improve-deep-merge-performance
aknysh Mar 23, 2026
0858696
fix: address CodeRabbit audit findings for deep merge PR
aknysh Mar 23, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,18 @@ ALWAYS use `cmd.NewTestKit(t)` for cmd tests. Auto-cleans RootCmd state (flags,
- No coverage theater
- Remove always-skipped tests
- Use `errors.Is()` for error checking
- **For aliasing/isolation tests, verify BOTH directions:** after a merge, mutate the result and confirm the original inputs are unchanged (result→src isolation); also mutate a source map before the merge and confirm the result is unaffected (src→result isolation).
- **For slice-result tests, assert element contents, not just length:** `require.Len` alone allows regressions that drop or corrupt contents. Assert at least the first and last element by value.
- **Never use platform-specific binaries in tests** (e.g., `false`, `true`, `sh` on Unix): these don't exist on Windows. Use Go-native test helpers: subprocess via `os.Executable()` + `TestMain`, temp files with cross-platform scripts, or DI to inject a fake command runner.
- **Safety guards must fail loudly:** any check that counts fixture files or validates test preconditions must use `require.Positive` (or equivalent) — never `if count > 0 { ... }` which silently disables the check when misconfigured.
- **Use absolute paths for fixture counting:** any `filepath.WalkDir` or file-count assertion must use an already-resolved absolute path (not a relative one) to be CWD-independent.
- **Add compile-time sentinels for schema field references in tests:** when a test uses a specific struct field (e.g., `schema.Provider{Kind: "azure"}`), add `var _ = schema.Provider{Kind: "azure"}` as a compile guard so a field rename immediately fails the build.
- **Add prerequisite sub-tests for subprocess behavior:** when a test depends on implicit env propagation (e.g., `ComponentEnvList` reaching a subprocess), add an explicit sub-test that confirms the behavior before the main test runs.
- **Contract vs. legacy behavior:** if a test says "matches mergo" (or any other library), add an opt-in cross-validation test behind a build tag (e.g., `//go:build compare_mergo`); otherwise state "defined contract" explicitly so it's clear the native implementation owns the behavior. Run cross-validation tests with: `go test -tags compare_mergo ./pkg/merge/... -run CrossValidate -v` (requires mergo v1.0.x installed).
- **Include negative-path tests for recovery logic:** whenever a test verifies that a recovery/fallback triggers under condition X, add a corresponding test that verifies the recovery does NOT trigger when condition X is absent (e.g., mismatched workspace name).

### Follow-up Tracking (MANDATORY)
When a PR defers work to a follow-up (e.g., migration, cleanup, refactor), **open a GitHub issue and link it by number** in the blog post, roadmap, and/or PR description before merging. Blog posts with "a follow-up issue will..." with no `#number` are incomplete — the work will never be tracked.

### Mock Generation (MANDATORY)
Use `go.uber.org/mock/mockgen` with `//go:generate` directives. Never manual mocks.
Expand Down Expand Up @@ -391,6 +403,27 @@ Search `internal/exec/` and `pkg/` before implementing. Extend, don't duplicate.
### Cross-Platform (MANDATORY)
Linux/macOS/Windows compatible. Use SDKs over binaries. Use `filepath.Join()` instead of hardcoded path separators.

**Subprocess helpers in tests (cross-platform):**
Instead of `exec.LookPath("false")` or other Unix-only binaries, use the test binary itself.
**Important:** If your package already has a `TestMain`, add the env-gate check **inside the existing `TestMain`** — do not add a second `TestMain` function (Go does not allow two in the same package).

```go
// In testmain_test.go — merge this check into the existing TestMain:
func TestMain(m *testing.M) {
// If _ATMOS_TEST_EXIT_ONE is set, exit immediately with code 1.
// This lets tests use the test binary itself as a cross-platform "exit 1" command.
if os.Getenv("_ATMOS_TEST_EXIT_ONE") == "1" { os.Exit(1) }
os.Exit(m.Run())
}
// NOTE: If your package already defines TestMain, insert the _ATMOS_TEST_EXIT_ONE
// check at the top of the existing function rather than copying the whole snippet.

// In the test itself:
exePath, _ := os.Executable()
info.Command = exePath
info.ComponentEnvList = []string{"_ATMOS_TEST_EXIT_ONE=1"}
```

**Path handling in tests:**
- **NEVER use forward slash concatenation** like `tempDir + "/components/terraform/vpc"`
- **ALWAYS use `filepath.Join()`** with separate arguments: `filepath.Join(tempDir, "components", "terraform", "vpc")`
Expand Down
241 changes: 241 additions & 0 deletions docs/fixes/2026-03-19-deep-merge-native-fixes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,241 @@
# Deep-Merge Native & Terraform Workspace Fixes

**Date:** 2026-03-19 (updated 2026-03-23)
**PR:** #2201 (perf: replace mergo with native deep merge)
**Reviewer findings:** CodeRabbit audit + GitHub Advanced Security alerts + independent deep analysis

---

## What the PR Does

Replaces the pre-merge deep-copy loop (which called `mergo.Merge` after copying each input)
with a native Go implementation that deep-copies only the first input and merges subsequent
inputs in-place with leaf-level copying. This reduces N full `DeepCopyMap` calls to 1,
achieving ~3.5× speedup on the ~118k+ merge calls per stack resolution run.

**This is the core of Atmos.** Every stack resolution passes through this code. Any bug
here affects every single `atmos` command that processes stacks.

### Architecture Change

**Before:**
```
for each input:
copy = DeepCopyMap(input) // Full deep copy of every input
mergo.Merge(result, copy) // mergo merge (uses reflection internally)
```

**After:**
```
result = DeepCopyMap(inputs[0]) // Deep copy only the first input
for each remaining input:
deepMergeNative(result, input) // Native merge with leaf-level copying (no reflection)
```

---

## Native Merge Semantics

### Merge Rules (merge_native.go)

| Scenario | Behavior | Correct? |
|---|---|---|
| Both map | Recursive merge | ✅ |
| Src map, dst not map | Src overrides dst | ✅ (matches mergo WithOverride) |
| Src not map, dst map | Src overrides dst | ✅ (matches mergo WithOverride) |
| Src slice, dst map | **Error** — `ErrMergeTypeMismatch` | ⚠️ Asymmetric but intentional |
| Src nil | Override dst with nil | ✅ (matches mergo WithOverride) |
| Src typed map | Normalize to `map[string]any` via reflection | ✅ |
| Src typed slice | Normalize to `[]any` via reflection | ✅ |

### Slice Merge Modes

| Mode | Behavior | Notes |
|---|---|---|
| Default | Src slice replaces dst slice | Standard |
| `appendSlice` | Dst + src elements concatenated | Both deep-copied |
| `sliceDeepCopy` | Element-wise merge, src extends result | Fixed (was truncating) |

### Type Handling

| Type | Deep Copy Method | Correct? |
|---|---|---|
| Primitives (string, int, float, bool) | Pass-through (immutable) | ✅ |
| `map[string]any` | Recursive `deepCopyMap` | ✅ |
| `[]any` | Recursive `deepCopySlice` | ✅ |
| Typed maps (`map[string]string`) | Reflection-based iteration | ✅ |
| Typed slices (`[]string`) | Reflection-based iteration | ✅ |
| Pointers | Pass-through (**aliased**) | ⚠️ Safe for YAML data |
| `nil` | Pass-through | ✅ |

---

## Issues Addressed

### 1. `sliceDeepCopy` truncation — silent data loss (CRITICAL, fixed)

**File:** `pkg/merge/merge_native.go`

**Problem:** When `sliceDeepCopy=true` and src had more elements than dst, extra src elements
were **silently dropped**. This was a data loss bug for users with `list_merge_strategy: deep`
whose overlay stacks add new list elements beyond the base.

**Example:** A base stack with 2 EKS node groups + an overlay adding a 3rd `gpu` group would
silently lose the gpu group:

```yaml
# base: 2 node groups
node_groups:
- name: general
instance_type: m5.large
- name: compute
instance_type: c5.xlarge

# overlay: adds 3rd group
node_groups:
- name: general
instance_type: m5.2xlarge
- name: compute
instance_type: c5.2xlarge
- name: gpu # ← SILENTLY DROPPED
instance_type: g5.xlarge
```

**Fix:** `mergeSlicesNative` now uses `max(len(dst), len(src))` for result length. Extra src
elements are deep-copied and appended, matching mergo's `WithSliceDeepCopy` behavior.

**Tests:** 3 existing tests updated from expecting truncation to expecting extension.
5 new cross-validation tests added for `appendSlice` and `sliceDeepCopy` modes.

### 2. `sliceDeepCopy` vs `appendSlice` precedence flip (behavioral regression, fixed)

**File:** `pkg/merge/merge_native.go`

**Problem:** The new `deepMergeNative` checked `appendSlice` before `sliceDeepCopy`, but the
old mergo code checked `WithSliceDeepCopy` first. When both flags are `true`, the old code
applied element-wise merging, the new code appended.

**Fix:** Reordered: `if sliceDeepCopy { … } else { /* appendSlice */ }`.

### 3. `mergeSlicesNative` aliased dst maps and tail elements (fixed)

**File:** `pkg/merge/merge_native.go`

**Problem (inner maps):** Shallow copy of dstMap values into merged map caused silent
corruption in multi-input merges.

**Fix:** `merged[k] = deepCopyValue(v)` for every dstMap value.

**Problem (tail elements):** `copy(result, dst)` shallow-copied tail positions, creating
aliases that could corrupt the accumulator in subsequent merge passes.

**Fix:** Deep-copy tail positions explicitly.

### 4. Misleading test name (fixed)

**File:** `pkg/merge/merge_compare_mergo_test.go`

**Problem:** Case named `"nil value in src map entry is skipped"` but nil actually overrides.

**Fix:** Renamed to `"nil value in src map entry overrides dst"`.

### 5. Cross-validation test coverage too narrow (fixed)

**File:** `pkg/merge/merge_compare_mergo_test.go`

**Problem:** Only 4 equivalence cases tested against mergo. No coverage for `appendSlice`
or `sliceDeepCopy` modes.

**Fix:** Added 5 new cross-validation tests:
- `appendSlice concatenates slices`
- `appendSlice with nested maps`
- `sliceDeepCopy merges overlapping map elements`
- `sliceDeepCopy src extends beyond dst length`
- `sliceDeepCopy with three inputs extending progressively`

### 6. `isTerraformCurrentWorkspace` default workspace handling (fixed)

**File:** `internal/exec/terraform_utils.go`

**Problem:** Terraform never writes `.terraform/environment` for the `default` workspace.
The helper always returned `false` when the file was absent, so workspace recovery never
triggered for `default`.

**Fix:** Return `true` when file is missing AND workspace is `"default"`. Return `true`
when file is empty AND workspace is `"default"`.

### 7. Workspace recovery log level too low (fixed)

**File:** `internal/exec/terraform_execute_helpers_exec.go`

**Fix:** Upgraded `log.Debug` to `log.Warn` for workspace recovery messages.

### 8. Integer overflow in size computations (fixed)

**File:** `pkg/merge/merge_native.go`

**Fix:** `safeCap(a, b)` clamps to `1<<24` (16M entries) to prevent OOM.

---

## Remaining Items

### Fixed

1. ~~**Document the mergo/native split**~~ ✅ — Added comments to all three remaining
mergo call sites explaining why they still use mergo:
- `pkg/merge/merge_yaml_functions.go:177` — YAML function slice merging has different
semantics (operates on individual elements during `!include`/`!merge`, not full stacks).
- `pkg/merge/merge_yaml_functions.go:265` — Cross-references the first comment.
- `pkg/devcontainer/config_loader.go:350` — Devcontainer uses typed structs, not
`map[string]any`. Not on the hot path.
- All three have `TODO: migrate to native merge` markers.

### Future TODOs (post-merge)

2. **Run cross-validation in CI** — Add `compare_mergo` tests to a CI job. Currently
behind `//go:build compare_mergo` build tag and only run manually.
3. **Migrate `merge_yaml_functions.go` to native merge** — Eliminate the dual mergo/native
split. Requires adapting YAML function slice semantics to the native merge API.
4. **Migrate `devcontainer/config_loader.go` to native merge** — Lower priority since
devcontainer config merging is not performance-critical and uses typed structs.
5. **Add concurrent-contract test** — Document that `deepMergeNative` is not safe for
concurrent use on the same dst (callers must synchronize).

### No Action Needed

5. `safeCap` max hint — unlikely to be hit in practice.
6. Pointer aliasing — safe for YAML-parsed data.
7. `TF_DATA_DIR` relative path — `componentPath` is correct (matches Terraform's CWD).
8. Workspace recovery dual guard — correct and well-tested.

---

## Summary of Files Changed

| File | Change |
|------|--------|
| `pkg/merge/merge_native.go` | sliceDeepCopy extension fix; precedence fix; aliasing fixes |
| `pkg/merge/merge_native_test.go` | 3 tests updated for extension; new precedence/aliasing tests |
| `pkg/merge/merge_compare_mergo_test.go` | Fix test name; add 5 cross-validation tests |
| `pkg/merge/merge.go` | Replace mergo pre-copy loop with native merge |
| `internal/exec/terraform_utils.go` | `isTerraformCurrentWorkspace` with default handling |
| `internal/exec/terraform_utils_test.go` | 11 sub-tests for workspace detection |
| `internal/exec/terraform_execute_helpers_exec.go` | Workspace recovery with log.Warn |
| `internal/exec/terraform_execute_helpers_pipeline_test.go` | Recovery path tests |
| `internal/exec/terraform_execute_helpers_workspace_test.go` | Error propagation test |
| `internal/exec/testmain_test.go` | Cross-platform subprocess helper |
| `errors/errors.go` | `ErrMergeNilDst`, `ErrMergeTypeMismatch` sentinels |

## Audit Summary

| Category | Count | Key Items |
|---|---|---|
| **Critical** | 1 (fixed) | sliceDeepCopy truncation — silent data loss |
| **High** | 2 (fixed) | Cross-validation expanded, precedence regression fixed |
| **Medium** | 2 (fixed) | Misleading test name, aliasing in mergeSlicesNative |
| **Low** | 2 | safeCap hint, pointer aliasing (both acceptable) |
| **Positive** | 7 | Sound architecture, thorough aliasing prevention, type handling |

The core merge implementation is well-engineered. All critical and high issues have been
fixed. Cross-validation coverage expanded from 4 to 9 equivalence tests.
2 changes: 2 additions & 0 deletions errors/errors.go
Original file line number Diff line number Diff line change
Expand Up @@ -318,6 +318,8 @@ var (
ErrFailedToInitializeAtmosConfig = errors.New("failed to initialize atmos config")
ErrInvalidListMergeStrategy = errors.New("invalid list merge strategy")
ErrMerge = errors.New("merge error")
ErrMergeNilDst = errors.New("merge destination must not be nil")
ErrMergeTypeMismatch = errors.New("cannot override two slices with different type")
ErrEncode = errors.New("encoding error")
ErrDecode = errors.New("decoding error")

Expand Down
2 changes: 2 additions & 0 deletions internal/exec/terraform_clean_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import (
cfg "github.com/cloudposse/atmos/pkg/config"
"github.com/cloudposse/atmos/pkg/schema"
tfclean "github.com/cloudposse/atmos/pkg/terraform/clean"
"github.com/cloudposse/atmos/tests"
)

// verifyFileExists checks that all files in the list exist.
Expand Down Expand Up @@ -39,6 +40,7 @@ func verifyFileDeleted(t *testing.T, files []string) (bool, string) {
// correctly with ExecuteTerraform. This test must remain in internal/exec because it
// depends on ExecuteTerraform and other internal/exec functions.
func TestCLITerraformClean(t *testing.T) {
tests.RequireTerraform(t)
// Use t.Setenv for automatic cleanup and better test isolation.
t.Setenv("ATMOS_CLI_CONFIG_PATH", "")
t.Setenv("ATMOS_BASE_PATH", "")
Expand Down
59 changes: 59 additions & 0 deletions internal/exec/terraform_execute_exit_wrapping_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
package exec

// terraform_execute_exit_wrapping_test.go is a dedicated contract test for the
// ExitCodeError wrapping guarantee of ExecuteShellCommand.
//
// Contract: whenever the spawned subprocess exits with a non-zero code,
// ExecuteShellCommand must return an error that satisfies errors.As(err, ExitCodeError).
// This guarantee is relied upon by runWorkspaceSetup, executeMainTerraformCommand, and
// any other caller that needs to inspect the subprocess exit code.
//
// Cross-platform approach: uses the test binary (os.Executable) with
// _ATMOS_TEST_EXIT_ONE=1. TestMain in testmain_test.go intercepts that env var
// and calls os.Exit(1) immediately.

import (
"errors"
"os"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

errUtils "github.com/cloudposse/atmos/errors"
"github.com/cloudposse/atmos/pkg/schema"
)

// TestExecuteShellCommand_ExitOneWrappedAsExitCodeError verifies the core ExitCodeError
// wrapping contract: a subprocess that exits with code 1 must produce an error that can
// be unwrapped to errUtils.ExitCodeError with Code == 1.
//
// This is the canonical standalone reference test for this contract. The same property
// is exercised indirectly by workspace and pipeline tests that use _ATMOS_TEST_EXIT_ONE,
// but having an explicit dedicated test makes regressions immediately obvious in isolation.
func TestExecuteShellCommand_ExitOneWrappedAsExitCodeError(t *testing.T) {
exePath, err := os.Executable()
require.NoError(t, err, "os.Executable() must succeed")

atmosConfig := schema.AtmosConfiguration{}
execErr := ExecuteShellCommand(
atmosConfig,
exePath,
[]string{"-test.run=^$"}, // no test matches; TestMain exits before any test
"", // dir: current working directory
[]string{"_ATMOS_TEST_EXIT_ONE=1"}, // env: makes TestMain call os.Exit(1)
false, // dryRun: false — actually run the subprocess
"", // redirectStdErr
)

// The subprocess must have exited with code 1.
require.Error(t, execErr, "ExecuteShellCommand must return an error when subprocess exits 1")

// The error must be (or wrap) an ExitCodeError — callers depend on this contract.
var exitCodeErr errUtils.ExitCodeError
require.True(t,
errors.As(execErr, &exitCodeErr),
"error must satisfy errors.As(err, ExitCodeError); got %T: %v", execErr, execErr,
)
assert.Equal(t, 1, exitCodeErr.Code, "ExitCodeError.Code must equal the subprocess exit code (1)")
}
Loading
Loading