|
| 1 | +# Deep-Merge Native & Terraform Workspace Fixes |
| 2 | + |
| 3 | +**Date:** 2026-03-19 (updated 2026-03-23) |
| 4 | +**PR:** #2201 (perf: replace mergo with native deep merge) |
| 5 | +**Reviewer findings:** CodeRabbit audit + GitHub Advanced Security alerts + independent deep analysis |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## What the PR Does |
| 10 | + |
| 11 | +Replaces the pre-merge deep-copy loop (which called `mergo.Merge` after copying each input) |
| 12 | +with a native Go implementation that deep-copies only the first input and merges subsequent |
| 13 | +inputs in-place with leaf-level copying. This reduces N full `DeepCopyMap` calls to 1, |
| 14 | +achieving ~3.5× speedup on the ~118k+ merge calls per stack resolution run. |
| 15 | + |
| 16 | +**This is the core of Atmos.** Every stack resolution passes through this code. Any bug |
| 17 | +here affects every single `atmos` command that processes stacks. |
| 18 | + |
| 19 | +### Architecture Change |
| 20 | + |
| 21 | +**Before:** |
| 22 | +```text |
| 23 | +for each input: |
| 24 | + copy = DeepCopyMap(input) // Full deep copy of every input |
| 25 | + mergo.Merge(result, copy) // mergo merge (uses reflection internally) |
| 26 | +``` |
| 27 | + |
| 28 | +**After:** |
| 29 | +```text |
| 30 | +result = DeepCopyMap(inputs[0]) // Deep copy only the first input |
| 31 | +for each remaining input: |
| 32 | + deepMergeNative(result, input) // Native merge with leaf-level copying (no reflection) |
| 33 | +``` |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Native Merge Semantics |
| 38 | + |
| 39 | +### Merge Rules (merge_native.go) |
| 40 | + |
| 41 | +| Scenario | Behavior | Correct? | |
| 42 | +|---|---|---| |
| 43 | +| Both map | Recursive merge | ✅ | |
| 44 | +| Src map, dst not map | Src overrides dst | ✅ (matches mergo WithOverride) | |
| 45 | +| Src not map, dst map | Src overrides dst | ✅ (matches mergo WithOverride) | |
| 46 | +| Src slice, dst map | **Error** — `ErrMergeTypeMismatch` | ⚠️ Asymmetric but intentional | |
| 47 | +| Src nil | Override dst with nil | ✅ (matches mergo WithOverride) | |
| 48 | +| Src typed map | Normalize to `map[string]any` via reflection | ✅ | |
| 49 | +| Src typed slice | Normalize to `[]any` via reflection | ✅ | |
| 50 | + |
| 51 | +### Slice Merge Modes |
| 52 | + |
| 53 | +| Mode | Behavior | Notes | |
| 54 | +|---|---|---| |
| 55 | +| Default | Src slice replaces dst slice | Standard | |
| 56 | +| `appendSlice` | Dst + src elements concatenated | Both deep-copied | |
| 57 | +| `sliceDeepCopy` | Element-wise merge, src extends result | Fixed (was truncating) | |
| 58 | + |
| 59 | +### Type Handling |
| 60 | + |
| 61 | +| Type | Deep Copy Method | Correct? | |
| 62 | +|---|---|---| |
| 63 | +| Primitives (string, int, float, bool) | Pass-through (immutable) | ✅ | |
| 64 | +| `map[string]any` | Recursive `deepCopyMap` | ✅ | |
| 65 | +| `[]any` | Recursive `deepCopySlice` | ✅ | |
| 66 | +| Typed maps (`map[string]string`) | Reflection-based iteration | ✅ | |
| 67 | +| Typed slices (`[]string`) | Reflection-based iteration | ✅ | |
| 68 | +| Pointers | Pass-through (**aliased**) | ⚠️ Safe for YAML data | |
| 69 | +| `nil` | Pass-through | ✅ | |
| 70 | + |
| 71 | +--- |
| 72 | + |
| 73 | +## Issues Addressed |
| 74 | + |
| 75 | +### 1. `sliceDeepCopy` truncation — silent data loss (CRITICAL, fixed) |
| 76 | + |
| 77 | +**File:** `pkg/merge/merge_native.go` |
| 78 | + |
| 79 | +**Problem:** When `sliceDeepCopy=true` and src had more elements than dst, extra src elements |
| 80 | +were **silently dropped**. This was a data loss bug for users with `list_merge_strategy: deep` |
| 81 | +whose overlay stacks add new list elements beyond the base. |
| 82 | + |
| 83 | +**Example:** A base stack with 2 EKS node groups + an overlay adding a 3rd `gpu` group would |
| 84 | +silently lose the gpu group: |
| 85 | + |
| 86 | +```yaml |
| 87 | +# base: 2 node groups |
| 88 | +node_groups: |
| 89 | + - name: general |
| 90 | + instance_type: m5.large |
| 91 | + - name: compute |
| 92 | + instance_type: c5.xlarge |
| 93 | + |
| 94 | +# overlay: adds 3rd group |
| 95 | +node_groups: |
| 96 | + - name: general |
| 97 | + instance_type: m5.2xlarge |
| 98 | + - name: compute |
| 99 | + instance_type: c5.2xlarge |
| 100 | + - name: gpu # ← SILENTLY DROPPED |
| 101 | + instance_type: g5.xlarge |
| 102 | +``` |
| 103 | +
|
| 104 | +**Fix:** `mergeSlicesNative` now uses `max(len(dst), len(src))` for result length. Extra src |
| 105 | +elements are deep-copied and appended, matching mergo's `WithSliceDeepCopy` behavior. |
| 106 | + |
| 107 | +**Tests:** 3 existing tests updated from expecting truncation to expecting extension. |
| 108 | +5 new cross-validation tests added for `appendSlice` and `sliceDeepCopy` modes. |
| 109 | + |
| 110 | +### 2. `sliceDeepCopy` vs `appendSlice` precedence flip (behavioral regression, fixed) |
| 111 | + |
| 112 | +**File:** `pkg/merge/merge_native.go` |
| 113 | + |
| 114 | +**Problem:** The new `deepMergeNative` checked `appendSlice` before `sliceDeepCopy`, but the |
| 115 | +old mergo code checked `WithSliceDeepCopy` first. When both flags are `true`, the old code |
| 116 | +applied element-wise merging, the new code appended. |
| 117 | + |
| 118 | +**Fix:** Reordered: `if sliceDeepCopy { … } else { /* appendSlice */ }`. |
| 119 | + |
| 120 | +### 3. `mergeSlicesNative` aliased dst maps and tail elements (fixed) |
| 121 | + |
| 122 | +**File:** `pkg/merge/merge_native.go` |
| 123 | + |
| 124 | +**Problem (inner maps):** Shallow copy of dstMap values into merged map caused silent |
| 125 | +corruption in multi-input merges. |
| 126 | + |
| 127 | +**Fix:** `merged[k] = deepCopyValue(v)` for every dstMap value. |
| 128 | + |
| 129 | +**Problem (tail elements):** `copy(result, dst)` shallow-copied tail positions, creating |
| 130 | +aliases that could corrupt the accumulator in subsequent merge passes. |
| 131 | + |
| 132 | +**Fix:** Deep-copy tail positions explicitly. |
| 133 | + |
| 134 | +### 4. Misleading test name (fixed) |
| 135 | + |
| 136 | +**File:** `pkg/merge/merge_compare_mergo_test.go` |
| 137 | + |
| 138 | +**Problem:** Case named `"nil value in src map entry is skipped"` but nil actually overrides. |
| 139 | + |
| 140 | +**Fix:** Renamed to `"nil value in src map entry overrides dst"`. |
| 141 | + |
| 142 | +### 5. Cross-validation test coverage too narrow (fixed) |
| 143 | + |
| 144 | +**File:** `pkg/merge/merge_compare_mergo_test.go` |
| 145 | + |
| 146 | +**Problem:** Only 4 equivalence cases tested against mergo. No coverage for `appendSlice` |
| 147 | +or `sliceDeepCopy` modes. |
| 148 | + |
| 149 | +**Fix:** Added 5 new cross-validation tests: |
| 150 | +- `appendSlice concatenates slices` |
| 151 | +- `appendSlice with nested maps` |
| 152 | +- `sliceDeepCopy merges overlapping map elements` |
| 153 | +- `sliceDeepCopy src extends beyond dst length` |
| 154 | +- `sliceDeepCopy with three inputs extending progressively` |
| 155 | + |
| 156 | +### 6. `isTerraformCurrentWorkspace` default workspace handling (fixed) |
| 157 | + |
| 158 | +**File:** `internal/exec/terraform_utils.go` |
| 159 | + |
| 160 | +**Problem:** Terraform never writes `.terraform/environment` for the `default` workspace. |
| 161 | +The helper always returned `false` when the file was absent, so workspace recovery never |
| 162 | +triggered for `default`. |
| 163 | + |
| 164 | +**Fix:** Return `true` when file is missing AND workspace is `"default"`. Return `true` |
| 165 | +when file is empty AND workspace is `"default"`. |
| 166 | + |
| 167 | +### 7. Workspace recovery log level too low (fixed) |
| 168 | + |
| 169 | +**File:** `internal/exec/terraform_execute_helpers_exec.go` |
| 170 | + |
| 171 | +**Fix:** Upgraded `log.Debug` to `log.Warn` for workspace recovery messages. |
| 172 | + |
| 173 | +### 8. Integer overflow in size computations (fixed) |
| 174 | + |
| 175 | +**File:** `pkg/merge/merge_native.go` |
| 176 | + |
| 177 | +**Fix:** `safeCap(a, b)` clamps to `1<<24` (16M entries) to prevent OOM. |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Remaining Items |
| 182 | + |
| 183 | +### Fixed |
| 184 | + |
| 185 | +1. ~~**Document the mergo/native split**~~ ✅ — Added comments to all three remaining |
| 186 | + mergo call sites explaining why they still use mergo: |
| 187 | + - `pkg/merge/merge_yaml_functions.go:177` — YAML function slice merging has different |
| 188 | + semantics (operates on individual elements during `!include`/`!merge`, not full stacks). |
| 189 | + - `pkg/merge/merge_yaml_functions.go:265` — Cross-references the first comment. |
| 190 | + - `pkg/devcontainer/config_loader.go:350` — Devcontainer uses typed structs, not |
| 191 | + `map[string]any`. Not on the hot path. |
| 192 | + - All three have `TODO: migrate to native merge` markers. |
| 193 | + |
| 194 | +### Future TODOs (post-merge) |
| 195 | + |
| 196 | +2. **Run cross-validation in CI** — Add `compare_mergo` tests to a CI job. Currently |
| 197 | + behind `//go:build compare_mergo` build tag and only run manually. |
| 198 | +3. **Migrate `merge_yaml_functions.go` to native merge** — Eliminate the dual mergo/native |
| 199 | + split. Requires adapting YAML function slice semantics to the native merge API. |
| 200 | +4. **Migrate `devcontainer/config_loader.go` to native merge** — Lower priority since |
| 201 | + devcontainer config merging is not performance-critical and uses typed structs. |
| 202 | +5. **Add concurrent-contract test** — Document that `deepMergeNative` is not safe for |
| 203 | + concurrent use on the same dst (callers must synchronize). |
| 204 | + |
| 205 | +### No Action Needed |
| 206 | + |
| 207 | +5. `safeCap` max hint — unlikely to be hit in practice. |
| 208 | +6. Pointer aliasing — safe for YAML-parsed data. |
| 209 | +7. `TF_DATA_DIR` relative path — `componentPath` is correct (matches Terraform's CWD). |
| 210 | +8. Workspace recovery dual guard — correct and well-tested. |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## Summary of Files Changed |
| 215 | + |
| 216 | +| File | Change | |
| 217 | +|------|--------| |
| 218 | +| `pkg/merge/merge_native.go` | sliceDeepCopy extension fix; precedence fix; aliasing fixes | |
| 219 | +| `pkg/merge/merge_native_test.go` | 3 tests updated for extension; new precedence/aliasing tests | |
| 220 | +| `pkg/merge/merge_compare_mergo_test.go` | Fix test name; add 5 cross-validation tests | |
| 221 | +| `pkg/merge/merge.go` | Replace mergo pre-copy loop with native merge | |
| 222 | +| `internal/exec/terraform_utils.go` | `isTerraformCurrentWorkspace` with default handling | |
| 223 | +| `internal/exec/terraform_utils_test.go` | 11 sub-tests for workspace detection | |
| 224 | +| `internal/exec/terraform_execute_helpers_exec.go` | Workspace recovery with log.Warn | |
| 225 | +| `internal/exec/terraform_execute_helpers_pipeline_test.go` | Recovery path tests | |
| 226 | +| `internal/exec/terraform_execute_helpers_workspace_test.go` | Error propagation test | |
| 227 | +| `internal/exec/testmain_test.go` | Cross-platform subprocess helper | |
| 228 | +| `errors/errors.go` | `ErrMergeNilDst`, `ErrMergeTypeMismatch` sentinels | |
| 229 | + |
| 230 | +## Audit Summary |
| 231 | + |
| 232 | +| Category | Count | Key Items | |
| 233 | +|---|---|---| |
| 234 | +| **Critical** | 1 (fixed) | sliceDeepCopy truncation — silent data loss | |
| 235 | +| **High** | 2 (fixed) | Cross-validation expanded, precedence regression fixed | |
| 236 | +| **Medium** | 2 (fixed) | Misleading test name, aliasing in mergeSlicesNative | |
| 237 | +| **Low** | 2 | safeCap hint, pointer aliasing (both acceptable) | |
| 238 | +| **Positive** | 7 | Sound architecture, thorough aliasing prevention, type handling | |
| 239 | + |
| 240 | +The core merge implementation is well-engineered. All critical and high issues have been |
| 241 | +fixed. Cross-validation coverage expanded from 4 to 9 equivalence tests. |
0 commit comments