Commit d297fa0

Merge branch 'main' into copilot/warn-when-vendor-archived-repo
2 parents 899013b + 01541b6 commit d297fa0

53 files changed, +4023 -147 lines changed

CLAUDE.md

Lines changed: 33 additions & 0 deletions
@@ -207,6 +207,18 @@ ALWAYS use `cmd.NewTestKit(t)` for cmd tests. Auto-cleans RootCmd state (flags,
207207
- No coverage theater
208208
- Remove always-skipped tests
209209
- Use `errors.Is()` for error checking
210+
- **For aliasing/isolation tests, verify BOTH directions:** after a merge, mutate the result and confirm the original inputs are unchanged (result→src isolation); also mutate a source map before the merge and confirm the result is unaffected (src→result isolation).
211+
- **For slice-result tests, assert element contents, not just length:** `require.Len` alone allows regressions that drop or corrupt contents. Assert at least the first and last element by value.
212+
- **Never use platform-specific binaries in tests** (e.g., `false`, `true`, `sh` on Unix): these don't exist on Windows. Use Go-native test helpers: subprocess via `os.Executable()` + `TestMain`, temp files with cross-platform scripts, or DI to inject a fake command runner.
213+
- **Safety guards must fail loudly:** any check that counts fixture files or validates test preconditions must use `require.Positive` (or equivalent) — never `if count > 0 { ... }` which silently disables the check when misconfigured.
214+
- **Use absolute paths for fixture counting:** any `filepath.WalkDir` or file-count assertion must use an already-resolved absolute path (not a relative one) to be CWD-independent.
215+
- **Add compile-time sentinels for schema field references in tests:** when a test uses a specific struct field (e.g., `schema.Provider{Kind: "azure"}`), add `var _ = schema.Provider{Kind: "azure"}` as a compile guard so a field rename immediately fails the build.
216+
- **Add prerequisite sub-tests for subprocess behavior:** when a test depends on implicit env propagation (e.g., `ComponentEnvList` reaching a subprocess), add an explicit sub-test that confirms the behavior before the main test runs.
217+
- **Contract vs. legacy behavior:** if a test says "matches mergo" (or any other library), add an opt-in cross-validation test behind a build tag (e.g., `//go:build compare_mergo`); otherwise state "defined contract" explicitly so it's clear the native implementation owns the behavior. Run cross-validation tests with: `go test -tags compare_mergo ./pkg/merge/... -run CompareMergo -v` (requires mergo v1.0.x installed).
218+
- **Include negative-path tests for recovery logic:** whenever a test verifies that a recovery/fallback triggers under condition X, add a corresponding test that verifies the recovery does NOT trigger when condition X is absent (e.g., mismatched workspace name).
219+
220+
### Follow-up Tracking (MANDATORY)
221+
When a PR defers work to a follow-up (e.g., migration, cleanup, refactor), **open a GitHub issue and link it by number** in the blog post, roadmap, and/or PR description before merging. A blog post that says "a follow-up issue will..." without a `#number` is incomplete, because the work will never be tracked.
210222

211223
### Test-Only Helpers in Production Packages (MANDATORY)
212224
When a test utility (seed/reset/inject) must be accessible from tests in **multiple packages**
@@ -419,6 +431,27 @@ Search `internal/exec/` and `pkg/` before implementing. Extend, don't duplicate.
419431
### Cross-Platform (MANDATORY)
420432
Linux/macOS/Windows compatible. Use SDKs over binaries. Use `filepath.Join()` instead of hardcoded path separators.
421433

434+
**Subprocess helpers in tests (cross-platform):**
435+
Instead of `exec.LookPath("false")` or other Unix-only binaries, use the test binary itself.
436+
**Important:** If your package already has a `TestMain`, add the env-gate check **inside the existing `TestMain`** — do not add a second `TestMain` function (Go does not allow two in the same package).
437+
438+
```go
439+
// In testmain_test.go — merge this check into the existing TestMain:
440+
func TestMain(m *testing.M) {
441+
// If _ATMOS_TEST_EXIT_ONE is set, exit immediately with code 1.
442+
// This lets tests use the test binary itself as a cross-platform "exit 1" command.
443+
if os.Getenv("_ATMOS_TEST_EXIT_ONE") == "1" { os.Exit(1) }
444+
os.Exit(m.Run())
445+
}
446+
// NOTE: If your package already defines TestMain, insert the _ATMOS_TEST_EXIT_ONE
447+
// check at the top of the existing function rather than copying the whole snippet.
448+
449+
// In the test itself:
450+
exePath, _ := os.Executable()
451+
info.Command = exePath
452+
info.ComponentEnvList = []string{"_ATMOS_TEST_EXIT_ONE=1"}
453+
```
454+
422455
**Path handling in tests:**
423456
- **NEVER use forward slash concatenation** like `tempDir + "/components/terraform/vpc"`
424457
- **ALWAYS use `filepath.Join()`** with separate arguments: `filepath.Join(tempDir, "components", "terraform", "vpc")`
Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
1+
# Deep-Merge Native & Terraform Workspace Fixes
2+
3+
**Date:** 2026-03-19 (updated 2026-03-23)
4+
**PR:** #2201 (perf: replace mergo with native deep merge)
5+
**Reviewer findings:** CodeRabbit audit + GitHub Advanced Security alerts + independent deep analysis
6+
7+
---
8+
9+
## What the PR Does
10+
11+
Replaces the pre-merge deep-copy loop (which called `mergo.Merge` after copying each input)
12+
with a native Go implementation that deep-copies only the first input and merges subsequent
13+
inputs in-place with leaf-level copying. This reduces N full `DeepCopyMap` calls to 1,
14+
achieving ~3.5× speedup on the ~118k+ merge calls per stack resolution run.
15+
16+
**This is the core of Atmos.** Every stack resolution passes through this code. Any bug
17+
here affects every single `atmos` command that processes stacks.
18+
19+
### Architecture Change
20+
21+
**Before:**
22+
```text
23+
for each input:
24+
copy = DeepCopyMap(input) // Full deep copy of every input
25+
mergo.Merge(result, copy) // mergo merge (uses reflection internally)
26+
```
27+
28+
**After:**
29+
```text
30+
result = DeepCopyMap(inputs[0]) // Deep copy only the first input
31+
for each remaining input:
32+
deepMergeNative(result, input) // Native merge with leaf-level copying (no reflection)
33+
```
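The before/after pseudocode can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code in `merge_native.go`: `mergeAll`, `mergeInto`, and `deepCopyValue` are hypothetical names, only `map[string]any` and `[]any` are handled, and slices always use the default "src replaces dst" mode.

```go
package main

import "fmt"

// deepCopyValue recursively copies maps and slices; all other values pass through.
func deepCopyValue(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = deepCopyValue(val)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = deepCopyValue(val)
		}
		return out
	default:
		return v
	}
}

// mergeInto merges src into dst in place. Two maps recurse; anything else
// is overridden by a deep copy of the src value (leaf-level copying).
func mergeInto(dst, src map[string]any) {
	for k, sv := range src {
		dm, dOK := dst[k].(map[string]any)
		sm, sOK := sv.(map[string]any)
		if dOK && sOK {
			mergeInto(dm, sm)
			continue
		}
		dst[k] = deepCopyValue(sv)
	}
}

// mergeAll (hypothetical) deep-copies only the first input, then merges the rest.
func mergeAll(inputs ...map[string]any) map[string]any {
	if len(inputs) == 0 {
		return map[string]any{}
	}
	result := deepCopyValue(inputs[0]).(map[string]any)
	for _, in := range inputs[1:] {
		mergeInto(result, in)
	}
	return result
}

func main() {
	base := map[string]any{"vars": map[string]any{"region": "us-east-1", "size": 1}}
	overlay := map[string]any{"vars": map[string]any{"size": 3}}
	fmt.Println(mergeAll(base, overlay))
}
```

The single up-front copy is what keeps the accumulator isolated from the first input; later inputs are only read, and their values are copied at the point where they are stored.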
34+
35+
---
36+
37+
## Native Merge Semantics
38+
39+
### Merge Rules (merge_native.go)
40+
41+
| Scenario | Behavior | Correct? |
42+
|---|---|---|
43+
| Both map | Recursive merge | ✅ |
44+
| Src map, dst not map | Src overrides dst | ✅ (matches mergo WithOverride) |
45+
| Src not map, dst map | Src overrides dst | ✅ (matches mergo WithOverride) |
46+
| Src slice, dst map | **Error** (`ErrMergeTypeMismatch`) | ⚠️ Asymmetric but intentional |
47+
| Src nil | Override dst with nil | ✅ (matches mergo WithOverride) |
48+
| Src typed map | Normalize to `map[string]any` via reflection | ✅ |
49+
| Src typed slice | Normalize to `[]any` via reflection | ✅ |
50+
51+
### Slice Merge Modes
52+
53+
| Mode | Behavior | Notes |
54+
|---|---|---|
55+
| Default | Src slice replaces dst slice | Standard |
56+
| `appendSlice` | Dst + src elements concatenated | Both deep-copied |
57+
| `sliceDeepCopy` | Element-wise merge, src extends result | Fixed (was truncating) |
58+
59+
### Type Handling
60+
61+
| Type | Deep Copy Method | Correct? |
62+
|---|---|---|
63+
| Primitives (string, int, float, bool) | Pass-through (immutable) | ✅ |
64+
| `map[string]any` | Recursive `deepCopyMap` | ✅ |
65+
| `[]any` | Recursive `deepCopySlice` | ✅ |
66+
| Typed maps (`map[string]string`) | Reflection-based iteration | ✅ |
67+
| Typed slices (`[]string`) | Reflection-based iteration | ✅ |
68+
| Pointers | Pass-through (**aliased**) | ⚠️ Safe for YAML data |
69+
| `nil` | Pass-through | ✅ |
70+
71+
---
72+
73+
## Issues Addressed
74+
75+
### 1. `sliceDeepCopy` truncation — silent data loss (CRITICAL, fixed)
76+
77+
**File:** `pkg/merge/merge_native.go`
78+
79+
**Problem:** When `sliceDeepCopy=true` and src had more elements than dst, extra src elements
80+
were **silently dropped**. This was a data loss bug for users with `list_merge_strategy: deep`
81+
whose overlay stacks add new list elements beyond the base.
82+
83+
**Example:** A base stack with 2 EKS node groups + an overlay adding a 3rd `gpu` group would
84+
silently lose the gpu group:
85+
86+
```yaml
87+
# base: 2 node groups
88+
node_groups:
89+
- name: general
90+
instance_type: m5.large
91+
- name: compute
92+
instance_type: c5.xlarge
93+
94+
# overlay: adds 3rd group
95+
node_groups:
96+
- name: general
97+
instance_type: m5.2xlarge
98+
- name: compute
99+
instance_type: c5.2xlarge
100+
- name: gpu # ← SILENTLY DROPPED
101+
instance_type: g5.xlarge
102+
```
103+
104+
**Fix:** `mergeSlicesNative` now uses `max(len(dst), len(src))` for result length. Extra src
105+
elements are deep-copied and appended, matching mergo's `WithSliceDeepCopy` behavior.
106+
107+
**Tests:** 3 existing tests updated from expecting truncation to expecting extension.
108+
5 new cross-validation tests added for `appendSlice` and `sliceDeepCopy` modes.
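A rough Go sketch of the extension behavior described in the fix is shown below. It is an assumption-laden illustration, not the actual `mergeSlicesNative`: the names `mergeSlicesDeep`, `mergeElement`, and `deepCopy` are hypothetical, and overlapping map elements are merged only one level deep to keep the example short.

```go
package main

import "fmt"

// deepCopy recursively copies maps and slices; other values pass through.
func deepCopy(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = deepCopy(val)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = deepCopy(val)
		}
		return out
	default:
		return v
	}
}

// mergeElement merges two overlapping elements: maps are combined key by key
// (one level only in this sketch); any other pairing lets src win.
func mergeElement(d, s any) any {
	dm, dOK := d.(map[string]any)
	sm, sOK := s.(map[string]any)
	if !dOK || !sOK {
		return deepCopy(s)
	}
	merged := deepCopy(dm).(map[string]any)
	for k, v := range sm {
		merged[k] = deepCopy(v)
	}
	return merged
}

// mergeSlicesDeep (hypothetical) sizes the result to the longer input so that
// extra src elements are kept instead of being truncated.
func mergeSlicesDeep(dst, src []any) []any {
	n := len(dst)
	if len(src) > n {
		n = len(src)
	}
	result := make([]any, n)
	for i := range result {
		switch {
		case i >= len(src):
			result[i] = deepCopy(dst[i]) // dst tail, copied to avoid aliasing
		case i >= len(dst):
			result[i] = deepCopy(src[i]) // src extends beyond dst
		default:
			result[i] = mergeElement(dst[i], src[i])
		}
	}
	return result
}

func main() {
	base := []any{
		map[string]any{"name": "general", "instance_type": "m5.large"},
		map[string]any{"name": "compute", "instance_type": "c5.xlarge"},
	}
	overlay := []any{
		map[string]any{"instance_type": "m5.2xlarge"},
		map[string]any{"instance_type": "c5.2xlarge"},
		map[string]any{"name": "gpu", "instance_type": "g5.xlarge"},
	}
	for _, group := range mergeSlicesDeep(base, overlay) {
		fmt.Println(group)
	}
}
```

With the node-group data from the YAML example, the third (`gpu`) element survives because the result length follows the longer slice.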
109+
110+
### 2. `sliceDeepCopy` vs `appendSlice` precedence flip (behavioral regression, fixed)
111+
112+
**File:** `pkg/merge/merge_native.go`
113+
114+
**Problem:** The new `deepMergeNative` checked `appendSlice` before `sliceDeepCopy`, but the
115+
old mergo code checked `WithSliceDeepCopy` first. When both flags are `true`, the old code
116+
applied element-wise merging, the new code appended.
117+
118+
**Fix:** Reordered: `if sliceDeepCopy { … } else { /* appendSlice */ }`.
119+
120+
### 3. `mergeSlicesNative` aliased dst maps and tail elements (fixed)
121+
122+
**File:** `pkg/merge/merge_native.go`
123+
124+
**Problem (inner maps):** Shallow copy of dstMap values into merged map caused silent
125+
corruption in multi-input merges.
126+
127+
**Fix:** `merged[k] = deepCopyValue(v)` for every dstMap value.
128+
129+
**Problem (tail elements):** `copy(result, dst)` shallow-copied tail positions, creating
130+
aliases that could corrupt the accumulator in subsequent merge passes.
131+
132+
**Fix:** Deep-copy tail positions explicitly.
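The difference between the shallow `copy` and an explicit deep copy of tail positions can be demonstrated with a small sketch. The helper names `shallowTail`, `deepTail`, and `deepCopy` are hypothetical and exist only for this illustration; they are not functions from `merge_native.go`.

```go
package main

import "fmt"

// deepCopy recursively copies maps and slices; other values pass through.
func deepCopy(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = deepCopy(val)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = deepCopy(val)
		}
		return out
	default:
		return v
	}
}

// shallowTail mimics copy(result, dst): the new slice has its own backing
// array, but map elements still point at the same maps as dst.
func shallowTail(dst []any) []any {
	out := make([]any, len(dst))
	copy(out, dst)
	return out
}

// deepTail copies every element recursively, so nothing is shared with dst.
func deepTail(dst []any) []any {
	out := make([]any, len(dst))
	for i, v := range dst {
		out[i] = deepCopy(v)
	}
	return out
}

func main() {
	dst := []any{map[string]any{"name": "general"}}

	aliased := shallowTail(dst)
	aliased[0].(map[string]any)["name"] = "mutated"
	fmt.Println("after shallow copy:", dst[0])

	dst[0].(map[string]any)["name"] = "general" // reset
	isolated := deepTail(dst)
	isolated[0].(map[string]any)["name"] = "mutated"
	fmt.Println("after deep copy:", dst[0])
}
```

Mutating the shallow copy changes the original map, while mutating the deep copy does not, which is the result-to-source direction of the isolation check.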
133+
134+
### 4. Misleading test name (fixed)
135+
136+
**File:** `pkg/merge/merge_compare_mergo_test.go`
137+
138+
**Problem:** Case named `"nil value in src map entry is skipped"` but nil actually overrides.
139+
140+
**Fix:** Renamed to `"nil value in src map entry overrides dst"`.
141+
142+
### 5. Cross-validation test coverage too narrow (fixed)
143+
144+
**File:** `pkg/merge/merge_compare_mergo_test.go`
145+
146+
**Problem:** Only 4 equivalence cases tested against mergo. No coverage for `appendSlice`
147+
or `sliceDeepCopy` modes.
148+
149+
**Fix:** Added 5 new cross-validation tests:
150+
- `appendSlice concatenates slices`
151+
- `appendSlice with nested maps`
152+
- `sliceDeepCopy merges overlapping map elements`
153+
- `sliceDeepCopy src extends beyond dst length`
154+
- `sliceDeepCopy with three inputs extending progressively`
155+
156+
### 6. `isTerraformCurrentWorkspace` default workspace handling (fixed)
157+
158+
**File:** `internal/exec/terraform_utils.go`
159+
160+
**Problem:** Terraform never writes `.terraform/environment` for the `default` workspace.
161+
The helper always returned `false` when the file was absent, so workspace recovery never
162+
triggered for `default`.
163+
164+
**Fix:** Return `true` when file is missing AND workspace is `"default"`. Return `true`
165+
when file is empty AND workspace is `"default"`.
166+
167+
### 7. Workspace recovery log level too low (fixed)
168+
169+
**File:** `internal/exec/terraform_execute_helpers_exec.go`
170+
171+
**Fix:** Upgraded `log.Debug` to `log.Warn` for workspace recovery messages.
172+
173+
### 8. Integer overflow in size computations (fixed)
174+
175+
**File:** `pkg/merge/merge_native.go`
176+
177+
**Fix:** `safeCap(a, b)` clamps to `1<<24` (16M entries) to prevent OOM.
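One way such a clamp might be written is sketched below. The source only states that `safeCap(a, b)` clamps to `1<<24`; treating the two arguments as lengths whose sum is a capacity hint, and the constant name `maxCapHint`, are assumptions of this example.

```go
package main

import "fmt"

// maxCapHint is the upper bound for a pre-allocation hint (16M entries).
const maxCapHint = 1 << 24

// safeCap (assumed semantics) returns a+b as a capacity hint, clamped to
// maxCapHint, without ever computing a sum that could overflow int.
func safeCap(a, b int) int {
	if a < 0 {
		a = 0
	}
	if b < 0 {
		b = 0
	}
	// a+b > maxCapHint is rewritten as a > maxCapHint-b, which cannot
	// overflow because b is non-negative.
	if a > maxCapHint-b {
		return maxCapHint
	}
	return a + b
}

func main() {
	maxInt := int(^uint(0) >> 1)
	fmt.Println(safeCap(2, 3))
	fmt.Println(safeCap(maxInt, maxInt) == maxCapHint)
}
```

Because the result is only a hint passed to `make`, clamping never changes correctness: slices and maps still grow past the hint when needed.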
178+
179+
---
180+
181+
## Remaining Items
182+
183+
### Fixed
184+
185+
1. ~~**Document the mergo/native split**~~ ✅ — Added comments to all three remaining
186+
mergo call sites explaining why they still use mergo:
187+
- `pkg/merge/merge_yaml_functions.go:177` — YAML function slice merging has different
188+
semantics (operates on individual elements during `!include`/`!merge`, not full stacks).
189+
- `pkg/merge/merge_yaml_functions.go:265` — Cross-references the first comment.
190+
- `pkg/devcontainer/config_loader.go:350` — Devcontainer uses typed structs, not
191+
`map[string]any`. Not on the hot path.
192+
- All three have `TODO: migrate to native merge` markers.
193+
194+
### Future TODOs (post-merge)
195+
196+
2. **Run cross-validation in CI** — Add `compare_mergo` tests to a CI job. Currently
197+
behind `//go:build compare_mergo` build tag and only run manually.
198+
3. **Migrate `merge_yaml_functions.go` to native merge** — Eliminate the dual mergo/native
199+
split. Requires adapting YAML function slice semantics to the native merge API.
200+
4. **Migrate `devcontainer/config_loader.go` to native merge** — Lower priority since
201+
devcontainer config merging is not performance-critical and uses typed structs.
202+
5. **Add concurrent-contract test** — Document that `deepMergeNative` is not safe for
203+
concurrent use on the same dst (callers must synchronize).
204+
205+
### No Action Needed
206+
207+
6. `safeCap` max hint: unlikely to be hit in practice.
208+
7. Pointer aliasing: safe for YAML-parsed data.
209+
8. `TF_DATA_DIR` relative path: `componentPath` is correct (matches Terraform's CWD).
210+
9. Workspace recovery dual guard: correct and well-tested.
211+
212+
---
213+
214+
## Summary of Files Changed
215+
216+
| File | Change |
217+
|------|--------|
218+
| `pkg/merge/merge_native.go` | sliceDeepCopy extension fix; precedence fix; aliasing fixes |
219+
| `pkg/merge/merge_native_test.go` | 3 tests updated for extension; new precedence/aliasing tests |
220+
| `pkg/merge/merge_compare_mergo_test.go` | Fix test name; add 5 cross-validation tests |
221+
| `pkg/merge/merge.go` | Replace mergo pre-copy loop with native merge |
222+
| `internal/exec/terraform_utils.go` | `isTerraformCurrentWorkspace` with default handling |
223+
| `internal/exec/terraform_utils_test.go` | 11 sub-tests for workspace detection |
224+
| `internal/exec/terraform_execute_helpers_exec.go` | Workspace recovery with log.Warn |
225+
| `internal/exec/terraform_execute_helpers_pipeline_test.go` | Recovery path tests |
226+
| `internal/exec/terraform_execute_helpers_workspace_test.go` | Error propagation test |
227+
| `internal/exec/testmain_test.go` | Cross-platform subprocess helper |
228+
| `errors/errors.go` | `ErrMergeNilDst`, `ErrMergeTypeMismatch` sentinels |
229+
230+
## Audit Summary
231+
232+
| Category | Count | Key Items |
233+
|---|---|---|
234+
| **Critical** | 1 (fixed) | sliceDeepCopy truncation — silent data loss |
235+
| **High** | 2 (fixed) | Cross-validation expanded, precedence regression fixed |
236+
| **Medium** | 2 (fixed) | Misleading test name, aliasing in mergeSlicesNative |
237+
| **Low** | 2 | safeCap hint, pointer aliasing (both acceptable) |
238+
| **Positive** | 7 | Sound architecture, thorough aliasing prevention, type handling |
239+
240+
The core merge implementation is well-engineered. All critical and high issues have been
241+
fixed. Cross-validation coverage expanded from 4 to 9 equivalence tests.

errors/errors.go

Lines changed: 2 additions & 0 deletions
@@ -318,6 +318,8 @@ var (
318318
ErrFailedToInitializeAtmosConfig = errors.New("failed to initialize atmos config")
319319
ErrInvalidListMergeStrategy = errors.New("invalid list merge strategy")
320320
ErrMerge = errors.New("merge error")
321+
ErrMergeNilDst = errors.New("merge destination must not be nil")
322+
ErrMergeTypeMismatch = errors.New("cannot override two slices with different type")
321323
ErrEncode = errors.New("encoding error")
322324
ErrDecode = errors.New("decoding error")
323325

internal/exec/describe_affected_optimizations_test.go

Lines changed: 36 additions & 0 deletions
@@ -539,6 +539,23 @@ module "remote_module" {
539539
require.NoError(t, err)
540540
assert.Empty(t, patterns)
541541
})
542+
543+
t.Run("invalid HCL includes component name and location in error", func(t *testing.T) {
544+
badPath := filepath.Join(tempDir, "components", "terraform", "broken")
545+
err := os.MkdirAll(badPath, 0o755)
546+
require.NoError(t, err)
547+
548+
invalidHCL := `variable "name" { default = var.other }`
549+
err = os.WriteFile(filepath.Join(badPath, "main.tf"), []byte(invalidHCL), 0o644)
550+
require.NoError(t, err)
551+
552+
freshCache := newComponentPathPatternCache()
553+
_, err = freshCache.getTerraformModulePatterns("broken", atmosConfig)
554+
require.Error(t, err)
555+
assert.ErrorIs(t, err, errUtils.ErrFailedToLoadTerraformComponent)
556+
assert.Contains(t, err.Error(), "'broken'", "error should include the component name")
557+
assert.Contains(t, err.Error(), "main.tf:", "error should include the file name")
558+
})
542559
}
543560

544561
func TestComponentPathPatternCache_ModulePatternsThreadSafety(t *testing.T) {
@@ -2222,6 +2239,25 @@ module "security" {
22222239
require.NoError(t, err)
22232240
assert.True(t, changed)
22242241
})
2242+
2243+
t.Run("invalid HCL includes component name and location in error", func(t *testing.T) {
2244+
componentPath := filepath.Join(tempDir, "components", "terraform", "bad-hcl")
2245+
err := os.MkdirAll(componentPath, 0o755)
2246+
require.NoError(t, err)
2247+
2248+
// Write HCL that fails to load: a variable default may not reference another variable.
2249+
invalidHCL := `variable "name" { default = var.other }`
2250+
err = os.WriteFile(filepath.Join(componentPath, "main.tf"), []byte(invalidHCL), 0o644)
2251+
require.NoError(t, err)
2252+
2253+
changedFiles := []string{filepath.Join(componentPath, "main.tf")}
2254+
2255+
_, err = areTerraformComponentModulesChanged("bad-hcl", atmosConfig, changedFiles)
2256+
require.Error(t, err)
2257+
assert.ErrorIs(t, err, errUtils.ErrFailedToLoadTerraformComponent)
2258+
assert.Contains(t, err.Error(), "'bad-hcl'", "error should include the component name")
2259+
assert.Contains(t, err.Error(), "main.tf:", "error should include the file name")
2260+
})
22252261
}
22262262

22272263
func TestChangedFilesIndex_GetRelevantFiles_EdgeCases(t *testing.T) {

internal/exec/describe_affected_pattern_cache.go

Lines changed: 20 additions & 1 deletion
@@ -106,7 +106,7 @@ func (c *componentPathPatternCache) getTerraformModulePatterns(
106106
c.cacheEmptyPatterns(component)
107107
return []string{}, nil
108108
}
109-
return nil, errors.Join(errUtils.ErrFailedToLoadTerraformComponent, diags.Err())
109+
return nil, componentLoadError(component, diags)
110110
}
111111

112112
if terraformConfiguration == nil {
@@ -126,6 +126,25 @@ func (c *componentPathPatternCache) getTerraformModulePatterns(
126126
return patterns, nil
127127
}
128128

129+
// diagFileLocation extracts the first file:line location from tfconfig diagnostics, or returns "".
130+
func diagFileLocation(diags tfconfig.Diagnostics) string {
131+
for _, diag := range diags {
132+
if diag.Pos != nil {
133+
return fmt.Sprintf("%s:%d", diag.Pos.Filename, diag.Pos.Line)
134+
}
135+
}
136+
return ""
137+
}
138+
139+
// componentLoadError builds an error for a failed component load, including file location when available.
140+
func componentLoadError(component string, diags tfconfig.Diagnostics) error {
141+
loc := diagFileLocation(diags)
142+
if loc != "" {
143+
return fmt.Errorf("%w '%s' at %s: %w", errUtils.ErrFailedToLoadTerraformComponent, component, loc, diags.Err())
144+
}
145+
return fmt.Errorf("%w '%s': %w", errUtils.ErrFailedToLoadTerraformComponent, component, diags.Err())
146+
}
147+
129148
// shouldCacheEmptyPatterns determines if the error indicates a missing directory that should cache empty patterns.
130149
func shouldCacheEmptyPatterns(diagErr error) bool {
131150
// Try structured error detection first (most robust).

internal/exec/describe_affected_utils_2.go

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ func areTerraformComponentModulesChanged(
335335
}
336336

337337
// For other errors (syntax errors, permission issues, etc.), return error.
338-
return false, errors.Join(errUtils.ErrFailedToLoadTerraformComponent, diagErr)
338+
return false, componentLoadError(component, diags)
339339
}
340340

341341
// If no configuration, there are no modules to check.
