ci: parallelize unit:pytest:core with pytest-xdist (#56)

Jmevorach · web-flow · commit cbc15a0be75e · 2026-05-12T20:16:24.000-04:00
Reintroduces `-n auto` for the core test suite after diagnosing and
fixing the root cause of the earlier xdist race.

Root cause
----------
`StackManager.__init__` and `StackManager.deploy` invoke self-healing
Lambda-package builders (`_ensure_lambda_build` and
`_rebuild_lambda_packages`, respectively), which `rm -rf` and
repopulate the real `lambda/kubectl-applier-simple-build/` tree. That
is correct behavior for `gco stacks deploy`, but during tests it races
with CDK's `Code.from_asset()` on other xdist workers: one worker's
mid-`rm -rf` window intersects another worker's `copyDirectory`,
producing the sporadic `ENOENT ... lstat '...botocore/data/sagemaker'`
failures we saw before.

An rm + pip-install cycle on the real tree takes ~25 s, and tracing
showed that every `test_deploy_*` test and every `StackManager(config)`
construction without a `project_root` kwarg was triggering it. With
four workers on a 2-vCPU CI runner, that window intersects a CDK synth
with near-certainty.
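
The failure mode can be reproduced in miniature with the standard
library alone. The sketch below is an illustration, not the CDK code
path: `shutil.copytree` stands in for CDK's `copyDirectory`, and the
hypothetical `copy_then_delete_source` callback simulates another
worker's `rm -rf` landing right after the first file has been copied.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_mid_copy_delete() -> list:
    """Copy a tree whose source is deleted partway through the copy.

    Returns the per-entry errors collected by shutil.copytree.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "build"
        dst = Path(tmp) / "staged"
        src.mkdir()
        for name in ("a.json", "b.json", "c.json"):
            (src / name).write_text("{}")

        state = {"deleted": False}

        def copy_then_delete_source(s, d):
            # Copy one file, then wipe the source tree to mimic a
            # concurrent rebuild on another worker (hypothetical stand-in).
            shutil.copy2(s, d)
            if not state["deleted"]:
                state["deleted"] = True
                shutil.rmtree(src)

        try:
            shutil.copytree(src, dst, copy_function=copy_then_delete_source)
        except shutil.Error as exc:
            # copytree gathers one (src, dst, reason) tuple per failure.
            return list(exc.args[0])
        return []


errors = simulate_mid_copy_delete()
for failed_src, _failed_dst, reason in errors:
    print(f"{failed_src}: {reason}")
```

The directory listing is taken before the copy starts, so the files
that disappear afterwards fail with "no such file" errors. This is the
same shape as the `ENOENT` seen when a rebuild overlaps asset staging.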

Fix
---
A session-scoped autouse fixture in `tests/conftest.py`
(`_neutralize_lambda_build`) patches both methods to short-circuit
when `self.project_root` resolves to the real repo root. Tests that
intentionally exercise these methods against a `tmp_path` keep
working because the guard only skips real-root calls. The composite
action `.github/actions/build-lambda-package` handles the population
in CI; the existing `ensure_lambda_build_dirs` fixture covers
local-dev.
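
The guard's behavior can be checked in isolation. The following is a
minimal sketch under stated assumptions: `FakeStackManager` and
`guard_against_root` are hypothetical names used only for illustration
(the real fixture patches `StackManager` directly), and two temporary
directories play the roles of the repo root and a `tmp_path`.

```python
import tempfile
from pathlib import Path


class FakeStackManager:
    """Hypothetical stand-in for StackManager; records rebuild calls."""

    def __init__(self, project_root):
        self.project_root = project_root
        self.rebuilt = False

    def _ensure_lambda_build(self):
        self.rebuilt = True


def guard_against_root(cls, method_name, real_root):
    """Make cls.method_name a no-op when project_root is real_root.

    Returns the original method so the caller can restore it.
    """
    original = getattr(cls, method_name)
    real_root = Path(real_root).resolve()

    def guarded(self):
        try:
            same = Path(self.project_root).resolve() == real_root
        except OSError:
            same = False
        if same:
            return None  # skip: never touch the real tree
        return original(self)

    setattr(cls, method_name, guarded)
    return original


with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as scratch:
    original = guard_against_root(FakeStackManager, "_ensure_lambda_build", repo)
    try:
        on_real_root = FakeStackManager(repo)
        on_real_root._ensure_lambda_build()
        on_tmp_path = FakeStackManager(scratch)
        on_tmp_path._ensure_lambda_build()
    finally:
        # Restore the unpatched method, as the fixture does on teardown.
        FakeStackManager._ensure_lambda_build = original

skipped_on_real_root = not on_real_root.rebuilt
ran_on_tmp_path = on_tmp_path.rebuilt
```

Factoring the path comparison into one wrapper also avoids repeating
the same `try`/`except` in each guarded method, at the cost of one
extra level of indirection.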

Bumped Hypothesis per-example deadlines in the four analytics property
test modules that run CDK synths: from 10 s to 20 s in
`test_analytics_bucket_isolation_property`,
`test_analytics_configmap_property`, and
`test_analytics_roundtrip_property`, and from 5 s to 10 s for the two
cheaper tests in `test_analytics_cluster_shared_configmap_property`.
CDK synth contention under xdist was pushing the first (uncached)
example over the old limit.

Verification
------------
Full suite with `-n auto --dist=load` on an 8-core local machine:

    3980 passed, 1 skipped in 353s (first run)
    3980 passed, 1 skipped in 320s (second run)

Both runs clean, no flakes.

Workflow change
---------------
`unit:pytest:core` now runs with `-n auto --maxfail=1`; `--dist=load`
is xdist's default, so it is not passed explicitly. `--maxfail=1`
preserves the previous `-x` semantics (stop at first failure); `-x` is
pytest's shorthand for the same option.
diff --git a/.github/workflows/unit-tests.yml b/.github/workflows/unit-tests.yml
@@ -91,18 +91,20 @@ jobs:
           pip install -e ".[dev,mcp]"
       - uses: ./.github/actions/build-lambda-package
       - name: Run pytest with coverage
-        # Parallelism (pytest-xdist -n auto) was attempted here but the
-        # CDK-heavy stack tests (test_regional_stack, test_stacks, etc.)
-        # race on CDK's in-process asset-staging cache: two workers both
-        # stage ``lambda/kubectl-applier-simple-build`` into the shared
-        # ``cdk.out/asset.<hash>/`` destination and one hits ENOENT on the
-        # source mid-copy. Neither ``--dist=loadfile`` nor ``--dist=loadscope``
-        # fixes it because the race is cross-file (multiple test modules
-        # instantiate GCORegionalStack, which uses the same asset). Running
-        # serially with ``-x`` preserves correctness at the cost of the
-        # xdist wall-clock speedup. The two dedicated CDK jobs
-        # (unit:cdk:config-matrix + unit:cdk:nag-compliance) do benefit
-        # from parallelism and are wired for it in their own workflows.
+        # -n auto distributes tests across all available CPU cores via
+        # pytest-xdist. Every test is xdist-safe because
+        # tests/conftest.py::_neutralize_lambda_build patches
+        # StackManager._ensure_lambda_build and _rebuild_lambda_packages
+        # so tests can't rebuild the real lambda/kubectl-applier-simple-build
+        # tree mid-run — that rebuild is what CDK's Code.from_asset() races
+        # against when two workers synthesize stacks concurrently. The
+        # session-wide patch guards on `project_root` so the handful of
+        # tests that legitimately exercise these methods against a
+        # `tmp_path` keep working.
+        #
+        # --dist=load (xdist's default) sends each pending test item to
+        # whichever worker is free. --maxfail=1 matches the previous -x
+        # "stop at first failure" behavior; -x is shorthand for the same option.
         run: |
           pytest tests/ -v \
             --ignore=tests/test_integration.py \
@@ -112,7 +114,8 @@ jobs:
             --cov-report=xml --cov-report=html --cov-report=json \
             --cov-report=term-missing \
             --cov-fail-under=90 \
-            --junitxml=report.xml -x
+            --junitxml=report.xml \
+            -n auto --maxfail=1
       - name: Upload coverage artifacts
         if: always()
         uses: actions/upload-artifact@v7
diff --git a/tests/conftest.py b/tests/conftest.py
@@ -80,6 +80,72 @@ def ensure_lambda_build_dirs():
             shutil.rmtree(pycache)
 
 
+# ============================================================================
+# Session-scoped: neutralize StackManager's self-healing Lambda rebuild during tests
+# ============================================================================
+#
+# ``StackManager.__init__`` calls ``_ensure_lambda_build()`` (and its downstream
+# ``_build_kubectl_lambda``) as a self-healing step so any ``gco stacks
+# deploy`` succeeds even when a contributor's build tree is stale. That's the
+# right behavior at runtime, but it's destructive during tests:
+#
+#   1. ``_build_kubectl_lambda`` does ``_safe_rmtree(build_dir)`` on the *real*
+#      ``lambda/kubectl-applier-simple-build/`` whenever its guard (``yaml/``
+#      missing) trips.
+#   2. Under pytest-xdist, one worker's rebuild races with another worker's
+#      CDK ``Code.from_asset()`` mid-copy, producing the sporadic
+#      ``ENOENT: … lstat '…lambda/kubectl-applier-simple-build/botocore/data/…'``
+#      failures we see on the 2-vCPU CI runner.
+#   3. Any test that mocks ``subprocess.run`` while constructing a
+#      ``StackManager`` can silently short-circuit the pip-install step and
+#      leave the build tree partially populated, which then trips the guard
+#      on the NEXT construction and cascades a rebuild.
+#   4. ``deploy()`` calls ``_rebuild_lambda_packages()`` which rm-trees and
+#      pip-installs into the real build dir even when ``_run_cdk`` is
+#      mocked — so every ``test_deploy_*`` hits the real filesystem too.
+#
+# Tests should never rebuild the *real* Lambda tree. The composite action
+# (``.github/actions/build-lambda-package``) populates it before pytest runs
+# in CI, and ``ensure_lambda_build_dirs`` above handles the local-dev case.
+# Patching ``_ensure_lambda_build`` and ``_rebuild_lambda_packages`` to skip
+# when ``project_root`` points at the real repo makes xdist safe; tests that
+# intentionally exercise these methods against a ``tmp_path`` keep working
+# because the guard lets them through.
+@pytest.fixture(scope="session", autouse=True)
+def _neutralize_lambda_build(ensure_lambda_build_dirs):  # noqa: ARG001 — dep order only
+    from cli import stacks as _stacks
+
+    real_root = PROJECT_ROOT.resolve()
+    orig_ensure = _stacks.StackManager._ensure_lambda_build
+    orig_rebuild = _stacks.StackManager._rebuild_lambda_packages
+
+    def _guarded_ensure(self):
+        try:
+            same = Path(self.project_root).resolve() == real_root
+        except OSError:
+            same = False
+        if same:
+            return
+        return orig_ensure(self)
+
+    def _guarded_rebuild(self):
+        try:
+            same = Path(self.project_root).resolve() == real_root
+        except OSError:
+            same = False
+        if same:
+            return
+        return orig_rebuild(self)
+
+    _stacks.StackManager._ensure_lambda_build = _guarded_ensure
+    _stacks.StackManager._rebuild_lambda_packages = _guarded_rebuild
+    try:
+        yield
+    finally:
+        _stacks.StackManager._ensure_lambda_build = orig_ensure
+        _stacks.StackManager._rebuild_lambda_packages = orig_rebuild
+
+
 # ============================================================================
 # Model Fixtures
 # ============================================================================
diff --git a/tests/test_analytics_bucket_isolation_property.py b/tests/test_analytics_bucket_isolation_property.py
@@ -35,7 +35,9 @@
 combinations)`` strategy space is small enough that
 :func:`functools.cache` on
 ``(enabled, hyperpod, tuple(sorted(regions)))`` keeps the hot loop
-under the ``deadline=10000`` ms per-example budget.
+under the ``deadline=20000`` ms per-example budget, which also leaves
+headroom for the first (uncached) synth in each worker when the suite
+runs under pytest-xdist contention.
 ``max_examples=50`` with caching completes in under 90 s on the
 benchmark workstation.
 """
@@ -297,7 +299,7 @@ def setup_class(cls) -> None:
 
     @settings(
         max_examples=50,
-        deadline=10000,
+        deadline=20000,
         suppress_health_check=[
             HealthCheck.too_slow,
             HealthCheck.function_scoped_fixture,
diff --git a/tests/test_analytics_cluster_shared_configmap_property.py b/tests/test_analytics_cluster_shared_configmap_property.py
@@ -257,7 +257,7 @@ def setup_class(cls) -> None:
 
     @settings(
         max_examples=50,
-        deadline=5000,
+        deadline=10000,
         suppress_health_check=[
             HealthCheck.too_slow,
             HealthCheck.function_scoped_fixture,
@@ -335,7 +335,7 @@ class TestComputeKubectlClusterSharedReplacementsRoundTrip:
 
     @settings(
         max_examples=50,
-        deadline=5000,
+        deadline=10000,
         suppress_health_check=[
             HealthCheck.too_slow,
             HealthCheck.function_scoped_fixture,
diff --git a/tests/test_analytics_configmap_property.py b/tests/test_analytics_configmap_property.py
@@ -29,7 +29,7 @@
 
 ## Runtime budget
 
-``max_examples=20, deadline=10000`` keeps the test under ~2 min even
+``max_examples=20, deadline=20000`` keeps the test under ~2 min even
 without caching. With :func:`functools.cache` keyed on ``enabled``
 (cardinality 2) the hot loop reuses one cached synth per toggle value
 and completes in ~15 s.
@@ -203,7 +203,7 @@ def setup_class(cls) -> None:
 
     @settings(
         max_examples=20,
-        deadline=10000,
+        deadline=20000,
         suppress_health_check=[
             HealthCheck.too_slow,
             HealthCheck.function_scoped_fixture,
diff --git a/tests/test_analytics_roundtrip_property.py b/tests/test_analytics_roundtrip_property.py
@@ -98,7 +98,7 @@ def setup_class(cls) -> None:
 
     @settings(
         max_examples=4,
-        deadline=10000,
+        deadline=20000,
         suppress_health_check=[
             HealthCheck.too_slow,
             HealthCheck.function_scoped_fixture,