You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ci: parallelize unit:pytest:core with pytest-xdist (#56)
Reintroduces `-n auto` for the core test suite after diagnosing and
fixing the root cause of the earlier xdist race.
Root cause
----------
`StackManager.__init__` and `StackManager.deploy` both invoke
self-healing Lambda-package builders (`_ensure_lambda_build` and
`_rebuild_lambda_packages`) which `rm -rf` and repopulate the real
`lambda/kubectl-applier-simple-build/` tree. Those are correct
behaviors for `gco stacks deploy`, but during tests they race with
CDK's `Code.from_asset()` on other xdist workers — one worker's
mid-`rm -rf` window intersects another worker's `copyDirectory`,
producing the sporadic `ENOENT ... lstat '...botocore/data/sagemaker'`
failures we saw before.
A rm+pip-install cycle on the real tree is ~25 s, and tracing showed
every `test_deploy_*` plus every `StackManager(config)` construction
without a `project_root` kwarg was triggering it. At four workers on
a 2-vCPU CI runner, that window intersects a CDK synth with
near-certainty.
Fix
---
A session-scoped autouse fixture in `tests/conftest.py`
(`_neutralize_lambda_build`) patches both methods to short-circuit
when `self.project_root` resolves to the real repo root. Tests that
intentionally exercise these methods against a `tmp_path` keep
working because the guard only skips real-root calls. The composite
action `.github/actions/build-lambda-package` handles the population
in CI; the existing `ensure_lambda_build_dirs` fixture covers
local-dev.
Bumped hypothesis per-example deadlines on the four analytics property
tests that run full-app CDK synths (`test_analytics_bucket_isolation_property`,
`test_analytics_cluster_shared_configmap_property`,
`test_analytics_configmap_property`, `test_analytics_roundtrip_property`)
to 20 s (and 10 s for the cheaper ones that were at 5 s). CDK synth
contention under xdist was pushing the first (uncached) example over
the old limit.
Verification
------------
Full suite with `-n auto --dist=load` on an 8-core local machine:
3980 passed, 1 skipped in 353s (first run)
3980 passed, 1 skipped in 320s (second run)
Both runs clean, no flakes.
Workflow change
---------------
`unit:pytest:core` now runs with `-n auto --dist=load --maxfail=1`.
`--maxfail=1` preserves the previous `-x` semantics (stop at first
failure); `-x` itself isn't compatible with xdist.
0 commit comments