Skip to content

Isolate per-test scratch for parallel xdist runs [CI 5/9]#1599

Merged
auphelia merged 1 commit into
Xilinx:devfrom
merkelmarrow:5-parallel-test-isolation-pr
Jun 10, 2026
Merged

Isolate per-test scratch for parallel xdist runs [CI 5/9]#1599
auphelia merged 1 commit into
Xilinx:devfrom
merkelmarrow:5-parallel-test-isolation-pr

Conversation

@merkelmarrow

Copy link
Copy Markdown
Contributor

This is PR 5 of 9 in a series to make CI faster and more robust.

This patch makes concurrency hygiene improvements to tests, making them safe to run under pytest-xdist by giving each test and each parametrisation its own scratch path. Several tests wrote a fixed scratch filename and so collided on the same path when two workers ran them at once. A few others did not race but left a stray file in the repository root on failure.

Changes

Grouped by type:

  • tmp_path for tests that reused one fixed export name. test_batchnorm_to_affine_bnn_pynq (two functions shared one ONNX path), test_convert_to_hw_layers_{cnv,fc,synthetic}, test_fpgadataflow_lookup and test_fpgadataflow_softmax each reused one fixed export path across parametrisations or sibling tests. Each now writes under its own tmp_path, so no two workers share a path and pytest owns the cleanup.
  • tmp_path for cleanup hygiene. test_infer_datatypes_lfc, test_infer_data_layouts_cnv and test_sign_to_thres are single tests with unique filenames, so they did not race, but they wrote into the working directory and left the file behind on failure (the xfailing test_infer_data_layouts_cnv left it on every run). Moving them to tmp_path keeps the repo root clean and hands cleanup to pytest.
  • test_loop_rolling. The export went to a shared FINN_BUILD_DIR and, separately, LoopExtraction saves and reloads a fixed-name loop-body-template.onnx in the current working directory. Each case now gets its own make_build_dir and chdirs into it, so both the export and that cwd template are per test. The directory is removed with robust_rmtree on success and kept on failure.
  • test_npy2apintstream / test_npy2vectorstream. These already used a unique make_build_dir, but these specific tests sometimes hit ENOTEMPTY on NFS (see Harden build dir deletion against transient NFS failures [CI 1/9] #1595), so teardown now uses robust_rmtree. They also swap subprocess.Popen plus communicate (which discards the return code) for subprocess.check_call, so a failed compile or run fails at the point of failure.

Review notes / judgement calls

Intended pattern

Going forward, the intended pattern is as follows:

  1. Use pytest's tmp_path when you don't intend to keep any test artifacts, it keeps scratch on the local fs and let's pytest handle cleanup.
  2. Use make_build_dir with robust_rmtree if you intend to keep artifacts in FINN_BUILD_DIR on failure.
  3. Don't write tests that write or consume artifacts without isolating them.

Seeding change

This PR now seeds numpy/pytorch in the loop rolling test (while we're already changing it). A random seed would occasionally fail numerical tolerance in a small percentage of cases which is unhelpful for CI. Opinion on this is welcome.

Completeness

This PR addresses observed issues, but the problematic patterns/races addressed may silently exist elsewhere. Follow-on work needed, out of scope for this PR.

@auphelia auphelia left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @merkelmarrow ,

Thank you for the contribution!

One small thing: could you replace the use of tmp_path with the make_build_dir helper?

The main reason is consistency: we try to ensure that all build artifacts go through the standard FINN build directory setup (i.e., FINN_BUILD_DIR / FINN_HOST_BUILD_DIR). Since tests are often used as templates by users, using make_build_dir here helps avoid unintentionally introducing alternative directory patterns.

Thanks!

Many unit tests wrote a fixed scratch filename into the working
directory or a single shared build dir, which is unsafe under
pytest-xdist. The two batchnorm_to_affine tests shared one ONNX path,
and the convert_to_hw, lookup and softmax tests each reused one fixed
export path across their parametrisations or sibling tests, so
concurrent workers raced on that path. The infer and sign_to_thres
tests are single tests that wrote a fixed-name export into the
working directory, which is left behind there on failure.

Give each of these tests its own make_build_dir scratch directory so
concurrent workers never share a path while still using FINN's standard
FINN_BUILD_DIR setup. Remove those scratch directories with
robust_rmtree teardown because these cases do not intentionally keep
artefacts.

The loop-rolling tests exported under FINN_BUILD_DIR and separately
the LoopExtraction transform saves and reloads a fixed-name
loop-body-template.onnx in the working directory. Give each case its
own make_build_dir and chdir into it so both the export and the cwd
are per test, and seed torch and numpy so quantised configurations
stay within tolerance from run to run. Dir is removed with
robust_rmtree on success and kept on failure for inspection.

The npy2apintstream and npy2vectorstream util tests already used a
unique make_build_dir, but these specific tests were vulnerable
to NFS ENOTEMPTY teardown races, so remove the dir with
robust_rmtree on success. Swap subprocess.Popen plus communicate
(which never checked the return code) for subprocess.check_call,
so a failed compile/run raises at the point of failure.

Signed-off-by: Marco Blackwell <mblackwe@amd.com>
@merkelmarrow merkelmarrow force-pushed the 5-parallel-test-isolation-pr branch from 2622a71 to da4b2e1 Compare June 5, 2026 15:43
@merkelmarrow

Copy link
Copy Markdown
Contributor Author

Hi @merkelmarrow ,

Thank you for the contribution!

One small thing: could you replace the use of tmp_path with the make_build_dir helper?

The main reason is consistency: we try to ensure that all build artifacts go through the standard FINN build directory setup (i.e., FINN_BUILD_DIR / FINN_HOST_BUILD_DIR). Since tests are often used as templates by users, using make_build_dir here helps avoid unintentionally introducing alternative directory patterns.

Thanks!

Hi @auphelia,

No problem! That makes sense. I've updated all the tmp_path instances with a make_build_dir + robust_rmtree try/finally pattern. Some of the tests are pretty long so I put the body in a helper to avoid indenting everything.

Let me know if you would like any further changes.

@merkelmarrow merkelmarrow requested a review from auphelia June 5, 2026 16:07

@auphelia auphelia left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for addressing the issues you observed and also changing tmp_path with make_build_dir!
Also really nice to see the numpy/pytorch seeding added in the rolling test, I think that’s a good call for CI stability. The test is mainly about validating the rolling logic itself, so avoiding occasional numerical noise makes sense.
Looks good to me 👍

@auphelia auphelia merged commit 8347bf5 into Xilinx:dev Jun 10, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants