Harden build dir deletion against transient NFS failures [CI 1/9] #1595

Merged
auphelia merged 2 commits into Xilinx:dev from merkelmarrow:1-build-dir-nfs-robustness-pr on Jun 4, 2026

@merkelmarrow

Contributor

This is PR 1 of 9 of a series intended to make CI faster and more robust.

This patch adds robust_rmtree, a helper that replaces shutil.rmtree at several cleanup and test sites. robust_rmtree retries a build tree deletion when it fails with the ENOTEMPTY or EBUSY errors associated with Network File System (NFS) storage. These errors come from a race that can occur when a tool subprocess closes a file just before the tree is removed, and they caused non-deterministic test errors in a small percentage of samples.
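For readers who want the retry idea in concrete form, here is a minimal sketch. It is not the code merged in this PR: the name robust_rmtree_sketch, the retries and delay parameters, and the linear backoff are all assumptions made for illustration.

```python
import errno
import shutil
import time

# errno values treated as transient, per the PR description.
RETRYABLE_ERRNOS = (errno.ENOTEMPTY, errno.EBUSY)


def robust_rmtree_sketch(path, retries=5, delay=0.2):
    """Delete a directory tree, retrying on transient NFS-style errors.

    Hypothetical signature: retries and delay are illustrative defaults,
    not values taken from the actual patch.
    """
    for attempt in range(1, retries + 1):
        try:
            shutil.rmtree(path)
            return
        except OSError as exc:
            retryable = exc.errno in RETRYABLE_ERRNOS
            # Re-raise anything that is not transient, or the final failure.
            if not retryable or attempt == retries:
                raise
            # Back off a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

A call site would then swap shutil.rmtree(build_dir) for robust_rmtree_sketch(build_dir). Errors other than ENOTEMPTY and EBUSY still propagate immediately, so genuine failures such as permission problems are not masked.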

This change reflects the need to move our testing suite to NFS to make better use of distributed testing infrastructure.

make_build_dir() was also adjusted to stop leaving orphan directories in /tmp.
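The orphan-directory fix can be pictured roughly as follows. This is a sketch under stated assumptions, not the merged implementation: the function name make_build_dir_sketch, the prefix parameter, and the 0o755 mode are hypothetical, and FINN_BUILD_DIR is assumed to be an environment variable naming the build root.

```python
import os
import tempfile


def make_build_dir_sketch(prefix=""):
    """Create a unique build directory directly inside FINN_BUILD_DIR.

    Illustrative only: prefix and the 0o755 mode are assumptions.
    """
    build_root = os.environ["FINN_BUILD_DIR"]
    os.makedirs(build_root, exist_ok=True)
    # Passing dir= makes mkdtemp create the directory in place, so nothing
    # is left behind under /tmp.
    path = tempfile.mkdtemp(prefix=prefix, dir=build_root)
    # mkdtemp creates the directory with mode 0o700; widen it so other
    # users can read and traverse the build tree.
    os.chmod(path, 0o755)
    return path
```

Creating the directory in its final location avoids the earlier pattern of making a directory in /tmp and then rewriting the path prefix, which left the /tmp directory behind.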

merkelmarrow and others added 2 commits June 3, 2026 12:05
make_build_dir created its unique directory under /tmp and then
string-replaced the prefix to FINN_BUILD_DIR, leaving a directory
behind on every call. Create it directly inside FINN_BUILD_DIR
then widen permissions so non-build UIDs can read the build tree.

Add robust_rmtree, which retries the transient ENOTEMPTY that NFS
raises when a tool subprocess closes a file just as the tree is
removed, and adopt it in tests and cleanup.py.

Signed-off-by: Marco Blackwell <mblackwe@amd.com>
@auphelia

auphelia commented Jun 4, 2026

Collaborator

Thank you, @merkelmarrow!

@auphelia auphelia merged commit 3c8315a into Xilinx:dev Jun 4, 2026
3 checks passed