Add parallel deletion support to artifact prune CLI #4340
Closed
+463
−37
The `zenml artifact prune` command now supports a `--threads/-t` option to delete unused artifact versions concurrently using a ThreadPoolExecutor. This significantly improves performance when pruning many artifacts, especially with cloud artifact stores where each deletion involves network round-trips.

Key changes:
- Add --threads/-t option (default: 1 for backwards compatibility)
- Use thread-local Client instances for thread safety
- Implement bounded in-flight submission pattern to manage memory
- Show progress feedback during deletion
- Post-pass artifact cleanup to safely delete empty artifacts
- Update documentation with new option
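The bounded in-flight submission pattern and the thread-local clients described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `FakeClient`, `get_client`, `delete_one`, and `prune` are hypothetical names, and the real command works with ZenML's `Client` rather than a stand-in.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


class FakeClient:
    """Hypothetical stand-in for a per-thread API client."""

    def delete(self, item_id: int) -> int:
        return item_id


_local = threading.local()


def get_client() -> FakeClient:
    # Lazily create one client per worker thread so no client is shared.
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = FakeClient()
    return client


def delete_one(item_id: int) -> int:
    return get_client().delete(item_id)


def prune(item_ids, threads: int = 1):
    """Delete items while keeping at most `threads` futures in flight."""
    deleted, failed = [], []
    pending = iter(item_ids)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Seed the pool with the first `threads` items only.
        in_flight = {pool.submit(delete_one, i): i for i in islice(pending, threads)}
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                item_id = in_flight.pop(future)
                try:
                    deleted.append(future.result())
                except Exception:
                    failed.append(item_id)
            # Top up: submit only as many new items as slots were freed.
            for i in islice(pending, threads - len(in_flight)):
                in_flight[pool.submit(delete_one, i)] = i
    return deleted, failed
```

Because new work is only submitted as slots free up, the number of live `Future` objects stays bounded by `threads` no matter how many items are pruned.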
- Add mutual exclusivity validation for --only-artifact/--only-metadata
- Track failed deletions and gate success message appropriately
- Add threads parameter to docstring
- Simplify has_versions check using Page.total attribute
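As a rough illustration of the first two items, the flag validation and the gated success message might look like this. The function names and message wording are assumptions made for the sketch; the actual CLI output is not shown in this PR text.

```python
def validate_only_flags(only_artifact: bool, only_metadata: bool) -> None:
    """Reject the combination of two flags that each exclude the other's work."""
    if only_artifact and only_metadata:
        raise ValueError(
            "--only-artifact and --only-metadata are mutually exclusive."
        )


def summarize(total: int, failed: int) -> str:
    """Report success only when no deletion failed (wording is hypothetical)."""
    if failed:
        return f"Deleted {total - failed} of {total} versions; {failed} failed."
    return f"All {total} unused artifact versions deleted."
```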
Contributor
Documentation Link Check Results: ❌ Absolute links check failed
The _delete_artifact_version method was calling depaginate() on every deletion to verify the artifact is unused. With parallel deletions, this causes pagination race conditions as the total item count changes while pages are being fetched. Add _skip_unused_check flag to bypass this redundant check when we've already verified all versions are unused at the start of pruning.
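The race can be reproduced with a toy offset-based pager. The sketch below is a simulation only, assuming page-number pagination over a shrinking list; `fetch_page` and `depaginate_while_deleting` are hypothetical helpers, not ZenML functions.

```python
def fetch_page(items, page, size):
    """Return one page using offset-based pagination (pages start at 1)."""
    start = (page - 1) * size
    return items[start:start + size]


def depaginate_while_deleting(items, size):
    """Walk all pages while another worker deletes one item between fetches."""
    seen = []
    page = 1
    while True:
        batch = fetch_page(items, page, size)
        if not batch:
            break
        seen.extend(batch)
        # Simulated concurrent deletion: the total shrinks mid-iteration,
        # so every later page offset now points one item too far.
        if items:
            items.pop(0)
        page += 1
    return seen
```

With ten items and a page size of three, items 3 and 7 are never returned, because each deletion shifts the remaining items under the next page offset.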
- Update test_artifact_prune to use --threads 2 to exercise parallel deletion
- Add test_artifact_prune_mutually_exclusive_only_flags to verify that --only-artifact and --only-metadata cannot be used together
- Implement proper fail-fast for threaded deletion: stop scheduling new work and cancel queued futures when an error occurs (ignore_errors=False)
- Add cheap _assert_artifact_version_unused helper using single filtered query instead of expensive depaginate call
- Enforce unused check for artifact store deletion to prevent race conditions in --only-artifact mode
- Remove _skip_unused_check=True from CLI prune worker for defense-in-depth
- Update docs to include --yes flag and clarify fail-fast behavior
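The idea behind the cheap check is to ask for a single, tightly filtered page and read its total rather than fetching every page. A minimal sketch, assuming a paginated response that exposes a `total` count; `Page`, `list_versions`, and `assert_version_unused` here are hypothetical stand-ins and do not reproduce the real client signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical paginated response: one page of items plus the match count."""

    items: list = field(default_factory=list)
    total: int = 0


def list_versions(store, *, version_id=None, only_unused=False, size=1) -> Page:
    """Filter an in-memory store; stands in for one filtered server query."""
    matches = [
        v
        for v in store
        if (version_id is None or v["id"] == version_id)
        and (not only_unused or not v["in_use"])
    ]
    return Page(items=matches[:size], total=len(matches))


def assert_version_unused(store, version_id) -> None:
    """Raise if the version is not among the unused ones (single query)."""
    page = list_versions(store, version_id=version_id, only_unused=True, size=1)
    if page.total == 0:
        raise ValueError(f"Artifact version {version_id} is still in use.")
```

Since the filter pins a single version, the response never spans more than one page, so concurrent deletions of other versions cannot disturb it.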
- Update comment in artifact prune to accurately reflect behavior: fail-fast aborts BEFORE artifact cleanup, not after
- Add project parameter passthrough to _assert_artifact_version_unused to ensure unused check is scoped correctly in multi-project scenarios
…ed tests

Refactored the parallel deletion logic in artifact prune command:
- Replace iterator + abort flag + nested breaks with deque-based approach
- Add _submit_until_full helper to fill in-flight work up to max threads
- Add _cancel_pending_futures helper for fail-fast cleanup
- Process all done futures before deciding to abort or refill
- Makes "no new work after failure" deterministic and explicit

Added new integration tests for fail-fast vs --ignore-errors:
- New fixture creates 8 unused artifact versions for reliable testing
- test_artifact_prune_fail_fast_threaded: validates early abort
- test_artifact_prune_ignore_errors_threaded: validates all attempts made
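One way the deque-based loop could be structured is sketched below. The helper names `_submit_until_full` and `_cancel_pending_futures` come from the commit message, but their signatures and bodies here are guesses, and `run_all` is a hypothetical driver written for this illustration.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def _submit_until_full(pool, work, queue, in_flight, max_in_flight):
    """Move items from the queue into the pool until the limit is reached."""
    while queue and len(in_flight) < max_in_flight:
        in_flight.add(pool.submit(work, queue.popleft()))


def _cancel_pending_futures(in_flight):
    """Cancel futures that have not started; running ones are unaffected."""
    for future in in_flight:
        future.cancel()


def run_all(work, items, threads, ignore_errors=False):
    """Run `work` over `items`; return the list of exceptions raised."""
    queue = deque(items)
    in_flight = set()
    errors = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        _submit_until_full(pool, work, queue, in_flight, threads)
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            in_flight -= done
            # Process every finished future before deciding what happens next.
            for future in done:
                if future.cancelled():
                    continue
                exc = future.exception()
                if exc is not None:
                    errors.append(exc)
            if errors and not ignore_errors:
                # Fail-fast: drop queued work and drain what is already running.
                queue.clear()
                _cancel_pending_futures(in_flight)
            else:
                _submit_until_full(pool, work, queue, in_flight, threads)
    return errors
```

Once an error is recorded in fail-fast mode, the refill branch is never taken again, which is what makes "no new work after failure" explicit in this sketch.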
strickvl
commented
Dec 12, 2025
- Add `from __future__ import annotations` to artifact.py for Python 3.10 compatibility with subscripted Future and deque types
- Update _delete_artifact_version_target docstring to clarify that the unused re-check is delegated to Client.delete_artifact_version
- Make fixture deterministic by disabling caching with enable_cache=False
- Fix threaded tests to use lock-protected first-call failure pattern instead of relying on artifact ordering (prevents flaky tests)
- Add 5-second timeout to Event.wait() to prevent indefinite hangs
- Clarify _skip_unused_check docstring to warn about irreversible artifact-store deletion under concurrent operations
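The lock-protected first-call failure pattern can be illustrated with a small wrapper. This is a sketch of the general technique rather than the test code from the PR; `FailFirstCall` is a hypothetical name.

```python
import threading


class FailFirstCall:
    """Wrap a callable so exactly one call, whichever arrives first, fails."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._lock = threading.Lock()
        self._failed_once = False
        self.failure_seen = threading.Event()

    def __call__(self, *args, **kwargs):
        # The lock makes "am I the first caller?" an atomic decision, so the
        # outcome does not depend on which item a thread happens to pick up.
        with self._lock:
            is_first = not self._failed_once
            self._failed_once = True
        if is_first:
            self.failure_seen.set()
            raise RuntimeError("injected failure on first call")
        return self._wrapped(*args, **kwargs)
```

A test can then call `flaky.failure_seen.wait(timeout=5)` and assert on the returned boolean, so a failure that never fires shows up as a failed assertion instead of a hang.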
The module-level type alias `_FirstFailure = tuple[...]` is a runtime assignment, not an annotation, so PEP 563 (`from __future__ import annotations`) does not apply. Use `typing.Tuple` instead to ensure compatibility with Python 3.9.
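The distinction between an annotation and a runtime assignment can be checked directly. In the sketch below, `NotDefinedAnywhere` is a deliberately undefined name used only to reveal whether an expression gets evaluated; the source strings are executed with `exec` so the future import sits at the top of its own module-like unit.

```python
# An annotation under PEP 563 is stored as a string and never evaluated.
ANNOTATION_ONLY = (
    "from __future__ import annotations\n"
    "def first_failure() -> NotDefinedAnywhere[str]: ...\n"
)

# A type alias is an ordinary assignment, so its right-hand side runs at import.
RUNTIME_ALIAS = (
    "from __future__ import annotations\n"
    "_FirstFailure = NotDefinedAnywhere[str]\n"
)


def runs_cleanly(source: str) -> bool:
    """Execute the source in a fresh namespace; report whether it evaluated a bad name."""
    try:
        exec(compile(source, "<demo>", "exec"), {})
    except NameError:
        return False
    return True
```

`runs_cleanly(ANNOTATION_ONLY)` returns `True` while `runs_cleanly(RUNTIME_ALIAS)` returns `False`, which is why an alias has to use syntax that every supported interpreter can evaluate.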
Contributor
Author
I'll reopen this later...
Labels
internal
To filter out internal PRs and issues
release-notes
Release notes will be attached and used publicly for this PR.
snack
snack-it
x-squad
Issues that are being handled by the x-squad
Summary
This PR adds parallel deletion support to the `zenml artifact prune` command, significantly improving performance when pruning large numbers of unused artifacts.

CLI Changes:
- `--threads/-t` option to control parallelism (default: 1 for backwards compatibility)
- `ThreadPoolExecutor` with bounded in-flight submissions to avoid memory issues
- Thread-local `Client` instances ensure thread safety
- Mutual exclusivity validation for the `--only-artifact` and `--only-metadata` flags

Client Changes:
- `_skip_unused_check` parameter added to `delete_artifact_version()` and `_delete_artifact_version()`
- Bypasses the `depaginate(list_artifact_versions, only_unused=True)` check that was being called for every single deletion

Test plan
- `zenml artifact prune --threads 15` on a server with 170 unused artifacts
- Using `--only-artifact` and `--only-metadata` together produces an error
- `--ignore-errors` mode reports failure count accurately