Skip to content

Conversation

@strickvl
Copy link
Contributor

@strickvl strickvl commented Dec 12, 2025

Summary

This PR adds parallel deletion support to the zenml artifact prune command, significantly improving performance when pruning large numbers of unused artifacts.

CLI Changes:

  • Adds --threads / -t option to control parallelism (default: 1 for backwards compatibility)
  • Uses ThreadPoolExecutor with bounded in-flight submissions to avoid memory issues
  • Thread-local Client instances ensure thread safety
  • Post-pass artifact cleanup checks server for remaining versions (more correct than relying on cached data)
  • Adds mutual exclusivity validation for --only-artifact and --only-metadata flags
  • Tracks failed deletions and provides accurate completion message

Client Changes:

  • Adds _skip_unused_check parameter to delete_artifact_version() and _delete_artifact_version()
  • This bypasses the redundant depaginate(list_artifact_versions, only_unused=True) check that was being called for every single deletion
  • Fixes pagination race condition when deleting in parallel (the total item count changes while pages are being fetched, causing "Invalid page X" errors)
  • Also improves single-threaded performance by avoiding N full pagination queries for N deletions

Test plan

  • Manual testing with zenml artifact prune --threads 15 on a server with 170 unused artifacts
  • Verify --only-artifact and --only-metadata together produces an error
  • Verify --ignore-errors mode reports failure count accurately
  • Verify single-threaded mode (default) still works as before

The `zenml artifact prune` command now supports a `--threads/-t` option
to delete unused artifact versions concurrently using a ThreadPoolExecutor.
This significantly improves performance when pruning many artifacts,
especially with cloud artifact stores where each deletion involves
network round-trips.

Key changes:
- Add --threads/-t option (default: 1 for backwards compatibility)
- Use thread-local Client instances for thread safety
- Implement bounded in-flight submission pattern to manage memory
- Show progress feedback during deletion
- Post-pass artifact cleanup to safely delete empty artifacts
- Update documentation with new option
- Add mutual exclusivity validation for --only-artifact/--only-metadata
- Track failed deletions and gate success message appropriately
- Add threads parameter to docstring
- Simplify has_versions check using Page.total attribute
@strickvl strickvl added internal To filter out internal PRs and issues x-squad Issues that are being handled by the x-squad snack snack-it labels Dec 12, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 12, 2025

Documentation Link Check Results

Absolute links check failed
There are broken absolute links in the documentation. See workflow logs for details
Relative links check passed
Last checked: 2025-12-12 15:37:43 UTC

strickvl and others added 3 commits December 12, 2025 12:39
The _delete_artifact_version method was calling depaginate() on every
deletion to verify the artifact is unused. With parallel deletions, this
causes pagination race conditions as the total item count changes while
pages are being fetched.

Add _skip_unused_check flag to bypass this redundant check when we've
already verified all versions are unused at the start of pruning.
- Update test_artifact_prune to use --threads 2 to exercise parallel deletion
- Add test_artifact_prune_mutually_exclusive_only_flags to verify that
  --only-artifact and --only-metadata cannot be used together
@strickvl strickvl added the release-notes Release notes will be attached and used publicly for this PR. label Dec 12, 2025
- Implement proper fail-fast for threaded deletion: stop scheduling new
  work and cancel queued futures when an error occurs (ignore_errors=False)
- Add cheap _assert_artifact_version_unused helper using single filtered
  query instead of expensive depaginate call
- Enforce unused check for artifact store deletion to prevent race
  conditions in --only-artifact mode
- Remove _skip_unused_check=True from CLI prune worker for defense-in-depth
- Update docs to include --yes flag and clarify fail-fast behavior
- Update comment in artifact prune to accurately reflect behavior:
  fail-fast aborts BEFORE artifact cleanup, not after
- Add project parameter passthrough to _assert_artifact_version_unused
  to ensure unused check is scoped correctly in multi-project scenarios
@strickvl strickvl marked this pull request as draft December 12, 2025 15:24
…ed tests

Refactored the parallel deletion logic in artifact prune command:
- Replace iterator + abort flag + nested breaks with deque-based approach
- Add _submit_until_full helper to fill in-flight work up to max threads
- Add _cancel_pending_futures helper for fail-fast cleanup
- Process all done futures before deciding to abort or refill
- Makes "no new work after failure" deterministic and explicit

Added new integration tests for fail-fast vs --ignore-errors:
- New fixture creates 8 unused artifact versions for reliable testing
- test_artifact_prune_fail_fast_threaded: validates early abort
- test_artifact_prune_ignore_errors_threaded: validates all attempts made
strickvl

This comment was marked as outdated.

strickvl and others added 5 commits December 12, 2025 16:46
- Add `from __future__ import annotations` to artifact.py for Python 3.10
  compatibility with subscripted Future and deque types
- Update _delete_artifact_version_target docstring to clarify that the
  unused re-check is delegated to Client.delete_artifact_version
- Make fixture deterministic by disabling caching with enable_cache=False
- Fix threaded tests to use lock-protected first-call failure pattern
  instead of relying on artifact ordering (prevents flaky tests)
- Add 5-second timeout to Event.wait() to prevent indefinite hangs
- Clarify _skip_unused_check docstring to warn about irreversible
  artifact-store deletion under concurrent operations
The module-level type alias `_FirstFailure = tuple[...]` is a runtime
assignment, not an annotation, so PEP 563 (`from __future__ import
annotations`) does not apply. Use `typing.Tuple` instead to ensure
compatibility with Python 3.9.
@strickvl
Copy link
Contributor Author

I'll reopen this later...

@strickvl strickvl closed this Dec 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal To filter out internal PRs and issues release-notes Release notes will be attached and used publicly for this PR. snack snack-it x-squad Issues that are being handled by the x-squad

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants