Description
Summary
Currently, when the filtering configuration is changed to exclude certain files (e.g., using `exclude_platform` to filter out old Python versions), bandersnatch does not automatically delete previously mirrored files that are now excluded by the active filters. This wastes storage space and creates a mismatch between user expectations and actual mirror contents.
Problem
When users update their filtering configuration to be more restrictive (for example, excluding packages for Python 2.7 or old platform-specific wheels), they expect that subsequent mirror synchronizations would remove files that no longer match the active filter criteria. However, bandersnatch currently only prevents new downloads of filtered content but leaves previously downloaded files in place.
This leads to:
- Wasted storage space - Old files that are no longer needed continue to consume disk space
- Inconsistent mirror state - The mirror contains files that would not be downloaded if starting fresh with the current configuration
- Manual cleanup burden - Users must manually identify and delete files, which is error-prone and time-consuming
Proposed Solution
Add functionality to automatically delete previously mirrored files that are now excluded by the active filtering configuration. This could be implemented as:
- Automatic deletion during sync - An optional configuration flag (e.g., `cleanup_filtered_files = true`) that enables automatic removal of files excluded by current filters during the mirror synchronization process
- Separate cleanup command - A dedicated command (e.g., `bandersnatch cleanup-filtered`) that scans the mirror and removes files that don't match the current filter configuration
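To make the second option concrete, here is a minimal sketch of what a cleanup scan could look like. It is not bandersnatch code: `cleanup_filtered`, `mirror_root`, and the `is_excluded` callback are hypothetical names, and the callback stands in for whatever the active filter plugins would decide. It defaults to a dry run so that nothing is deleted until the list of candidates has been reviewed.

```python
from pathlib import Path
from typing import Callable, List


def cleanup_filtered(
    mirror_root: Path,
    is_excluded: Callable[[Path], bool],
    dry_run: bool = True,
) -> List[Path]:
    """Find files under mirror_root rejected by is_excluded.

    Hypothetical helper: is_excluded represents the current filter
    configuration. Files are only deleted when dry_run is False.
    """
    removed: List[Path] = []
    # sorted() gives a stable, reviewable order for the report.
    for path in sorted(mirror_root.rglob("*")):
        if path.is_file() and is_excluded(path):
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed
```

A caller could run it once with `dry_run=True` to print the candidates, then again with `dry_run=False`. A real implementation would presumably also have to regenerate any index pages that still link to the removed files, which this sketch does not attempt.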
Use Cases
- Filtering old Python versions (e.g., Python 2.7 packages) to save storage after they're no longer needed
- Excluding specific platforms using `exclude_platform` filters
- Adjusting package allowlists/blocklists and wanting the mirror to reflect those changes
- Applying regex or metadata filters and needing historical files to be cleaned up
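For the platform and Python-version use cases above, the decision about an already-mirrored wheel can be made from its filename alone, since wheel names end in `{python tag}-{abi tag}-{platform tag}.whl`. The sketch below is an illustration only and is not how the `exclude_platform` plugin matches files; `wheel_tags`, `is_excluded_wheel`, and the prefix-list parameters are hypothetical names.

```python
from typing import Iterable, Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Split a wheel filename into (python tag, abi tag, platform tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def _all_match(tags: Iterable[str], prefixes: Iterable[str]) -> bool:
    """True when every tag starts with at least one of the prefixes."""
    prefixes = tuple(prefixes)
    return all(any(t.startswith(p) for p in prefixes) for t in tags)


def is_excluded_wheel(
    filename: str,
    excluded_python_prefixes: Iterable[str] = (),
    excluded_platform_prefixes: Iterable[str] = (),
) -> bool:
    """Hypothetical predicate: should this mirrored wheel be cleaned up?"""
    if not filename.endswith(".whl"):
        return False  # sdists and other files are out of scope here
    python_tag, _abi_tag, platform_tag = wheel_tags(filename)
    # Compressed tag sets use "." as a separator, e.g. "py2.py3".
    # A wheel is excluded only if every tag in the set is excluded,
    # so a "py2.py3" wheel survives a Python 2 exclusion.
    return _all_match(python_tag.split("."), excluded_python_prefixes) or _all_match(
        platform_tag.split("."), excluded_platform_prefixes
    )
```

Requiring every tag in a compressed set to match is a deliberately conservative choice: it avoids deleting a universal wheel that still serves a Python version or platform the mirror is meant to keep.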
Benefits
- Automatic storage optimization
- Mirror state consistent with current configuration
- Reduced manual maintenance overhead
- User expectations met when changing filter settings