Feature Request: Automatic deletion of previously mirrored files excluded by filtering configuration #2052

@naveen1583

Summary

Currently, when the filtering configuration is changed to exclude certain files (e.g., using exclude_platform to filter out old Python versions), bandersnatch does not delete previously mirrored files that the active filters now exclude. This wastes storage space and leaves the mirror contents out of step with what users expect.

Problem

When users make their filtering configuration more restrictive (for example, excluding packages for Python 2.7 or old platform-specific wheels), they expect subsequent mirror synchronizations to remove files that no longer match the active filter criteria. However, bandersnatch currently only prevents new downloads of filtered content; previously downloaded files stay in place.

This leads to:

  1. Wasted storage space - Old files that are no longer needed continue to consume disk space
  2. Inconsistent mirror state - The mirror contains files that would not be downloaded if starting fresh with the current configuration
  3. Manual cleanup burden - Users must manually identify and delete files, which is error-prone and time-consuming

Proposed Solution

Add functionality to automatically delete previously mirrored files that are now excluded by the active filtering configuration. This could be implemented as:

  1. Automatic deletion during sync - An optional configuration flag (e.g., cleanup_filtered_files = true) that enables automatic removal of files excluded by current filters during the mirror synchronization process

  2. Separate cleanup command - A dedicated command (e.g., bandersnatch cleanup-filtered) that scans the mirror and removes files that don't match the current filter configuration
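To make the two options above more concrete, here is a rough sketch of the core step they would share: walk the mirror, ask a filter predicate about each file, and delete (or only report) the ones it rejects. This is not bandersnatch code. The function names, the `is_excluded` callback, and the `dry_run` switch are all hypothetical, and a real implementation would also have to regenerate any index pages that still link to the removed files.

```python
from pathlib import Path
from typing import Callable


def find_excluded_files(
    mirror_root: Path, is_excluded: Callable[[Path], bool]
) -> list[Path]:
    """Return files under mirror_root that the (hypothetical) predicate rejects."""
    return sorted(
        p for p in mirror_root.rglob("*") if p.is_file() and is_excluded(p)
    )


def cleanup_filtered(
    mirror_root: Path,
    is_excluded: Callable[[Path], bool],
    dry_run: bool = True,
) -> tuple[list[Path], int]:
    """Delete excluded files, or only report them when dry_run is True.

    Returns the affected paths and the number of bytes they occupy.
    """
    targets = find_excluded_files(mirror_root, is_excluded)
    total_bytes = sum(p.stat().st_size for p in targets)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets, total_bytes
```

Defaulting to a dry run seems like the safer choice for something that deletes mirror data: a user could review the list and the reclaimed size before running it for real.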

Use Cases

  • Filtering old Python versions (e.g., Python 2.7 packages) to save storage after they're no longer needed
  • Excluding specific platforms using exclude_platform filters
  • Adjusting package allowlists/blocklists and wanting the mirror to reflect those changes
  • Applying regex or metadata filters and needing historical files to be cleaned up
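For the Python-version and platform cases, a cleanup pass would need some way to decide from a filename alone whether a file is now excluded. The sketch below shows one way that decision could look for wheels, using the standard wheel filename layout (`name-version[-build]-python-abi-platform.whl`). It is an illustration only and does not reproduce how the exclude_platform plugin matches files; `wheel_tags`, `is_excluded`, and the default tag sets are made up for this example.

```python
def wheel_tags(filename: str):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag), or None."""
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:  # name, version, python, abi, platform at minimum
        return None
    return tuple(parts[-3:])


def is_excluded(
    filename: str,
    python_tags=frozenset({"cp27", "py27"}),  # hypothetical exclusion set
    platform_prefixes=("win",),  # hypothetical exclusion prefixes
) -> bool:
    """Return True if every Python tag or every platform tag is excluded."""
    tags = wheel_tags(filename)
    if tags is None:
        return False  # sdists and unknown files are left alone in this sketch
    python_tag, _abi_tag, platform_tag = tags
    # Compressed tag sets are dot-separated, e.g. "py2.py3"; only drop the
    # file when none of its tags remain allowed.
    if all(tag in python_tags for tag in python_tag.split(".")):
        return True
    return all(p.startswith(platform_prefixes) for p in platform_tag.split("."))
```

With these defaults, `foo-1.0-cp27-cp27m-win32.whl` would be treated as excluded, while a universal wheel such as `foo-1.0-py2.py3-none-any.whl` would be kept because it still serves Python 3.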

Benefits

  • Automatic storage optimization
  • Mirror state consistent with current configuration
  • Reduced manual maintenance overhead
  • User expectations met when changing filter settings

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), help wanted (Extra attention is needed), needs_external_pr (Will rely on non maintainer PR in order to close)
