
Limited number of packages synced with --force-check despite filters not targeting core packages #2059

@naveen1583


Hi,

I have been running bandersnatch mirror on my mirror instance, but the todo file consistently contains only about 102 entries, and the total number of mirrored packages is far below what I expect (it should be in the thousands).

My setup:

  • Bandersnatch version: 6.6.0
  • Python version: 3.13.3
  • Host: c6gn.4xlarge (EC2 instance type)
  • Storage backend: S3

Config (/etc/bandersnatch.conf):

[mirror]
master = https://pypi.org
workers = 10
verifiers = 14
timeout = 60
global-timeout = 43200
stop-on-error = false
release-files = true
directory = /s3-bucket/mirror/
storage-backend = s3
diff-file = /s3-bucket/mirror/diff

[plugins]
enabled =
    blocklist_project
    regex_project
    regex_release
    exclude_platform
    allowlist_project

[blocklist]
packages =
    demo-package
    badpackagedk318
platforms =
    win32
    win_amd64
    macosx
    py2
    py2.4
    py2.5
    py2.6
    py2.7

[filter_regex]
releases =
    rc
    a
    b
    dev
    [.-_]rc[0-9]+
    [.-_]a[0-9]+
    [.-_]b[0-9]+
    [.-_]dev[0-9]+
packages =
    ^test.*$
    ^example.*$
    ^demo.*$
    ^hello-world$
    ^junk.*$
    ^tmp.*$
    ^.*-data$
    ^.*-sample$
    ^.*-test$
    ^.*-demo$
    ^.*-junk$
    ^.*-dummy$
    ^.*-backup$
    ^.*-deprecated$
    ^.*-malware$
    ^.*-virus$
    ^.*-spam$
    ^.*-exploit$
    ^.*-bypass$
    ^.*-attack$

Problem details:

  • Core packages are missing: ansible-core, frida, tensorflow, and other major PyPI projects appear neither in the mirror nor in the todo file.
  • --force-check does not grow the todo file: running bandersnatch mirror --force-check still leaves only ~100 entries in it.
  • Expectation: thousands of packages, since none of the filters target these core packages.
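
A small sketch for checking which expected projects are absent from a local copy of the todo file. The file layout is an assumption, not taken from bandersnatch documentation: the first line is treated as the target serial and each following non-empty line as "<package-name> <serial>". parse_todo and missing_from_todo are hypothetical helpers.

```python
# Hypothetical helpers (not part of bandersnatch). Assumed todo layout:
# line 1 = target serial, then one "<package-name> <serial>" pair per line.


def parse_todo(text: str) -> tuple[int, dict[str, int]]:
    """Split todo-file text into (target_serial, {package: serial})."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    target_serial = int(lines[0])
    packages: dict[str, int] = {}
    for line in lines[1:]:
        name, serial = line.split()
        packages[name] = int(serial)
    return target_serial, packages


def missing_from_todo(packages: dict[str, int], expected: list[str]) -> list[str]:
    """Return the expected names absent from the todo map (case-insensitive)."""
    present = {name.lower() for name in packages}
    return [name for name in expected if name.lower() not in present]
```

For example, missing_from_todo(pkgs, ["ansible-core", "frida", "tensorflow"]) lists whichever of those are not queued. Where the todo file lives depends on the storage backend, so it has to be copied locally first.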

Troubleshooting steps taken:

  • Reviewed the config to confirm that no filter should block these packages
  • Checked the bandersnatch logs for unexpected exclusions
  • Disabled all plugins: with no plugins enabled, many more packages are mirrored, as expected
  • Confirmed S3/storage is working and not throwing errors
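
Since disabling all plugins restores the missing packages, one way to narrow things down is to disable them one at a time. The sketch below (leave_one_out is a hypothetical helper, not part of bandersnatch) builds one configparser variant per enabled plugin, each with only that plugin removed from [plugins] enabled. Each variant could then be saved with variant.write(fh) and used for a separate test run; note that configparser does not preserve comments when writing.

```python
import configparser

# Hypothetical helper (not part of bandersnatch): for each plugin listed in
# [plugins] enabled, build a copy of the config with only that plugin removed.


def leave_one_out(config_text: str) -> dict[str, configparser.ConfigParser]:
    base = configparser.ConfigParser()
    base.read_string(config_text)
    enabled = [
        line.strip()
        for line in base.get("plugins", "enabled").splitlines()
        if line.strip()
    ]
    variants: dict[str, configparser.ConfigParser] = {}
    for plugin in enabled:
        variant = configparser.ConfigParser()
        variant.read_string(config_text)  # fresh copy of every section
        remaining = [p for p in enabled if p != plugin]
        # Leading newline keeps the one-plugin-per-line layout when written out.
        variant.set("plugins", "enabled", "\n" + "\n".join(remaining))
        variants[plugin] = variant
    return variants
```

If the package count jumps back up for exactly one variant, the plugin missing from that variant is the one worth inspecting.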

Questions:

  1. Are there any known issues with plugin combinations or force-check logic that could cause aggressive exclusion of non-filtered packages?
  2. Is there a way to debug why these specific packages are ignored or skipped?
  3. Are there common misconfigurations or bugs I should check for in plugin ordering or config syntax?
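
On question 3, two regex details in the [filter_regex] releases list are worth checking, sketched below with Python's re module. Inside a character class, .-_ is a range from "." to "_", so [.-_] matches digits and uppercase letters but not "-" itself; putting the hyphen first ([-._]) makes all three characters literal. Separately, if bandersnatch applies these patterns with re.match (an assumption here, not verified), they are anchored at the start of the version string, so bare entries such as rc, a, b and dev would only match versions that begin with those letters. The helper names are illustrative only.

```python
import re

# Illustrates two regex details relevant to the [filter_regex] releases list.
# Assumption (not verified here): bandersnatch tests each release pattern with
# re.match, i.e. anchored at the start of the version string.

# Inside [...], ".-_" is a range from "." (0x2E) to "_" (0x5F): it includes
# digits and uppercase letters but not "-" (0x2D).
RANGE_CLASS = re.compile(r"[.-_]")
# With the hyphen first, "-", "." and "_" are all literal members.
LITERAL_CLASS = re.compile(r"[-._]")


def in_class(pattern: re.Pattern[str], ch: str) -> bool:
    """True if the single character `ch` is matched by the class `pattern`."""
    return pattern.fullmatch(ch) is not None


def matches_start(regex: str, version: str) -> bool:
    """re.match semantics: the pattern must match at position 0."""
    return re.match(regex, version) is not None


def matches_anywhere(regex: str, version: str) -> bool:
    """re.search semantics: the pattern may match anywhere."""
    return re.search(regex, version) is not None
```

For example, matches_start("rc", "2.0.0rc1") is False while matches_anywhere("rc", "2.0.0rc1") is True, and in_class(RANGE_CLASS, "-") is False.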

Any help appreciated!

Labels: bug, enhancement, help wanted, needs_external_pr
