Skip to content

Mirroring - Filtering for specific (ex. "py3.13") minor versions / slimming down size of mirroring? #1860

@InfiniteBSOD

Description

@InfiniteBSOD

Hello,

Using Python 3.13.1 on WSL2 (Ubuntu 24.04.1) .

My bandersnatch.conf has the following configuration:

[mirror]
; The directory where the mirror data will be stored.
directory = /mnt/d/bandersnatch

; Save JSON metadata into the web tree:
; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME
json = true

; Save package release files
release-files = true

; Cleanup legacy non PEP 503 normalized named simple directories
cleanup = false

; The PyPI server which will be mirrored.
; master = https://test.pypi.org
; scheme for PyPI server MUST be https
master = https://pypi.org

; The network socket timeout to use for all connections. This is set to a
; somewhat aggressively low value: rather fail quickly temporarily and re-run
; the client soon instead of having a process hang infinitely and have TCP not
; catching up for ages.
timeout = 50

; The global-timeout sets aiohttp total timeout for it's coroutines
; This is set incredibly high by default as aiohttp coroutines need to be
; equipped to handle mirroring large PyPI packages on slow connections.
global-timeout = 1800

; Number of worker threads to use for parallel downloads.
; Recommendations for worker thread setting:
; - leave the default of 3 to avoid overloading the pypi master
; - official servers located in data centers could run 10 workers
; - anything beyond 10 is probably unreasonable and avoided by bandersnatch
workers = 3

; Whether to hash package indexes
; Note that package index directory hashing is incompatible with pip, and so
; this should only be used in an environment where it is behind an application
; that can translate URIs to filesystem locations.  For example, with the
; following Apache RewriteRule:
;     RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
;     RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
; OR
; following nginx rewrite rules:
;     rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
;     rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
; Setting this to true would put the package 'abc' index in simple/a/abc.
; Recommended setting: the default of false for full pip/pypi compatibility.
hash-index = false

; Format for simple API to be stored in
; Since PEP691 we have HTML and JSON
simple-format = ALL

; Whether to stop a sync quickly after an error is found or whether to continue
; syncing but not marking the sync as successful. Value should be "true" or
; "false".
stop-on-error = false

; The storage backend that will be used to save data and metadata while
; mirroring packages. By default, use the filesystem backend. Other options
; currently include: 'swift'
storage-backend = filesystem

; Advanced logging configuration. Uncomment and set to the location of a
; python logging format logging config file.
; log-config = /etc/bandersnatch-log.conf

; Generate index pages with absolute urls rather than relative links. This is
; generally not necessary, but was added for the official internal PyPI mirror,
; which requires serving packages from https://files.pythonhosted.org
; root_uri = https://example.com

; Number of consumers which verify metadata
verifiers = 3

; Number of prior simple index.html to store. Used as a safeguard against
; upstream changes generating blank index.html files. Prior versions are
; stored under as "versions/index_<serial>_<timestamp>.html" and the current
; index.html will be a symlink to the latest version.
; If set to 0 no prior versions are stored and index.html is the latest version.
; If unset defaults to 0.
; keep_index_versions = 0

; Configure an option to compare whether a file is identical. By default the
; "hash" method is used which reads local file content and computes hashes,
; which is slow but more reliable; when "stat" method is used, file size and
; change time are used to compare, which is useful to reduce IO workload when
; verifying a lot of files frequently.
; Possible values are: hash (default), stat
compare-method = hash

; Configure to download packages from an alternative mirror.
; By default bandersnatch downloads packages from the server in the "url"
; value of json response from master server. This option asks bandersnatch
; to try to download from the configured PyPI mirror first, and fallback to
; "url" value if it was not successful (unable to get content or checksum
; mismatch). It is useful to sync most of the files from an existing, nearby
; mirror, for example when setting up a new server sitting next to an existing
; one for the purpose of load sharing.
; Downloading only from the mirror site without fallback is also possible,
; but be aware this could lead to more failures than expected and is not
; recommended for most scenarios.
; download-mirror = https://pypi-mirror.example.com/
; download-mirror-no-fallback = False

; vim: set ft=cfg:

; Configure a file to write out the list of files downloaded during the mirror.
; This is useful for situations when mirroring to offline systems where a process
; is required to only sync new files to the upstream mirror.
; The file be be named as set in the diff-file, and overwritten unless the
; diff-append-epoch setting is set to true.  If this is true, the epoch date will
; be appended to the filename (i.e. /path/to/diff-1568129735)
; diff-file = /srv/pypi/mirrored-files
; diff-append-epoch = true

[plugins]
enabled =
    exclude_platform
    latest_release

[blocklist]
platforms =
    macos
    freebsd
    py2.4
    py2.5
    py2.6
    py2.7
    py3.1
    py3.2
    py3.3
    py3.4
    py3.5
    py3.6
    py3.7
    py3.8
    py3.9

[latest_release]
keep = 3

I want to serve a local PyPi repository in an offline environment only catering to Python 3.13 on Windows and Linux.
AFAIK the "blocklist" and "platforms" filter only supports py2.4 ~ py2.7 and py3.1 ~ py3.10.
So I can't exclude "py3.11" and "py3.12"?
I omitted "py3.10" in the blocklist above since as I understand "py3.10" encompasses "py3.1x" meaning python 3.10 to 3.13?

With my config I downloaded 736GB until I received an error (sadly the clipboard messed up so I can't share it here).

Any ideas on being able to slim down my mirroring?

Metadata

Metadata

Assignees

No one assigned

    Labels

    help wantedExtra attention is neededneeds_external_prWill rely on non maintainer PR in order to close

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions