Description
It would be nice for cibuildwheel to include, by default, a post-processing step
that normalizes wheels for determinism/reproducibility.
This could be a significant step toward widespread verifiably-reproducible
builds of PyPI-hosted wheels.
Background
When using cibuildwheel to build a straightforward Cython package, I found that
by default, the resulting wheels were never bit-for-bit reproducible, because of
ZIP metadata (timestamps, etc.).
That is, the wheels always came out with a different checksum after each run. An
inspection of the wheels showed that the files contained in the archive were in
fact bit-for-bit reproducible, and the differences were purely due to ZIP
metadata. In particular, the problem was with timestamps (and potentially also
the ordering of entries).
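For anyone wanting to repeat that inspection, here is a minimal sketch using only Python's standard `zipfile` and `hashlib` modules. The function names (`entry_digests`, `entry_metadata`, `compare_wheels`) are made up for illustration and are not part of cibuildwheel or either tool mentioned in this issue.

```python
import hashlib
import zipfile


def entry_digests(wheel):
    """Map each archive member name to the SHA-256 of its uncompressed bytes."""
    with zipfile.ZipFile(wheel) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
        }


def entry_metadata(wheel):
    """List (name, timestamp) pairs in archive order."""
    with zipfile.ZipFile(wheel) as zf:
        return [(info.filename, info.date_time) for info in zf.infolist()]


def compare_wheels(wheel_a, wheel_b):
    """Report whether two wheels differ in file contents or only in ZIP metadata."""
    return {
        "same_contents": entry_digests(wheel_a) == entry_digests(wheel_b),
        "same_metadata": entry_metadata(wheel_a) == entry_metadata(wheel_b),
    }
```

If `same_contents` is true but `same_metadata` is false, the checksum difference comes from the ZIP container (timestamps and/or entry order) rather than from the build outputs themselves.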
When I added a post-processing step using either python-stripzip or Debian's
strip-nondeterminism, the wheels were bit-for-bit reproducible.
The cibuildwheel docs mention: "Because the builds are happening in manylinux
Docker containers, they're perfectly reproducible." This is generally true for
the build itself, but is not true for the final artifacts, because of the ZIP
timestamps.
Considerations
There are a number of tools for this.
To my knowledge, Debian's strip-nondeterminism is the most mature and featureful
one. Also of particular interest is python-stripzip.
Other tools include:
There are some issues which arise, notably:
- Ordering of ZIP entries.
- What timestamp(s) should be used in the ZIP metadata.
- Whether to respect SOURCE_DATE_EPOCH for this usage.
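To make these three choices concrete, here is a rough sketch of one possible policy, written with the standard `zipfile` module. It is not how python-stripzip or strip-nondeterminism work internally. It rewrites the archive (recompressing each entry), keeps the original entry order, and stamps every entry with SOURCE_DATE_EPOCH when set, or 1980-01-01 otherwise. The names `fixed_timestamp` and `normalize_wheel` are hypothetical, and the fallback policy is an assumption, not an established convention for wheels.

```python
import os
import time
import zipfile

# Earliest timestamp the ZIP format can represent (MS-DOS epoch).
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def fixed_timestamp():
    """Choose the timestamp applied to every entry.

    Assumption: honor SOURCE_DATE_EPOCH if set, else fall back to ZIP_EPOCH.
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        return ZIP_EPOCH
    stamp = time.gmtime(int(epoch))[:6]
    # Clamp values that predate what ZIP can store.
    return max(stamp, ZIP_EPOCH)


def normalize_wheel(src, dst):
    """Copy src to dst with uniform timestamps, preserving entry order."""
    stamp = fixed_timestamp()
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # infolist() keeps archive order
            clean = zipfile.ZipInfo(info.filename, date_time=stamp)
            clean.compress_type = info.compress_type
            clean.external_attr = info.external_attr  # keep permission bits
            clean.create_system = 3  # pin to "Unix"; the default varies by host OS
            zout.writestr(clean, zin.read(info))
```

Because entries are recompressed, the output can still vary across zlib versions, and file permissions are copied through unchanged. A blanket solution would need to decide how to handle both.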
Note that if desiring to implement reproducible builds for a specific project,
one can just pick a strategy, stick with it, and be done with it. But if aiming
to implement a blanket solution in a centralized tool, it's probably worth
investigating the details.
I might tentatively suggest python-stripzip for use in cibuildwheel, because it
makes fewer modifications than strip-nondeterminism. In particular, it doesn't
change the order of entries, so if .dist-info files are placed at the end of
the ZIP (as is best practice), it will leave them there.
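As a quick way to verify that a normalization tool has left the ordering intact, something like the following could be run before and after post-processing. This is only a sketch. `dist_info_is_last` is a hypothetical helper, and it treats any top-level directory ending in `.dist-info` as the metadata directory.

```python
import zipfile


def dist_info_is_last(wheel):
    """Return True if no other entry follows the first .dist-info entry."""
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()  # names in archive order

    seen_dist_info = False
    for name in names:
        in_dist_info = name.split("/", 1)[0].endswith(".dist-info")
        if seen_dist_info and not in_dist_info:
            return False
        seen_dist_info = seen_dist_info or in_dist_info
    return True
```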
Here is a script using cibuildwheel and python-stripzip which demonstrates
successfully generating bit-for-bit reproducible wheels:
https://gist.github.com/tabbyrobin/d6c5cf5323fe54a50004c1291da39315#file-build-wheels-sh
Build log
No response
CI config
No response