Description
Dockerfile PRs:
- Support reproducible builds (except packages) httpd#250
- Support reproducible builds (except packages) tianon/docker-bash#38
- Support reproducible builds (except packages) haproxy#229
- Support reproducible builds (except packages) memcached#96
- Support reproducible builds (except packages) ruby#455
- Support reproducible builds (except packages) golang#532
- Support reproducible builds (except packages) wordpress#915
No PR is needed for the following repos:
- https://github.com/docker-library/hello-world
- https://github.com/docker-library/docker
- https://github.com/docker-library/busybox
- https://github.com/distribution/distribution-library-image
meta-script PR:
Enabling reproducible builds for Docker Official Images will improve the traceability of image origins and help assess the supply chain security of the images.
I also talked about this at DockerCon 2023: https://medium.com/nttlabs/dockercon-2023-reproducible-builds-with-buildkit-for-software-supply-chain-security-0e5aedd1aaa7
Scope of reproduction
The digests of the image manifest blobs (and config blobs and layer blobs) should be reproducible.
https://github.com/reproducible-containers/diffoci can be used for testing reproducibility.
SLSA provenance manifest blobs (enabled by default in recent buildx) are not reproducible by design, so the image index digest will not be reproducible.
Reproducing base images
Base images have to be pinned by the sha256 digest for reproduction.
A digest can be embedded in the FROM instruction of a Dockerfile.
However, I suspect that image maintainers might not want to update Dockerfiles frequently just to ensure that the latest base image is picked up during the upstream build.
In that case, we can just leave FROM instructions unpinned and let reproducers use the CONVERT action of source policies to dynamically replace the image identifier:
https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md
{
  "action": "CONVERT",
  "selector": {
    "identifier": "docker-image://docker.io/library/alpine:latest"
  },
  "updates": {
    "identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
  }
}
The digest to be used for the CONVERT action is recorded in the SLSA provenance.
We need to update the buildx CLI to automatically generate CONVERT actions from SLSA provenances.
(See the Buildx CLI section below.)
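Until buildx can generate these rules itself, a reproducer could write the policy by hand. The following is a minimal sketch, not an existing buildx feature: `make_convert_policy` is a hypothetical helper, and wrapping the rule in a top-level `rules` array follows my reading of the build-repro.md document linked above.

```shell
# Sketch: emit a source policy that pins one image reference to a digest.
# make_convert_policy is a hypothetical helper, not part of buildx.
make_convert_policy() {
  ref="$1"     # e.g. docker.io/library/alpine:latest
  digest="$2"  # e.g. sha256:<hex digest taken from the SLSA provenance>
  cat <<EOF
{
  "rules": [
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "docker-image://${ref}"
      },
      "updates": {
        "identifier": "docker-image://${ref}@${digest}"
      }
    }
  ]
}
EOF
}
```

Assuming the experimental environment variable described in build-repro.md, the generated file would be used roughly as `make_convert_policy docker.io/library/alpine:latest sha256:... > policy.json`, followed by `EXPERIMENTAL_BUILDKIT_SOURCE_POLICY=policy.json docker buildx build ...`.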
Reproducing package versions
Debian and Ubuntu
Debian and Ubuntu keep old packages on http://snapshot.debian.org and http://snapshot.ubuntu.com.
I wrote a script to rewrite /etc/apt/sources.list to use those snapshot servers:
https://github.com/reproducible-containers/repro-sources-list.sh/blob/master/repro-sources-list.sh
The snapshot timestamp can be supplied via $SOURCE_DATE_EPOCH.
These snapshot servers are quite slow and sometimes flaky (especially for Debian), so they probably shouldn't be used for the upstream builds.
See moby/buildkit#4669 for how to rewrite /etc/apt/sources.list in downstream builds.
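To illustrate how a timestamp maps to a snapshot server, here is a minimal sketch. It is not the linked repro-sources-list.sh script; `debian_snapshot_url` and `debian_snapshot_source_line` are hypothetical helpers, the URL layout assumes the `archive/debian/<timestamp>/` scheme of snapshot.debian.org, and GNU `date` is assumed for the `-d "@<epoch>"` syntax.

```shell
# Hypothetical helper: map an epoch (argument or $SOURCE_DATE_EPOCH) to a
# snapshot.debian.org archive URL. Assumes GNU date.
debian_snapshot_url() {
  epoch="${1:-${SOURCE_DATE_EPOCH:?SOURCE_DATE_EPOCH is not set}}"
  stamp="$(date -u -d "@${epoch}" +%Y%m%dT%H%M%SZ)"
  echo "http://snapshot.debian.org/archive/debian/${stamp}/"
}

# Hypothetical helper: print one sources.list line for a suite (e.g. bookworm).
# check-valid-until=no is needed because old Release files have expired.
debian_snapshot_source_line() {
  suite="$1"
  epoch="${2:-}"
  echo "deb [check-valid-until=no] $(debian_snapshot_url "$epoch") ${suite} main"
}
```

In a downstream build, the printed line would replace the regular mirror entry in /etc/apt/sources.list before running `apt-get update`.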
Alpine
Reproducing apk packages is still challenging, as Alpine does not have snapshot servers.
The long-term plan is to capture apk packages at build time and attach them to the image as artifacts:
Reproducing file timestamps
BuildKit v0.13 supports rewriting the timestamps of the files inside image layers to use $SOURCE_DATE_EPOCH.
--output type=image,name=docker.io/username/image,push=true,rewrite-timestamp=true
https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md#source_date_epoch
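For context, the output option above is meant to be combined with a `SOURCE_DATE_EPOCH` build arg. The sketch below only prints such a command line rather than running it; `repro_build_cmd` is a hypothetical helper, and taking the epoch from the last git commit is an assumption about where the value comes from.

```shell
# Hypothetical helper: print (not run) a buildx invocation that fixes
# SOURCE_DATE_EPOCH and asks BuildKit to rewrite layer file timestamps.
repro_build_cmd() {
  image="$1"  # e.g. docker.io/username/image
  epoch="$2"  # e.g. the output of: git log -1 --pretty=%ct
  printf 'docker buildx build --build-arg SOURCE_DATE_EPOCH=%s --output type=image,name=%s,push=true,rewrite-timestamp=true .\n' \
    "$epoch" "$image"
}
```

Calling `repro_build_cmd docker.io/username/image "$(git log -1 --pretty=%ct)"` would print a command that pins the timestamps to the commit time.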
Removal of logs, etc.
- /var/log/alternatives.log, /var/log/apt/history.log, etc. will have to be removed due to non-deterministic timestamps
- /var/cache/ldconfig/aux-cache has to be removed too
The entries in /var/cache/ldconfig/aux-cache are organised as an
associative array, with the keys including file attributes like
device number, inode number and inode change time. This means it is
not only unreproducible, but completely useless at boot time since
the device and inode numbers of libraries will be different.
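The removals above could be collected in a small cleanup step. This is a sketch only: `remove_nondeterministic_files` is a hypothetical helper, and it covers just the three paths named here, while the "etc." above means a real image may need more.

```shell
# Hypothetical helper: delete files known to embed non-deterministic data.
# The optional argument is the root of the filesystem tree; it defaults to /
# so the function can be called as-is at the end of a RUN step.
remove_nondeterministic_files() {
  root="${1:-/}"
  # ${root%/} strips a trailing slash so "/" and "/some/dir" both work.
  rm -f \
    "${root%/}/var/log/alternatives.log" \
    "${root%/}/var/log/apt/history.log" \
    "${root%/}/var/cache/ldconfig/aux-cache"
}
```

The removal has to happen in the same layer that created the files, otherwise the non-deterministic content still ends up in an earlier layer blob.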
Reproducing file contents
Some Dockerfiles will need extra work to reproduce file contents,
e.g., sorting arrays, avoiding randomized mktemp paths, ...
- https://reproducible-builds.org/docs/stable-outputs/
- https://reproducible-builds.org/docs/randomness/
e.g., in the case of gcc:
diff -ur --no-dereference a/usr/local/lib64/libgo.la b/usr/local/lib64/libgo.la
--- a/usr/local/lib64/libgo.la 2024-01-12 18:14:56.000000000 +0900
+++ b/usr/local/lib64/libgo.la 2024-01-12 18:21:45.000000000 +0900
@@ -17,7 +17,7 @@
inherited_linker_flags=' -pthread'
# Libraries that this one depends upon.
-dependency_libs=' -L/tmp/tmp.LWUIKDJ22E/x86_64-linux-gnu/libatomic/.libs -lpthread -lm'
+dependency_libs=' -L/tmp/tmp.yeTnsy0FEm/x86_64-linux-gnu/libatomic/.libs -lpthread -lm'
# Names of additional weak libraries provided by this library
weak_library_names=''
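The diff above comes from a random `mktemp` directory leaking into a generated file. Two common remedies are sketched below under stated assumptions: `fixed_build_dir` and `stable_file_list` are hypothetical helpers, and `/usr/src/build` is only an example path, not what the gcc Dockerfile uses.

```shell
# Hypothetical helper: create and print a build directory whose path is the
# same on every run, unlike the random name produced by `mktemp -d`.
fixed_build_dir() {
  dir="${1:-/usr/src/build}"  # example location, adjust per image
  mkdir -p "$dir" && echo "$dir"
}

# Hypothetical helper: list regular files under a directory in byte order,
# independent of locale and of the order the filesystem returns entries in.
stable_file_list() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}
```

Any path that ends up embedded in build outputs (as in libgo.la) then stays identical between the upstream build and the reproduction.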
Buildx CLI
Buildx CLI should be updated to allow attesting reproducibility with a few commands.
Notably, buildx build should have a flag like --repro from=gcc@sha256:... to import build args and base image digests from an SLSA provenance:
$ # "none://" is a filler for the build context arg
$ docker buildx build \
    --load \
    -t gcc:local \
    --repro from=gcc@sha256:f97e2719cd5138c932a814ca43f3ca7b33fde866e182e7d76d8391ec0b05091f \
    none://
...
[amd64] Using SLSA provenance sha256:7ecde97c24ea34e1409caf6e91123690fa62d1465ad08f638ebbd75dd381f08f
[amd64] Importing Dockerfile blob embedded in the provenance
[amd64] Importing build context https://github.com/docker-library/gcc.git#af458ec8254ef7ca3344f12631e2356b20b4a7f1:13
[amd64] Importing build-arg SOURCE_DATE_EPOCH=1690467916
[amd64] Importing buildpack-deps:bookworm from docker-image://buildpack-deps:bookworm@sha256:bccdd9ebd8dbbb95d41bb5d9de3f654f8cd03b57d65d090ac330d106c87d7ed
...
$ diffoci diff gcc@sha256:f97e2719cd5138c932a814ca43f3ca7b33fde866e182e7d76d8391ec0b05091f gcc:local
...
CI
We will also need a CI job that periodically attests reproducibility using the CLI proposed above.