Reproducible builds #16044

Open
@AkihiroSuda

Dockerfile PRs:

No PR is needed for the following repos:

meta-script PR:


Enabling reproducible builds for Docker Official Images will improve the traceability of image origins and help in assessing the supply chain security of the images.

I also talked about this at DockerCon 2023: https://medium.com/nttlabs/dockercon-2023-reproducible-builds-with-buildkit-for-software-supply-chain-security-0e5aedd1aaa7

Scope of reproduction

The digests of the image manifest blobs (and config blobs and layer blobs) should be reproducible.
https://github.com/reproducible-containers/diffoci can be used for testing reproducibility.

SLSA provenance manifest blobs (enabled by default in recent buildx) are not reproducible by design, so the image index digest will not be reproducible.

Reproducing base images

Base images have to be pinned by the sha256 digest for reproduction.

A digest can be embedded in the FROM instruction of a Dockerfile.
However, I suspect that image maintainers may not want to update Dockerfiles frequently just to ensure that the latest base image is picked up during the upstream build.

In that case, we can just leave FROM instructions unpinned and let reproducers use the CONVERT action of source policies to dynamically replace the image identifier:

https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md

{
  "action": "CONVERT",
  "selector": {
    "identifier": "docker-image://docker.io/library/alpine:latest"
  },
  "updates": {
    "identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
  }
}

The digest to be used for the CONVERT action is recorded in the SLSA provenance.
We need to update the buildx CLI to automatically generate CONVERT actions from SLSA provenances.
(See the buildx CLI section below)
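
As a rough illustration of what such generation could look like, the sketch below builds a source policy document from a mapping of image references to digests. The "rules" wrapper is an assumption based on the policy file format described in the BuildKit document linked above. The helper name make_source_policy and its input mapping are hypothetical, and extracting that mapping from a real SLSA provenance is left out.

```python
import json


def make_source_policy(pins):
    """Build a BuildKit source policy that pins each image to a digest.

    `pins` maps an unpinned reference such as
    "docker.io/library/alpine:latest" to a "sha256:..." digest.
    (Hypothetical helper; not part of buildx or BuildKit.)
    """
    rules = []
    # Sort so that the generated policy is itself deterministic.
    for ref, digest in sorted(pins.items()):
        rules.append({
            "action": "CONVERT",
            "selector": {"identifier": f"docker-image://{ref}"},
            "updates": {"identifier": f"docker-image://{ref}@{digest}"},
        })
    return {"rules": rules}


policy = make_source_policy({
    "docker.io/library/alpine:latest":
        "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454",
})
print(json.dumps(policy, indent=2))
```

The resulting JSON can be saved to a file and handed to the builder, for example via buildctl's --source-policy-file flag or, for buildx, the EXPERIMENTAL_BUILDKIT_SOURCE_POLICY environment variable.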

Reproducing package versions

Debian and Ubuntu

Debian and Ubuntu keep old packages on http://snapshot.debian.org and http://snapshot.ubuntu.com.
I wrote a script to rewrite /etc/apt/sources.list to use those snapshot servers:
https://github.com/reproducible-containers/repro-sources-list.sh/blob/master/repro-sources-list.sh
The snapshot timestamp can be supplied via $SOURCE_DATE_EPOCH.
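
For illustration, the mapping from $SOURCE_DATE_EPOCH to a snapshot URL can be sketched as follows. The timestamp format and the URL layouts are assumptions about how the snapshot servers are addressed, and the function name is hypothetical; the linked script remains the authoritative version.

```python
from datetime import datetime, timezone


def snapshot_urls(source_date_epoch):
    """Return snapshot archive URLs for a SOURCE_DATE_EPOCH given in seconds."""
    ts = datetime.fromtimestamp(int(source_date_epoch), tz=timezone.utc)
    stamp = ts.strftime("%Y%m%dT%H%M%SZ")
    return {
        # URL layouts are assumptions; verify them against the snapshot servers.
        "debian": f"http://snapshot.debian.org/archive/debian/{stamp}/",
        "ubuntu": f"http://snapshot.ubuntu.com/ubuntu/{stamp}/",
    }


urls = snapshot_urls(1690467916)
print(urls["debian"])
print(urls["ubuntu"])
```

Using UTC explicitly matters here: formatting the epoch in the local time zone would make the chosen snapshot depend on where the build runs.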

These snapshot servers are quite slow and sometimes flaky (especially Debian's), so they probably shouldn't be used for the upstream builds.

See moby/buildkit#4669 for how to rewrite /etc/apt/sources.list in downstream builds.

Alpine

Reproducing apk packages is still challenging, as Alpine does not have snapshot servers.

The long-term plan is to capture apk packages at build time and attach them to the image as artifacts.

Reproducing file timestamps

BuildKit v0.13 supports rewriting the timestamps of the files inside image layers to use $SOURCE_DATE_EPOCH.

--output type=image,name=docker.io/username/image,push=true,rewrite-timestamp=true

https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md#source_date_epoch

Removal of logs, etc.

  • /var/log/alternatives.log, /var/log/apt/history.log, etc. will have to be removed due to non-deterministic timestamps
  • /var/cache/ldconfig/aux-cache has to be removed too

The entries in /var/cache/ldconfig/aux-cache are organised as an
associative array, with the keys including file attributes like
device number, inode number and inode change time. This means it is
not only unreproducible, but completely useless at boot time since
the device and inode numbers of libraries will be different.

https://linux.debian.kernel.narkive.com/7wfNAf7A/bug-845034-initramfs-tools-please-ensure-initrd-images-are-reproducible#post3
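
A reproducer may want to verify that none of these files survive in a built image. The sketch below checks an extracted root filesystem for the paths named in this section; the list is limited to those examples, and the function name is hypothetical.

```python
import os
import tempfile

# Only the paths named in this section; real images may need more entries.
NONDETERMINISTIC_PATHS = (
    "var/log/alternatives.log",
    "var/log/apt/history.log",
    "var/cache/ldconfig/aux-cache",
)


def leftover_paths(rootfs):
    """Return the known non-deterministic paths that still exist under rootfs."""
    return [
        path for path in NONDETERMINISTIC_PATHS
        if os.path.lexists(os.path.join(rootfs, path))
    ]


# Demonstration on a throwaway directory standing in for an extracted image.
with tempfile.TemporaryDirectory() as rootfs:
    os.makedirs(os.path.join(rootfs, "var/log/apt"))
    with open(os.path.join(rootfs, "var/log/apt/history.log"), "w") as f:
        f.write("Start-Date: ...\n")
    found = leftover_paths(rootfs)

print(found)
```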

Reproducing file contents

Some Dockerfiles will need extra work to reproduce file contents, e.g., sorting arrays or removing randomized mktemp paths.

For example, in the case of gcc:

diff -ur --no-dereference a/usr/local/lib64/libgo.la b/usr/local/lib64/libgo.la
--- a/usr/local/lib64/libgo.la  2024-01-12 18:14:56.000000000 +0900
+++ b/usr/local/lib64/libgo.la  2024-01-12 18:21:45.000000000 +0900
@@ -17,7 +17,7 @@
 inherited_linker_flags=' -pthread'
 
 # Libraries that this one depends upon.
-dependency_libs=' -L/tmp/tmp.LWUIKDJ22E/x86_64-linux-gnu/libatomic/.libs -lpthread -lm'
+dependency_libs=' -L/tmp/tmp.yeTnsy0FEm/x86_64-linux-gnu/libatomic/.libs -lpthread -lm'
 
 # Names of additional weak libraries provided by this library
 weak_library_names=''
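
The effect seen in this diff can be mimicked in a few lines. The sketch below renders a line similar to the one in libgo.la from a build directory, first with randomized temporary directories and then with a fixed one. The fixed path /usr/src/gcc-build and the function name are hypothetical, chosen only for the demonstration.

```python
import os
import tempfile


def render_dependency_libs(build_dir):
    """Mimic a generated file that embeds the build directory path."""
    libdir = os.path.join(build_dir, "x86_64-linux-gnu", "libatomic", ".libs")
    return f"dependency_libs=' -L{libdir} -lpthread -lm'"


# Randomized directories (like `mktemp -d`) leak a different path each time.
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    random_a = render_dependency_libs(d1)
    random_b = render_dependency_libs(d2)

# A fixed directory yields identical output on every run.
FIXED_BUILD_DIR = "/usr/src/gcc-build"  # hypothetical path
fixed_a = render_dependency_libs(FIXED_BUILD_DIR)
fixed_b = render_dependency_libs(FIXED_BUILD_DIR)

print(random_a == random_b, fixed_a == fixed_b)
```

Building in a fixed directory is one way to avoid the difference; stripping the path from the generated file after the build is another.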

Buildx CLI

The buildx CLI should be updated to allow attesting reproducibility with a few commands.
Notably, buildx build should have a flag like --repro from=gcc@sha256:... to import build args and base image digests from an SLSA provenance:

$ # "none://" is a filler for the build context arg
$ docker buildx build \
  --load \
  -t gcc:local \
  --repro from=gcc@sha256:f97e2719cd5138c932a814ca43f3ca7b33fde866e182e7d76d8391ec0b05091f \
  none://
...
[amd64] Using SLSA provenance sha256:7ecde97c24ea34e1409caf6e91123690fa62d1465ad08f638ebbd75dd381f08f
[amd64] Importing Dockerfile blob embedded in the provenance
[amd64] Importing build context https://github.com/docker-library/gcc.git#af458ec8254ef7ca3344f12631e2356b20b4a7f1:13
[amd64] Importing build-arg SOURCE_DATE_EPOCH=1690467916
[amd64] Importing buildpack-deps:bookworm from docker-image://buildpack-deps:bookworm@sha256:bccdd9ebd8dbbb95d41bb5d9de3f654f8cd03b57d65d090ac330d106c87d7ed
...

$ diffoci diff gcc@sha256:f97e2719cd5138c932a814ca43f3ca7b33fde866e182e7d76d8391ec0b05091f gcc:local
...

CI

We will also need CI that periodically attests reproducibility using the CLI proposed above.
