This repository was archived by the owner on Sep 8, 2025. It is now read-only.

feat(build): enhance container build times #438

Merged
k8s-ci-robot merged 1 commit into kubernetes-retired:main from hargut:feat/develop-container-images on Sep 2, 2025

hargut commented Aug 18, 2025

This draft PR is intended to further the discussion in #424. It only covers the build for controlplane and provides a draft layout that could be rolled out to the other containers.

  • based on Debian trixie slim
  • uses rustup, installed via apt, to install stable Rust
  • separate build and runtime containers
    • the runtime container is based on plain debian-slim (including apt, shell, ...)
  • requires a workspace volume mount
    • optionally supports a workspace/target volume mount to enable reuse of compiled elements
    • if it is not mounted, a full build is executed
  • all source code artefacts required for the build are copied into the container
    • this is currently the most time-intensive operation when pre-compiled elements are reused
  • CARGO_HOME is redefined to save downloaded dependencies into target/
$ time docker build -f build/Containerfile.controlplane -v "$(pwd):/workspace" -v "$(pwd)/target:/build/target/" --build-arg BUILD_TIMESTAMP=$(date +%s%3N)
[1/2] STEP 1/7: FROM debian:trixie-slim AS builder
[1/2] STEP 2/7: RUN apt update     && apt install -y build-essential rustup     && rustup install stable
--> Using cache 5dc77c2dd41693beebc66a306cf73312f5c9355dfcca1bbfdfa45a960c03b7e1
--> 5dc77c2dd416
[1/2] STEP 3/7: ARG BUILD_TIMESTAMP
--> Using cache 52db6776fc06faad520ef09bb1bdeb59125489af46afc0c35c6c9c7977e61eee
--> 52db6776fc06
[1/2] STEP 4/7: WORKDIR /build
--> Using cache 2862ae0bb1b14024eebbda34b432496c94bbe41a359932b351f0a4bd0470d514
--> 2862ae0bb1b1
[1/2] STEP 5/7: ENV CARGO_HOME=/build/target
--> Using cache 61f2d25c7adb3077b1b50f68ea3cf38a031cb31f17672f3257143f5dbc7f28b6
--> 61f2d25c7adb
[1/2] STEP 6/7: RUN echo "${BUILD_TIMESTAMP}"     && date +%s     && cp -a /workspace/Cargo.toml     /workspace/Cargo.lock     /workspace/controlplane/     /workspace/dataplane/     /workspace/tests-integration/     /workspace/tools/     /workspace/xtask/     /build
1755529247016
1755529248
--> ae88babc1118
[1/2] STEP 7/7: RUN date +%s     && cargo build -p controlplane     && date +%s     && mkdir -p results/     && cp target/debug/controller results/
1755529254
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.21s
1755529254
--> 7452142d2300
[2/2] STEP 1/6: FROM debian:trixie-slim
[2/2] STEP 2/6: LABEL org.opencontainers.image.source=https://github.com/kubernetes-sigs/blixt
--> Using cache fd399516894249f40dc62e585e7f2816bf54cdac0987414cec291507bf59abf7
--> fd3995168942
[2/2] STEP 3/6: WORKDIR /
--> Using cache d63dbdec7772ab7fcd3423962f7bfe8b221a6b81d03236efd9307c3424607143
--> d63dbdec7772
[2/2] STEP 4/6: COPY --from=builder /build/results/controller /controller
--> Using cache 41a90ea11d0b008100a60b8c6aff04c335414efb09322e3f51bc4122bd8f45fd
--> 41a90ea11d0b
[2/2] STEP 5/6: USER 1000:1000
--> Using cache e7d41a49b6da537fa698e3ddc31a8ebaa41c2a6f12c7719d99219e9c80e128b0
--> e7d41a49b6da
[2/2] STEP 6/6: ENTRYPOINT [ "/controller" ]
--> Using cache a21a4a52596ea6a6adfa0c1e8731a73f83a882d46d1d4ffbe5d38e7dbf6342d7
--> a21a4a52596e
a21a4a52596ea6a6adfa0c1e8731a73f83a882d46d1d4ffbe5d38e7dbf6342d7

real    0m14.483s
user    0m1.076s
sys     0m0.129s
$ 
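
The `date +%s` markers in the log above allow a rough breakdown of the 14.5 s wall-clock time. The following is a minimal sketch and not part of the PR: the variable and function names are made up for illustration, and the timestamps are copied from the log. Each interval also includes the time the container runtime needs to commit the layer, so the numbers are only approximate.

```shell
#!/bin/sh
# Timestamps copied from the build log. The first one is the BUILD_TIMESTAMP
# build argument (epoch milliseconds); the others are epoch seconds.
build_arg_ms=1755529247016
copy_start=1755529248    # printed at the start of builder step 6/7
cargo_start=1755529254   # printed at the start of builder step 7/7
cargo_end=1755529254     # printed right after `cargo build`

# elapsed_seconds START END: whole seconds between two epoch timestamps.
elapsed_seconds() {
    echo $(( $2 - $1 ))
}

echo "invocation until step 6: $(( copy_start - build_arg_ms / 1000 ))s"
echo "source copy and layer commit: $(elapsed_seconds "$copy_start" "$cargo_start")s"
echo "cargo build: $(elapsed_seconds "$cargo_start" "$cargo_end")s"
```

With the values from the log, the interval between the start of step 6 and the start of step 7 (the copy of the sources plus the layer commit) accounts for about 6 s, while the cargo build on the pre-populated target/ finishes within the same second.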

fixes: #424

k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels Aug 18, 2025
k8s-ci-robot added the size/M and size/L labels and removed the size/M label Aug 18, 2025
hargut marked this pull request as ready for review August 19, 2025 14:33
k8s-ci-robot removed the do-not-merge/work-in-progress label Aug 19, 2025
hargut changed the title from "feat(build): enhance container build times (WIP)" to "feat(build): enhance container build times" Aug 19, 2025

hargut commented Aug 19, 2025

Further details can be found within the commit message.

This was a rabbit hole. In terms of understandability, simplicity, and speed, I would prefer a cargo build followed by a container build that just copies the resulting artefacts and checks whether their linking matches what the container provides.

# ------------------------------------------------------------------------------
# Image
# ------------------------------------------------------------------------------

FROM debian:trixie-slim

LABEL org.opencontainers.image.source=https://github.com/kubernetes-sigs/blixt

LABEL org.opencontainers.image.licenses=GPL-2.0-only,BSD-2-Clause

WORKDIR /

RUN cp /workspace/target/debug/loader /dataplane \
    && cp /workspace/dataplane/LICENSE.GPL-2.0 / \
    && cp /workspace/dataplane/LICENSE.BSD-2-Clause / \
    && /usr/bin/ldd /dataplane

ENTRYPOINT ["/dataplane"]
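
The Containerfile above runs ldd only to print the linkage of the copied binary. glibc's ldd marks an unresolved library with `=> not found`, so a stricter variant could fail the build when such a line appears. The following is a minimal sketch under that assumption; `has_missing_libs` and `check_linking` are hypothetical helper names and not part of the PR.

```shell
#!/bin/sh
# has_missing_libs: reads ldd-style output on stdin and succeeds (exit 0)
# if at least one line reports an unresolved library.
has_missing_libs() {
    grep -q 'not found'
}

# check_linking BINARY: fails if ldd reports an unresolved shared library.
# Statically linked binaries are not treated specially in this sketch.
check_linking() {
    if ldd "$1" | has_missing_libs; then
        echo "unresolved shared libraries in $1" >&2
        return 1
    fi
    echo "linking ok: $1"
}
```

Inside the image this would be called as `check_linking /dataplane` in place of the plain ldd invocation.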

shaneutt self-assigned this Aug 19, 2025
shaneutt moved this to Review in Blixt Project Board Aug 19, 2025
shaneutt added the tide/merge-method-squash label Aug 29, 2025

shaneutt left a comment

So in this approach we mount the host system's target/ directory into the container and build inside of the container, emitting the cache to the host.

I get that this can improve build times in general and help with iteration, so I'm still open to it. However, I also see it as adding a lot of ceremony to replicate inside the container what we already have in the Makefile for building locally. 🤔

Question for you: couldn't we get a lot of the same value with much less code if we just chain a make build.all onto the front of a make build.image, and then mount the volume?

I suppose there are potential downsides, including control of the build toolchain inside of the container, but at least that one might be easy enough to deal with by introducing a rust-toolchain.toml. In any case though, I'm curious as to your thoughts 🤔


hargut commented Aug 29, 2025

> Question for you: couldn't we get a lot of the same value with much less code if we just chain a make build.all onto the front of a make build.image, and then have the containerfiles do COPY target/ target?

#:~/Sources/github.com/blixt$ du -hs target/debug/
7.3G    target/debug/
#:~/Sources/github.com/blixt$ 

COPY target/ target would then cause at least 7.3 GB of IO on every run.

> I suppose there are potential downsides, including control of the build toolchain inside of the container, but at least that one might be easy enough to deal with by introducing a rust-toolchain.toml. In any case though, I'm curious as to your thoughts 🤔

I think it would be helpful to understand whether it is crucial to have everything in one Containerfile, or whether there could be a dedicated one for development use. In terms of simplicity, understandability, and speed, nothing will be faster than copying the already built binary into an empty container image. No fancy setup is required, potentially just a make flag. There are no worries about file permissions, ID mappings, or the features of various container runtimes.

Within CI, the full-blown in-container release build could be used for the integration tests.


hargut commented Aug 29, 2025

My idea was to keep the build environment isolated as it was in the existing build. Only copy over the source files and have a clearly defined build.

> Sorry, I said COPY but that was me fumbling. What I mean was what about: "build, THEN mount" as opposed to "mount, THEN build"

I think there is not much difference in timing between "build, THEN mount" and "mount, THEN build". As the containers are built sequentially, the second container will reuse the build results from the first. To check this, I'll push a commit that rebuilds dataplane once with the cache from controlplane and once without.

With the container build, the layers will be reused as well (since the first lines are identical); the first layer actually built for the dataplane container will be the rustup install nightly \ ....
https://github.com/kubernetes-sigs/blixt/actions/runs/17326733459/job/49192614132#step:5:1

Where I do see the most potential for reducing overall resource usage is in combining rust-build and image-build-integration-tests, even though the total time might increase by chaining them on the same runner instance.


hargut commented Aug 29, 2025

I think it would help to clarify what matters most and to optimize for that. Currently it is not clear to me which goals to aim for.

Isolated build within a defined container environment?
Fewest lines of code in the build process (maintainability)?
Uniform build for all usages?
Simplicity of the build?
Container image size?
CI overall resource usage?

For me, the goal in enhancing the developer experience is to reduce build and deployment time in a local environment.


hargut commented Aug 29, 2025

> So in this approach we mount the host system's target/ directory into the container and build inside of the container, emitting the cache to the host.

In CI it is currently set up like this, but it would also work to build locally first and then reuse the results in the container build. The other direction is not yet fully covered by the Makefile changes. A rust-toolchain.toml might help to simplify this.

> Question for you: couldn't we get a lot of the same value with much less code if we just chain a make build.all onto the front of a make build.image, and then mount the volume?

Wouldn't that be nearly equivalent to just copying the final binaries into the container image?
Or would we still want to run the cargo build inside the container after mounting target/ into it?

Please let's try to avoid turning this topic into a time sink. It was intended to speed things up, not to bury hours in the change. With the current proposal, build times for a fresh build are already down to approximately half.

building and reusing rustc artifacts from container and host
optimized for build speed during development

unfortunately this is a rabbit hole:
- docker buildx does not support volumes during build
  - using a build context is an IO heavy and slow operation when target/ is populated
- used podman to mount & build the images
  - podman (4.9.x) on github actions within ubuntu-24.04 has issues on the uid/gid maps
    with --userns=host
  - uid/gid maps work as expected on podman 5.5.2
- kind support for loading images from podman is broken
  - kubernetes-sigs/kind#3945
- to reuse cargo compiled artifacts between host and container
  - the paths for the sources need to match
  - the paths for the toolchains need to match

Signed-off-by: Harald Gutmann <harald@gutmann.one>
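
The last two points of the commit message (matching source and toolchain paths) can be met by mounting the host directories at identical absolute paths inside the build container. The following is a minimal sketch, assuming podman's `--volume` build option and rustup's default home directory; `same_path_volume` is a hypothetical helper, and the Containerfile path is the one from the draft command earlier in this thread, which may differ from the merged layout. The command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: emit a --volume argument that mounts a host directory
# at the identical absolute path inside the build container.
same_path_volume() {
    printf '%s %s:%s' --volume "$1" "$1"
}

workspace="$(pwd)"
# Assumption: the toolchain lives in rustup's default location.
toolchains="${RUSTUP_HOME:-$HOME/.rustup}"

# Print the command instead of running it, so the sketch has no side effects.
echo podman build \
    "$(same_path_volume "$workspace")" \
    "$(same_path_volume "$toolchains")" \
    -f build/Containerfile.controlplane .
```

The Containerfile would additionally have to run cargo from that same workspace path, which is not shown here.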

shaneutt left a comment

/approve
/lgtm

k8s-ci-robot added the lgtm label Sep 2, 2025

k8s-ci-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hargut, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label Sep 2, 2025
k8s-ci-robot merged commit a068fd0 into kubernetes-retired:main Sep 2, 2025
4 checks passed
github-project-automation bot moved this from Review to Done in Blixt Project Board Sep 2, 2025
hargut deleted the feat/develop-container-images branch September 2, 2025 18:30

Labels

  • approved - Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lgtm - "Looks good to me", indicates that a PR is ready to be merged.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
  • tide/merge-method-squash - Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

image builds for fast local deployments and to improve developer experience

3 participants