
Conversation

glehmann
Member

Based on #35

I had to move the user logic to the image entrypoint to make the images usable by any user, even by users without the usual 1000:1000 ids.

The logic for the repository creation was moved to the Dockerfiles to make the images buildable with the standard github action docker/build-push-action.

Hopefully this will help with the repository selection issue in #35.

@glehmann glehmann requested review from psafont and ydirson July 24, 2025 15:14

@psafont psafont left a comment


🎉🎉

(I've only reviewed the last 3 commits)

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 6 times, most recently from 9e73f03 to 87724c2 Compare July 25, 2025 08:14
@glehmann
Member Author

It works with docker but not with podman: with --userns=keep-id we can't be root at container startup, but we need --userns=keep-id to keep the ownership of the files in the shared volume.

I'll propose something else.

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 3 times, most recently from 78407fa to 7998f39 Compare July 25, 2025 13:57
@glehmann
Member Author

It should be good now.

I've used podman's --userns=keep-id:uid=1000,gid=1000 option to map the builder user to the user on the system, but I'm not sure whether that option appeared only recently.
@ydirson could you test it with your podman version?
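To illustrate the option being discussed, here is a minimal sketch of how the podman argument list could be assembled; the helper name is hypothetical and run.py's actual code may differ:

```python
def podman_userns_args(builder_uid: int = 1000, builder_gid: int = 1000) -> list[str]:
    """Map the in-container builder user (uid/gid 1000 in the image)
    onto the invoking host user via podman's keep-id mode."""
    return [
        # keep-id preserves the host uid/gid inside the user namespace,
        # remapped so the container's builder user matches it
        f"--userns=keep-id:uid={builder_uid},gid={builder_gid}",
        # disable SELinux labeling for the shared volume, as run.py does
        "--security-opt", "label=disable",
    ]

print(podman_userns_args()[0])
```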

@glehmann
Member Author

glehmann commented Jul 25, 2025

I've also added 88d5bec to reduce the image sizes.

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch from 7998f39 to 88d5bec Compare July 25, 2025 14:22
@ydirson

ydirson commented Jul 30, 2025

I've also added 88d5bec to reduce the image sizes.

There is an idiom to squash everything into a single layer, but I also see docker build --squash.

context: .
file: ./Dockerfile-8.x
push: true
tags: ghcr.io/${{ github.repository }}:8.2

We likely want those official floating tags to be set only when run on master; maybe we should set particular tags for PRs?
Also, timestamped tags, as is commonly done, may be interesting to have.

Member Author


The workflow is only configured on master.
Other tags may be useful, as well as building for PRs, but we must consider cleaning up the old images.


The workflow is only configured on master.

Actually it seems to be configured for main instead :)


Also, there would be a reason for allowing it to run not just on master: detecting pipeline errors before they reach master

Member Author


Yes, that would be nice. Maybe only push to the registry when on the master branch, then.


Also, there would be a reason for allowing it to run not just on master: detecting pipeline errors before they reach master

This comes at the cost of extra complexity: all steps now have a branch check.

I'm surprised that the push action doesn't support logging in, or using protected deployments

Member Author


It's a bit more complex not to upload, but we spare ourselves the complexity of cleaning up temporary images.
It seems like a good compromise to me; your opinion may vary, of course

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 2 times, most recently from e6311bc to 248ec8f Compare August 1, 2025 14:12
@glehmann glehmann marked this pull request as ready for review August 1, 2025 14:12
@glehmann glehmann requested a review from ydirson August 1, 2025 14:13
run.py Outdated
Comment on lines 109 to 110
docker_args += ["--userns=keep-id", "--security-opt", "label=disable"]
# With podman we use the `--userns` option to map the builder user to
# the user on the system.
docker_args += [f"--userns=keep-id:uid=1000,gid=1000", "--security-opt", "label=disable"]

The fact that the uid is preserved, so the entrypoint is entered as builder and not as root as in the Docker case, is likely worth documenting. But is it really a good idea to have such a big difference between the 2 runners? (especially as we want to move more root actions into the entrypoint in the future)

Member Author


done

run.py Outdated
Comment on lines 112 to 115
# With docker, we modify the builder user in the entrypoint to match the
# uid:gid of the user launching the container, and then continue with
# the builder user thanks to gosu.
docker_args += ["-e", f'BUILDER_UID={os.getuid()}', "-e", f'BUILDER_GID={os.getgid()}']

a reader might ask "why gosu and not just su"

Member Author


Then they will quickly find out why on the internet :-)

I usually try to only document what is specific to the project.
Do you think there is something specific about gosu in that context that should be commented?


Finding out about gosu did not teach me the answer. I had to go to the docker doc to find out why in a docker context they recommend using gosu. We need at least a pointer to that docker doc.

Member Author


I've added a note in entrypoint.sh

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 2 times, most recently from 57683dd to e21ff80 Compare August 1, 2025 17:33

@ydirson ydirson left a comment


This setup will store in separate places the container image and the script using it, but they are very much tied by an interface that was kept private until now. And recently we have started changing quite some of this interface.
Soon users will run a build and find out the hard way that they need to update their xcp-build-env script. Some safeguards would be useful here.

A simple solution would be some BUILDENV_INTERFACE_VERSION variable recorded in the image, which the init script would compare with a value passed at runtime. We would just need to be careful to bump this version on every interface change (envvars, bindmounts, ...)
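A sketch of the safeguard suggested above, assuming the image records the version in an env var and the launcher script compares it at startup (all names here are hypothetical, not the project's actual API):

```python
def check_interface_version(image_version: str, expected_version: str) -> None:
    # Hypothetical safeguard: the image records BUILDENV_INTERFACE_VERSION
    # at build time; the launcher script compares it with the version it
    # was written against and refuses to run on a mismatch.
    if image_version != expected_version:
        raise RuntimeError(
            f"build-env image interface v{image_version} does not match "
            f"the script's expected v{expected_version}; "
            "please update your xcp-build-env script or the image"
        )

check_interface_version("3", "3")  # matching versions: no error
```

The version would need to be bumped on every interface change (envvars, bindmounts, ...), as noted above.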

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch from e21ff80 to da09de3 Compare September 5, 2025 09:32
@glehmann
Member Author

glehmann commented Sep 5, 2025

This setup will store in separate places the container image and the script using it, but they are very much tied by an interface that was kept private until now. And recently we have started changing quite some of this interface. Soon users will run a build and find out the hard way that they need to update their xcp-build-env script. Some safeguards would be useful here.

A simple solution would be some BUILDENV_INTERFACE_VERSION variable recorded in the image, which the init script would compare with a value passed at runtime. We would just need to be careful to bump this version on every interface change (envvars, bindmounts, ...)

That's a good point.
I'd rather use a docker tag for that, so that the image is automatically downloaded on the dev's computer when they update their xcp-ng-dev installation.

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch from da09de3 to b7eb0a5 Compare September 5, 2025 11:26
@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 4 times, most recently from 8cb51c2 to 644d3ed Compare September 5, 2025 12:19
@glehmann glehmann requested a review from ydirson September 5, 2025 12:21
@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 2 times, most recently from 223c7b1 to 3d7b5ae Compare September 6, 2025 12:14
Comment on lines 28 to 29
push: ${{ github.ref == 'refs/heads/master' }}
tags: ghcr.io/${{ github.repository }}:8.2-${{ env.VERSION }}

Is it possible to push it to a different tag (git-describe-based) so the test job can be reworked to test the generated container, instead of rebuilding its own?
I'm thinking about some equivalent of that gitlab pipeline

Member Author


Sure, that should be possible. Could we keep it for another PR?

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 2 times, most recently from b26e293 to 30f8a45 Compare September 8, 2025 16:41
@@ -0,0 +1,83 @@
name: Build and Push Docker Image to GHCR

on: push

@psafont psafont Sep 9, 2025


Does the workflow do anything if it's not on the master branch? I would rather limit the branches where it's run here rather than doing it per-step; now there are 4 places where this branch limitation needs to be placed (6 when 9.0 is enabled), which is ripe for errors

Member Author

@glehmann glehmann Sep 10, 2025


It builds the image on all the branches, but only uploads when on master

file: ./src/xcp_ng_dev/files/Dockerfile-8.x
push: ${{ github.ref == 'refs/heads/master' }}
tags: ghcr.io/${{ github.repository }}:8.2
tags: ghcr.io/${{ github.repository }}:8.2-${{ env.VERSION }}

@psafont psafont Sep 9, 2025


so we can change the interface used to communicate with the container.

In all the years I've used containers I've never come across the need for such a thing.

Are you sure this is a good idea? It forces all users to become aware of this concept (which I think is very rare) and complicates using the tool. Now, which version does my installed tool support? Is the container outdated, is my tool outdated? What if I need to build from an old container, how do I get a compatible tool version?

This means at the very least that, in order to be introduced, users need to be educated about it: it needs to be documented in the readme and the help output.

I think this has quite a few downsides, and we should think long about what we're trying to solve with this solution. Are you sure this can't be replaced by something that users are already used to?
I would try to have frequent tagged releases for the tools, and use date versioning in the tags, on top of having X.X tags for the latest of each version of the containers.

It might be worth making the releases of the containers happen every Thursday, or even every day, to decouple them from pushing changes to the tool.

Member Author


I don't see the image as being targeted to the end user as is. It's made to be used with xcp-ng-dev.
Do you have some intended usage for the image outside of xcp-ng-dev? If yes, I agree that's not the best tag naming.


I don't see the image as being targeted to the end user as is.

Imagine the case where the version is changed, and the user hasn't updated the cli tool. This will show an error to the user: which error will it be, and how can the user know how to solve the issue?

Another edge case, which should be rarer, but still useful, IMO: the user wants to make a build on top of a previous container image because they are developing a feature, a newer version of the packages got released, and the code between both is incompatible. Now they can rebase their work, but they might not have time to do that because of some deadline for a prototype. How do they know what container tag to use? With dates in tags it becomes easier, whereas with this version scheme only hashes are available, and then they have to fish for the version of the tool that's compatible with that container

Member Author


I don't see the image as being targeted to the end user as is.

Imagine the case where the version is changed, and the user hasn't updated the cli tool. This will show an error to the user: which error will it be, and how can the user know how to solve the issue?

That's the point of adding this version in the image tag, so the user can continue using the cli tool at the same version, without being disturbed by a non-backward compatible change.
When we switch to protocol-version 2, the image 8.3-1 will still be there, alongside the newer, incompatible, 8.3-2.

Another edge case, which should be rarer, but still useful, IMO: the user wants to make a build on top of a previous container image because they are developing a feature, a newer version of the packages got released, and the code between both is incompatible. Now they can rebase their work, but they might not have time to do that because of some deadline for a prototype. How do they know what container tag to use? With dates in tags it becomes easier, whereas with this version scheme only hashes are available, and then they have to fish for the version of the tool that's compatible with that container

They can just stay on their version of the cli tool, and they would download the image version expected by the tool, and thus compatible with the tool.

Just to make sure we are talking about the same thing: the cli tool is packaged with a protocol-version.txt containing the version number it expects. The image run by the cli tool must have the same protocol-version as the cli tool.
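The scheme described here could be sketched like this (assuming a protocol-version.txt shipped alongside the cli tool; the file layout and helper name are illustrative):

```python
from pathlib import Path

def image_tag(xcp_version: str, protocol_version_file: Path) -> str:
    # The cli tool reads the protocol version it was packaged with and
    # always pulls the matching image tag, e.g. "8.3-1": an older tool
    # keeps pulling the older, compatible image.
    protocol = protocol_version_file.read_text().strip()
    return f"{xcp_version}-{protocol}"
```

With this, publishing an incompatible image under 8.3-2 leaves 8.3-1 untouched for users who have not yet upgraded the tool.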


and that's all transparent to the user, who just has to specify the XCP-ng version


That's the point of adding this version in the image tag, so the user can continue using the cli tool at the same version, without being disturbed by a non-backward compatible change.

But then the container will stop being upgraded, and that might cause issues: like a user developing a package with the old container successfully, while it ends up incompatible with an up-to-date container. This means users need to be aware of this, or might fall into this trap.

They can just stay on their version of the cli tool, and they would download the image version expected by the tool, and thus compatible with the tool.

Will we keep generating images for both the old and new versions to ensure the first version is not outdated?

Just to make sure we are talking about the same thing: the cli tool is packaged with a protocol-version.txt containing the version number it expects. The image run by the cli tool must have the same protocol-version as the cli tool.

Yes, this is exactly what we're talking about. I've never seen this pattern in any Dockerfile I've come across in the 10+ years I've used containers, and I see it creates issues that are not obvious to detect, which I find alarming.

and that's all transparent to the user, who just has to specify the XCP-ng version

On the surface level it looks like it's transparent, but looking into it in more detail, it's really not.

Member Author


I'll move that commit to another PR so we can move forward


But then the container will stop being upgraded, and that might cause issues: like a user developing a package with the old container successfully, while it ends up incompatible with an up-to-date container. This means users need to be aware of this, or might fall into this trap.

I'll move that commit to another PR so we can move forward

Well, on one hand, when a newer version of the container gets published and a user has an old one, the new one is not automatically pulled either. On the other hand, a user upgrading the build script to an incompatible version will get an error instead of the new image being automatically downloaded.

Will we keep generating images for both the old and new versions to ensure the first version is not outdated?

No, we rather talked about issuing a warning when the build script is outdated.

Member Author


We can continue the discussion in #51

@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch 3 times, most recently from f07d40a to c4112b4 Compare September 10, 2025 12:45
Comment on lines 8 to 23
if [ "${BUILDER_UID}" ]; then
# BUILDER_UID is defined, update the builder ids, and continue with the builder user
if [ "${BUILDER_GID}" != "1000" ]; then
groupmod -g "${BUILDER_GID}" builder
fi
if [ "${BUILDER_UID}" != "1000" ]; then
usermod -u "${BUILDER_UID}" -g "${BUILDER_GID}" builder
fi
find ~builder -maxdepth 1 -type f | xargs chown builder:builder
exec /usr/local/bin/gosu builder "$@"
else
# no BUILDER_ID, just continue as the current user
exec "$@"
fi

Suggested change
if [ "${BUILDER_UID}" ]; then
# BUILDER_UID is defined, update the builder ids, and continue with the builder user
if [ "${BUILDER_GID}" != "1000" ]; then
groupmod -g "${BUILDER_GID}" builder
fi
if [ "${BUILDER_UID}" != "1000" ]; then
usermod -u "${BUILDER_UID}" -g "${BUILDER_GID}" builder
fi
find ~builder -maxdepth 1 -type f | xargs chown builder:builder
exec /usr/local/bin/gosu builder "$@"
else
# no BUILDER_ID, just continue as the current user
exec "$@"
fi
if [ "${BUILDER_UID}" ]; then
# BUILDER_UID is defined (i.e. this is Docker, and we execute this as root),
# update the builder ids, and continue with the builder user
if [ "${BUILDER_GID}" != "1000" ]; then
groupmod -g "${BUILDER_GID}" builder
fi
if [ "${BUILDER_UID}" != "1000" ]; then
usermod -u "${BUILDER_UID}" -g "${BUILDER_GID}" builder
fi
find ~builder -maxdepth 1 -type f | xargs chown builder:builder
exec /usr/local/bin/gosu builder "$@"
else
# no BUILDER_ID (i.e. this is Podman), just continue as the current user
exec "$@"
fi


But actually, can't we just avoid this with userns-remapping in Docker?

Member Author


IIRC, it's a daemon-level configuration, and thus might not be easy for our developers to set up, may impact their other containers, and may not be possible in the CI


Damned, you're right. I wonder if we could make something similar to podman work with rootless docker, at least that should be user-only configuration (and we may want to encourage/enforce this to push for better security)

Member Author


Others may disagree, but if I want a rootless docker, I use podman 🙂


I see you'll soon agree to drop support for docker 😉

Member Author


I would, if the others would use podman.
I want to keep docker only because I'm quite sure it will be the most used container engine for a few more years…


I reported a couple of bugs for rootless podman when using it for the internal xenserver tooling, back in 2019.

I thought the whole linux world had moved away from docker already because of its weird packaging choices, not to mention its rootful architecture

Signed-off-by: Gaëtan Lehmann <[email protected]>
This way the image is usable by everybody, independently of their ids.

With docker, we modify the builder user in the entrypoint to match the
uid:gid of the user launching the container, and then continue with
the builder user thanks to gosu.

With podman we use the `--userns` option to map the builder user to
the user on the system. I haven't found a way with podman to use the
same mechanism as in docker, and vice versa.

Signed-off-by: Gaëtan Lehmann <[email protected]>
The images are always built, but only pushed to the registry when
running on the master branch.

Signed-off-by: Gaëtan Lehmann <[email protected]>
for a significantly smaller image size.

Before:

ghcr.io/xcp-ng/xcp-ng-build-env             8.2            e985e704c252  10 minutes ago  1.03 GB
ghcr.io/xcp-ng/xcp-ng-build-env             8.3            b557dcb6d541  11 minutes ago  1.14 GB
ghcr.io/xcp-ng/xcp-ng-build-env             9.0            c738df7d5577  11 minutes ago  857 MB

After:

ghcr.io/xcp-ng/xcp-ng-build-env             8.2            a9a91fa03db8  7 seconds ago       491 MB
ghcr.io/xcp-ng/xcp-ng-build-env             8.3            a36b1399ac01  About a minute ago  557 MB
ghcr.io/xcp-ng/xcp-ng-build-env             9.0            78011dcbd6f3  11 minutes ago      708 MB

Signed-off-by: Gaëtan Lehmann <[email protected]>
@glehmann glehmann force-pushed the gln/build-env-improvements-lzqx branch from c4112b4 to 1c18d26 Compare September 10, 2025 15:33
@glehmann glehmann requested a review from ydirson September 10, 2025 15:35

@ydirson ydirson left a comment


This will do. I'm still not happy having a complicated behavior just to get compatibility with Docker: IMHO the container engine should indeed be seen as an implementation detail, and we should rather tell devs to install Podman. But the code is well delimited, and we can remove it when we decide to.

@ydirson ydirson requested a review from stormi September 11, 2025 07:55
@glehmann glehmann merged commit e9cafe6 into master Sep 11, 2025
6 checks passed
@glehmann glehmann deleted the gln/build-env-improvements-lzqx branch September 11, 2025 08:02