@jgehrcke
Collaborator

@jgehrcke jgehrcke commented Sep 4, 2025

This is based on #433.

Before discussing code specifics (such as maybe putting code into pkg/, whether or not to use otiai10/copy, ...) we need to talk about the high-level strategy.

The strategy currently implemented by the patch works (tested), but contains a bit of a surprising detour.

I've captured the details in code comments, but want to summarize here, too:

  • rename() fails when the target is a mount point. The error is: /etc/hosts: device or resource busy
  • It seems to be normal for /etc/hosts to be mounted into the container (managed by the container runtime)
  • I think we actually don't want any further modifications to /etc/hosts performed by the container runtime after CD daemon startup: one way to ensure that is to remove the mount.
  • Hence, we have two arguments for unmounting /etc/hosts in the container early after startup: i) enable /etc/hosts as a rename() target and ii) guarantee that we own and manage the file.

So, upon CD daemon startup, this patch does the following (in order):

  1. takes note of the /etc/hosts contents
  2. performs the unmount
  3. restores /etc/hosts exactly as before the unmount (same content and permissions), just as a regular file

The downside is that unmount requires root privileges, and hence for now I added privileged: true to the pod spec. I don't feel great about this.

Our concern about performing atomic updates to this file, given the mount and the shared ownership between host and container, is certainly shared by many others. Some of the best discussion is here: moby/moby#46908 (comment)

Two high-level thoughts:

  • I actually think we do ourselves a favor by removing FD lease-controlled /etc/hosts updates initiated from the host from our big picture. This speaks in favor of unmount.
  • Maybe we can find a way to perform the unmount during container startup and then fall back to running unprivileged.

Edit: an alternative to /etc/hosts at a different path would be great.

The C lib functions gethostbyname and getaddrinfo support HOSTALIASES as documented in https://man7.org/linux/man-pages/man7/hostname.7.html. Most notably, this file does not allow for mapping DNS names to IP addresses:

There's one more caveat: the HOSTALIASES file maps alias names to canonical host names, but the canonical name must be resolvable. You can't specify an IP address as the target.

(https://blog.tremily.us/posts/HOSTALIASES/)
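To make the caveat concrete: hostname(7) describes the HOSTALIASES file as lines of two whitespace-separated strings, the alias and the full hostname to substitute for it. The sketch below parses that format and flags entries whose target is an IP address, since those are the entries the quoted caveat says won't work. The function name parseHostAliases and the example host names are made up for illustration, and the strict two-field check is a simplification.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// parseHostAliases (hypothetical helper) reads "alias canonical-name" pairs,
// one per line, and returns an error for any entry whose target is an IP
// address instead of a host name. Blank lines are skipped.
func parseHostAliases(content string) (map[string]string, error) {
	aliases := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("line %d: expected 2 fields, got %d", lineNo, len(fields))
		}
		alias, target := fields[0], fields[1]
		// The target has to be a resolvable name; an IP address is not accepted.
		if net.ParseIP(target) != nil {
			return nil, fmt.Errorf("line %d: target %q is an IP address, not a host name", lineNo, target)
		}
		aliases[alias] = target
	}
	return aliases, scanner.Err()
}

func main() {
	// Example names below are placeholders.
	ok := "node-a daemon-0.example.internal\n"
	bad := "node-a 10.0.0.5\n"

	aliases, err := parseHostAliases(ok)
	fmt.Println(aliases, err)

	_, err = parseHostAliases(bad)
	fmt.Println(err)
}
```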

klueska and others added 8 commits September 3, 2025 22:00
Without this the IMEX daemons were getting confused if the same DNS name
was used by different nodes to point to different IMEX daemons in
the ensemble.

Signed-off-by: Kevin Klues <[email protected]>
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Sep 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
@jgehrcke
Collaborator Author

jgehrcke commented Sep 6, 2025

Strongly related ref docs:

The kubelet manages the hosts file for each container of the Pod to prevent the container runtime from modifying the file after the containers have already been started. Historically, Kubernetes always used Docker Engine as its container runtime, and Docker Engine would then modify the /etc/hosts file after each container had started.

Current Kubernetes can use a variety of container runtimes; even so, the kubelet manages the hosts file within each container so that the outcome is as intended regardless of which container runtime you use.

(https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/#why-does-kubelet-manage-the-hosts-file)

@klueska
Collaborator

klueska commented Sep 7, 2025

Interesting. I suppose we could create a minimal chroot environment inside our container with our own /etc/hosts file, the binaries we want to execute and any dependent libs. Seems a bit overkill but it would probably work.

@klueska klueska added this to the v25.8.0 milestone Sep 8, 2025
@jgehrcke
Collaborator Author

minimal chroot environment inside our container with our own /etc/hosts file

Interesting idea. So, that would give us full control over /etc/hosts without having to (un)mount anything. But how can we chroot without root privileges?

@klueska
Collaborator

klueska commented Sep 10, 2025

You can chroot without privileges so long as the folder you are chrooting to doesn't require root. The problem is that you can't mount in the necessary /proc and /sys dirs without being root.

@klueska klueska added the robustness issue/pr: edge cases & fault tolerance label Sep 11, 2025
@klueska klueska modified the milestones: v25.8.0, v25.12.0, v25.8.1 Sep 18, 2025
@klueska klueska modified the milestones: v25.8.1, v25.12.0 Oct 8, 2025

Labels

robustness issue/pr: edge cases & fault tolerance

Projects

Status: Backlog


2 participants