wip: atomic updates to /etc/hosts #537

jgehrcke · 2025-09-04T12:43:53Z

This is based on #433.

Before discussing code specifics (such as maybe putting code into pkg/, whether or not to use otiai10/copy, ...) we need to talk about the high-level strategy.

The strategy currently implemented by the patch works (tested), but contains a bit of a surprising detour.

I've captured the details in code comments, but want to summarize here, too:

- rename() fails when the target is a mount target: /etc/hosts: device or resource busy
- It seems to be normal for /etc/hosts to be mounted into the container (managed by the container runtime).
- I think we actually don't want any further modifications to /etc/hosts performed by the container runtime after CD daemon startup. One way to ensure that is to remove the mount.
- Hence, we have two arguments for unmounting /etc/hosts in the container early after startup: i) enable /etc/hosts as a rename() target and ii) guarantee that we own and manage the file.

So, upon CD daemon startup, this patch does the following (in order):

1. takes note of the /etc/hosts contents
2. performs the unmount
3. restores /etc/hosts exactly as before the unmount (same content and permissions), just as a regular file

The downside is that unmount requires root privileges, and hence for now I added privileged: true to the pod spec. I don't feel great about this.

Our concern of performing atomic updates to this file in view of a mount and cross-ownership between host and container is certainly shared by many others. Some of the best discussion is in here: moby/moby#46908 (comment)

Two high-level thoughts:

- I actually think we do ourselves a favor by removing FD lease-controlled /etc/hosts updates initiated from the host from our big picture. This speaks in favor of unmount.
- Maybe we find a way to perform the unmount during container startup, but then can fall back to executing non-privileged.

Edit: an alternative to /etc/hosts at a different path would be great.

The C lib functions gethostbyname and getaddrinfo support HOSTALIASES as documented in https://man7.org/linux/man-pages/man7/hostname.7.html. Most notably, this file does not allow for mapping DNS names to IP addresses:

There's one more caveat: the HOSTALIASES file maps alias names to canonical host names, but the canonical name must be resolvable. You can't specify an IP address as the target.

(https://blog.tremily.us/posts/HOSTALIASES/)

copy-pr-bot · 2025-09-04T12:43:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jgehrcke · 2025-09-06T08:20:19Z

Strongly related ref docs:

The kubelet manages the hosts file for each container of the Pod to prevent the container runtime from modifying the file after the containers have already been started. Historically, Kubernetes always used Docker Engine as its container runtime, and Docker Engine would then modify the /etc/hosts file after each container had started.

Current Kubernetes can use a variety of container runtimes; even so, the kubelet manages the hosts file within each container so that the outcome is as intended regardless of which container runtime you use.

(https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/#why-does-kubelet-manage-the-hosts-file)

klueska · 2025-09-07T09:30:32Z

Interesting. I suppose we could create a minimal chroot environment inside our container with our own /etc/hosts file, the binaries we want to execute and any dependent libs. Seems a bit overkill but it would probably work.

jgehrcke · 2025-09-10T15:02:21Z

minimal chroot environment inside our container with our own /etc/hosts file

Interesting idea. So, that would give us full control over /etc/hosts without having to (un)mount anything. But how can we chroot without root privileges?

klueska · 2025-09-10T15:09:26Z

You can chroot without privileges as long as the folder you are chrooting to doesn't require root. The problem is that you can't mount in the necessary /proc and /sys dirs without being root.

klueska and others added 8 commits September 3, 2025 22:00

Add support for using DNS names instead of raw IPs for imex daemons

39f94dd

Signed-off-by: Kevin Klues <[email protected]>

Make maxNodesPerIMEXDomain configurable (default at 18)

29a4cd8

Signed-off-by: Kevin Klues <[email protected]>

Ensure consistent dnsname --> IMEXDaemonIP mapping on all nodes

44d2462

Without this, the IMEX daemons were getting confused if the same DNS name was used by different nodes to point to different IMEX daemons in the ensemble.

Signed-off-by: Kevin Klues <[email protected]>

CD: dnsnames: atomically update /etc/hosts

3a723e8

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

CD daemon: add UnmountEtcHosts()

a7f690f

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

Fix typo in log msg

eb87dbf

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

Run CD daemon cntr privileged, to allow unmount

b448a27

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

Add otiai10/copy for cp-like file copy

8392151

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

github-project-automation bot added this to Planning Board: k8s-dra-driver-gpu Sep 4, 2025

github-project-automation bot moved this to Backlog in Planning Board: k8s-dra-driver-gpu Sep 4, 2025

go mod vendor

a6b3ce9

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

klueska mentioned this pull request Sep 4, 2025

Add support for using DNSNames instead of raw IPs for IMEX daemons #433

Merged

klueska added this to the v25.8.0 milestone Sep 8, 2025

klueska assigned jgehrcke Sep 8, 2025

klueska added the robustness issue/pr: edge cases & fault tolerance label Sep 11, 2025

jgehrcke mentioned this pull request Sep 15, 2025

Add anyuid SCC to compute domain service account on OpenShift #569

Open

klueska modified the milestones: v25.8.0, v25.12.0, v25.8.1 Sep 18, 2025

jgehrcke removed the status in Planning Board: k8s-dra-driver-gpu Sep 23, 2025

klueska moved this to Backlog in Planning Board: k8s-dra-driver-gpu Sep 23, 2025

klueska modified the milestones: v25.8.1, v25.12.0 Oct 8, 2025
