kicbase: fix deterministic machine-id breaking MAC addresses in multi-node Podman rootless clusters#22823

Open
RobinMcCorkell wants to merge 1 commit into kubernetes:master from RobinMcCorkell:fix/kicbase-machine-id

RobinMcCorkell commented Apr 10, 2026

Problem

/var/lib/dbus/machine-id is baked into the kicbase container image at build time. When the entrypoint's fix_machine_id runs systemd-machine-id-setup, it finds that file and derives /etc/machine-id from it — producing the same machine ID in every container.
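A simplified model of that decision may help make the mechanism concrete. It is written as a hypothetical shell function, `model_machine_id_setup`, that operates on an arbitrary root directory. This is not systemd's code. It only covers the case where /etc/machine-id is missing or empty, and it keeps just the two ID sources this PR is about: copy a valid D-Bus machine ID, otherwise generate a random one. The VM and container UUID sources that the systemd-machine-id-setup man page also describes are left out.

```sh
# Simplified model (hypothetical, NOT systemd's implementation) of how
# systemd-machine-id-setup fills an empty /etc/machine-id.

# Print 32 lowercase hex characters from 16 random bytes.
random_id() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# model_machine_id_setup ROOT
# Writes ROOT/etc/machine-id:
#   - a copy of ROOT/var/lib/dbus/machine-id if that holds a valid ID;
#   - otherwise a freshly generated random ID.
model_machine_id_setup() {
    root=$1
    dbus_id_file="$root/var/lib/dbus/machine-id"
    id=""
    if [ -f "$dbus_id_file" ]; then
        id=$(head -n 1 "$dbus_id_file")
    fi
    # Only reuse the D-Bus ID if it looks valid (32 lowercase hex chars).
    if ! printf '%s\n' "$id" | grep -Eqx '[0-9a-f]{32}'; then
        id=$(random_id)
    fi
    mkdir -p "$root/etc"
    printf '%s\n' "$id" > "$root/etc/machine-id"
}
```

Running the model against a tree that contains a baked-in D-Bus ID reproduces that ID every time. Against a tree without one, it produces a different random ID on each run.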

This breaks anything that depends on the machine ID being unique per node. The most visible symptom is in multi-node minikube clusters using Podman rootless mode: a veth interface is placed into each Podman container, and systemd configures it according to MACAddressPolicy=persistent. That policy derives a stable MAC address from the machine ID and the interface name. The machine ID, in turn, is initialized by systemd-machine-id-setup, which copies the D-Bus machine ID when one is present (https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html). Because all containers share the same machine ID derived from the baked-in D-Bus ID, all nodes get identical MAC addresses on eth0, causing network failures.
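The following illustration makes the failure mode concrete. It uses a hypothetical `derive_mac` helper and assumes GNU coreutils `sha256sum` is available. It is not systemd's actual derivation, which uses its own keyed hash. It only reproduces the property that matters here: for the same interface name, identical machine IDs always yield identical MAC addresses.

```sh
# Illustration only: NOT systemd's algorithm. sha256sum stands in for
# systemd's own hash to show that (machine ID, interface name) -> MAC is
# deterministic, so duplicate machine IDs mean duplicate MACs.

# derive_mac MACHINE_ID IFNAME
derive_mac() {
    # Take 6 bytes (12 hex chars) of a digest over "machine-id:ifname".
    digest=$(printf '%s:%s' "$1" "$2" | sha256sum | cut -c1-12)
    first_byte=$(printf '%s' "$digest" | cut -c1-2)
    rest=$(printf '%s' "$digest" | cut -c3-12)
    # Clear the multicast bit and set the locally administered bit.
    first=$(( (0x$first_byte & 0xfe) | 0x02 ))
    # Format as colon-separated octets.
    printf '%02x%s\n' "$first" "$rest" | sed 's/../&:/g; s/:$//'
}
```

For example, `derive_mac 636135c8833b3f1430bfc1ec69a2a4d1 eth0` prints the same address on every node that shares that machine ID.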

Fix

Per https://systemd.io/CONTAINER_INTERFACE/, add a RUN step to the Dockerfile (before the final squash) that:

  • Truncates /etc/machine-id to an empty file — the spec requires the file to be present but uninitialized so systemd can fill it on boot.
  • Deletes /var/lib/dbus/machine-id — removes the baked-in D-Bus ID that was the source of the deterministic machine ID.
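The two rules above can be expressed as a small check over an unpacked root filesystem. The function name `check_machine_id_layout` is hypothetical and not part of the PR. As a precaution, it also treats a symlink at /var/lib/dbus/machine-id as still present, even if that symlink is dangling.

```sh
# check_machine_id_layout ROOT  (hypothetical helper, not part of the PR)
# Succeeds only if the tree under ROOT follows both rules:
#   1. ROOT/etc/machine-id exists and is empty (present but uninitialized);
#   2. nothing exists at ROOT/var/lib/dbus/machine-id.
check_machine_id_layout() {
    root=$1
    mid="$root/etc/machine-id"
    dbus="$root/var/lib/dbus/machine-id"
    if [ ! -f "$mid" ]; then
        echo "$mid: missing" >&2
        return 1
    fi
    # -s is true for a non-empty file, i.e. an already-initialized ID.
    if [ -s "$mid" ]; then
        echo "$mid: not empty" >&2
        return 1
    fi
    # Check -L too, so a (possibly dangling) symlink also counts as present.
    if [ -e "$dbus" ] || [ -L "$dbus" ]; then
        echo "$dbus: still present" >&2
        return 1
    fi
    return 0
}
```

A tree in the current image's state should fail this check. After emptying etc/machine-id and removing var/lib/dbus/machine-id, it should pass.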

With these changes, systemd generates a fresh random machine ID on every container boot without any entrypoint assistance. This makes the existing fix_machine_id function in the entrypoint redundant — it has been removed.

Testing

Tested against the real kicbase image (gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719) with Podman rootless.

Reproducing the bug

Both files contain the same baked-in value in the current image:

```
$ KICBASE=gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719
$ podman run --rm --entrypoint bash $KICBASE -c '
    echo "/etc/machine-id:          $(cat /etc/machine-id)"
    echo "/var/lib/dbus/machine-id: $(cat /var/lib/dbus/machine-id)"
  '
/etc/machine-id:          636135c8833b3f1430bfc1ec69a2a4d1
/var/lib/dbus/machine-id: 636135c8833b3f1430bfc1ec69a2a4d1
```

Running systemd-machine-id-setup (as fix_machine_id does) produces the same ID every time:

```
$ for i in 1 2 3; do
    podman run --rm --entrypoint bash $KICBASE -c '
      rm -f /etc/machine-id; systemd-machine-id-setup 2>/dev/null; cat /etc/machine-id'
  done
636135c8833b3f1430bfc1ec69a2a4d1
636135c8833b3f1430bfc1ec69a2a4d1
636135c8833b3f1430bfc1ec69a2a4d1
```

Verifying the fix

Build a patched image applying the fix on top of the current kicbase:

```
$ podman build -t kicbase-patched - <<EOF
FROM $KICBASE
RUN truncate -s 0 /etc/machine-id && rm -f /var/lib/dbus/machine-id
EOF
```

Confirm the image state:

```
$ podman run --rm --entrypoint bash kicbase-patched -c '
    echo "/etc/machine-id size: $(wc -c < /etc/machine-id) bytes"
    ls /var/lib/dbus/machine-id 2>/dev/null || echo "/var/lib/dbus/machine-id: (removed)"
  '
/etc/machine-id size: 0 bytes
/var/lib/dbus/machine-id: (removed)
```

Each container now gets a unique machine ID:

```
$ for i in 1 2 3; do
    podman run --rm --entrypoint bash kicbase-patched -c '
      systemd-machine-id-setup 2>/dev/null; cat /etc/machine-id'
  done
4c971ecb6dc34159ad1da13a867a51c3
2d717cadb4804b1b9863f7b8b2da998d
b25726fead9246f691e97b3a63834de2
```
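To check output from loops like the ones above mechanically rather than by eye, you can use a small filter. The name `all_unique_machine_ids` is hypothetical and not part of the PR. The filter reads one ID per line and succeeds only if every line is a well-formed machine ID (32 lowercase hex characters) and no value repeats.

```sh
# all_unique_machine_ids  (hypothetical helper, not part of the PR)
# Reads machine IDs from stdin, one per line. Fails on malformed or
# duplicated IDs; succeeds otherwise.
all_unique_machine_ids() {
    ids=$(cat)
    # grep -v succeeds if any line is NOT exactly 32 lowercase hex digits.
    if printf '%s\n' "$ids" | grep -Evqx '[0-9a-f]{32}'; then
        echo "malformed machine ID in input" >&2
        return 1
    fi
    # After sorting, uniq -d prints each value that appears more than once.
    dups=$(printf '%s\n' "$ids" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate machine ID(s): $dups" >&2
        return 1
    fi
    return 0
}
```

Piping the reproduction loop into `all_unique_machine_ids` should fail on the three identical IDs from the current image. Piping the patched-image loop into it should succeed.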

…-node Podman rootless clusters

/var/lib/dbus/machine-id was baked into the kicbase container image at
build time. When fix_machine_id in the entrypoint ran
systemd-machine-id-setup, it found that file and derived /etc/machine-id
from it — producing the same machine ID in every container.

This breaks anything that depends on the machine ID being unique per
node. The most visible symptom is in multi-node minikube clusters using
Podman rootless mode: a veth interface is placed into each Podman
container, and systemd configures it according to
MACAddressPolicy=persistent. That policy derives the MAC address from
the machine ID (systemd-machine-id-setup reads from the D-Bus machine
ID: https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html).
With every container sharing the same machine ID, all nodes get
identical MAC addresses on eth0, causing network failures.

Fix: in the Dockerfile (the entrypoint's fix_machine_id is no longer needed)

Per https://systemd.io/CONTAINER_INTERFACE/, add a RUN step that:
- truncates /etc/machine-id to an empty file: the spec requires this
  file to be present but uninitialized so systemd can fill it on boot.
- deletes /var/lib/dbus/machine-id: removes the baked-in D-Bus ID that
  was the source of the deterministic (and shared) machine ID.

With these changes, systemd generates a fresh random machine ID on
every container boot without any entrypoint assistance, making
fix_machine_id in the entrypoint redundant. It has been removed.

Tested with Podman rootless using debian:bookworm-slim + systemd:
- Before: all runs produce aabbccddeeff00112233445566778899 (the
  baked-in D-Bus ID)
- After: each run produces a unique random ID

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RobinMcCorkell
Once this PR has been reviewed and has the lgtm label, please assign comradeprogrammer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@linux-foundation-easycla

CLA Not Signed

@k8s-ci-robot k8s-ci-robot requested a review from prezha April 10, 2026 20:26
@k8s-ci-robot
Contributor

Welcome @RobinMcCorkell!

It looks like this is your first PR to kubernetes/minikube 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/minikube has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 10, 2026
@k8s-ci-robot
Contributor

Hi @RobinMcCorkell. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 10, 2026
@minikube-bot
Collaborator

Can one of the admins verify this patch?

@RobinMcCorkell
Author

The CLA signing page (and the support link) take me to a blank page on https://sso.linuxfoundation.org/login; it looks like CORS issues are preventing the page from loading. I can't sign the CLA without that.

Also I don't think I can sign the CLA on behalf of Copilot 🙈

Labels

cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

3 participants