kicbase: fix deterministic machine-id breaking MAC addresses in multi-node Podman rootless clusters #22823
…-node Podman rootless clusters

/var/lib/dbus/machine-id was baked into the kicbase container image at build time. When fix_machine_id in the entrypoint ran systemd-machine-id-setup, it found that file and derived /etc/machine-id from it, producing the same machine ID in every container.

This breaks anything that depends on the machine ID being unique per node. The most visible symptom in multi-node minikube clusters using Podman rootless mode: a veth interface is placed into each Podman container, and systemd configures it according to MACAddressPolicy=persistent. That policy derives the MAC address from the machine ID (systemd-machine-id-setup reads from the D-Bus machine ID: https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html). With every container sharing the same machine ID, all nodes get identical MAC addresses on eth0, causing network failures.

Fix: Dockerfile only (entrypoint fix_machine_id is no longer needed)

Per https://systemd.io/CONTAINER_INTERFACE/, add a RUN step that:
- truncates /etc/machine-id to an empty file: the spec requires this file to be present but uninitialized so systemd can fill it on boot.
- deletes /var/lib/dbus/machine-id: removes the baked-in D-Bus ID that was the source of the deterministic (and shared) machine ID.

With these changes, systemd generates a fresh random machine ID on every container boot without any entrypoint assistance, making fix_machine_id in the entrypoint redundant. It has been removed.

Tested with Podman rootless using debian:bookworm-slim + systemd:
- Before: all runs produce aabbccddeeff00112233445566778899 (the baked-in D-Bus ID)
- After: each run produces a unique random ID

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CLA signing page (and the support link) take me to a blank page on https://sso.linuxfoundation.org/login. It looks like there are some CORS issues preventing the page from loading, and I can't sign the CLA without that. Also, I don't think I can sign the CLA on behalf of Copilot 🙈
Problem
/var/lib/dbus/machine-id is baked into the kicbase container image at build time. When the entrypoint's fix_machine_id runs systemd-machine-id-setup, it finds that file and derives /etc/machine-id from it, producing the same machine ID in every container.

This breaks anything that depends on the machine ID being unique per node. The most visible symptom is in multi-node minikube clusters using Podman rootless mode: a veth interface is placed into each Podman container, and systemd configures it according to MACAddressPolicy=persistent. That policy selects a MAC address based on the machine ID (via systemd-machine-id-setup, which reads from the D-Bus machine ID: https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html). Because all containers share the same machine ID derived from the baked-in D-Bus ID, all nodes get identical MAC addresses on eth0, causing network failures.

Fix
Per https://systemd.io/CONTAINER_INTERFACE/, add a RUN step to the Dockerfile (before the final squash) that:
- truncates /etc/machine-id to an empty file: the spec requires the file to be present but uninitialized so systemd can fill it on boot.
- deletes /var/lib/dbus/machine-id: removes the baked-in D-Bus ID that was the source of the deterministic machine ID.

With these changes, systemd generates a fresh random machine ID on every container boot without any entrypoint assistance. This makes the existing fix_machine_id function in the entrypoint redundant, so it has been removed.

Testing
Tested against the real kicbase image (gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719) with Podman rootless.

Reproducing the bug
Both files contain the same baked-in value in the current image:
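The command and output for this step are not preserved in this copy of the PR. As a minimal sketch of how such a check could be done, assuming a POSIX shell with cmp available, the hypothetical helper same_machine_id below (not part of the PR) succeeds only when both files exist under a given root and hold identical content:

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeeds when both machine-id files
# under the given root directory exist and have byte-identical content.
same_machine_id() {
    root="${1:-}"
    # cmp -s is silent; it exits non-zero on any difference or missing file.
    cmp -s "$root/etc/machine-id" "$root/var/lib/dbus/machine-id"
}

# Assumed usage inside a container started from the image:
#   same_machine_id / && echo "both files hold the same ID"
# Or, from the host, print both files by overriding the entrypoint:
#   podman run --rm --entrypoint cat \
#     gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719 \
#     /etc/machine-id /var/lib/dbus/machine-id
```

Taking the root directory as an argument keeps the check usable both inside a running container and against an unpacked image filesystem.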
Running systemd-machine-id-setup (as fix_machine_id does) produces the same ID every time:

Verifying the fix
Build a patched image applying the fix on top of the current kicbase:
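The build commands used in the PR are not preserved here. One way this could be sketched, assuming Podman is installed, is to generate a small Containerfile that layers the two steps described under Fix on top of the kicbase image. The helper name write_patch_containerfile and the tag kicbase-machine-id-fix are illustrative choices, not names from the PR:

```shell
#!/bin/sh
# Base image taken from the Testing section of this PR.
BASE_IMAGE="gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719"

# Hypothetical helper: writes a Containerfile into the directory given as $1.
write_patch_containerfile() {
    cat > "$1/Containerfile" <<EOF
FROM ${BASE_IMAGE}
# Leave /etc/machine-id present but empty, and drop the baked-in D-Bus ID.
RUN : > /etc/machine-id && rm -f /var/lib/dbus/machine-id
EOF
}

# Assumed usage (requires podman; the tag name is illustrative):
#   ctx=$(mktemp -d)
#   write_patch_containerfile "$ctx"
#   podman build -t kicbase-machine-id-fix "$ctx"
```

Layering on top of the published image avoids rebuilding kicbase from scratch. In the actual change the RUN step lives in the kicbase Dockerfile itself.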
Confirm the image state:
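The original check and its output are not preserved in this copy. A minimal sketch of what "image state" means here, based on the two conditions listed under Fix, might look like the following. check_machine_id_state is a hypothetical helper, not something the PR ships:

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): verifies the state the fix is meant
# to leave in the image, relative to the root directory given as $1.
check_machine_id_state() {
    root="${1:-}"
    # /etc/machine-id must be a regular file with zero length.
    if [ ! -f "$root/etc/machine-id" ] || [ -s "$root/etc/machine-id" ]; then
        echo "FAIL: $root/etc/machine-id should exist and be empty"
        return 1
    fi
    # /var/lib/dbus/machine-id must be gone (including a dangling symlink).
    if [ -e "$root/var/lib/dbus/machine-id" ] || [ -L "$root/var/lib/dbus/machine-id" ]; then
        echo "FAIL: $root/var/lib/dbus/machine-id should not exist"
        return 1
    fi
    echo "OK: machine-id is uninitialized"
}

# Assumed usage inside a container started from the patched image,
# before systemd has booted and filled in /etc/machine-id:
#   check_machine_id_state /
```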
Each container now gets a unique machine ID:
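The per-container output is not preserved in this copy. As a sketch of how uniqueness could be asserted once the IDs have been collected, the hypothetical helper all_unique below reads one machine ID per line and fails if any value repeats. How the IDs are gathered (here, podman exec against illustrative container names) is an assumption:

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): reads one ID per line on stdin and
# succeeds only if no line appears more than once.
all_unique() {
    # uniq -d prints one copy of each line that is repeated in sorted input.
    dups=$(sort | uniq -d)
    [ -z "$dups" ]
}

# Assumed usage, with illustrative container names node-1..node-3 that have
# already booted systemd:
#   for n in node-1 node-2 node-3; do
#       podman exec "$n" cat /etc/machine-id
#   done | all_unique && echo "all machine IDs are unique"
```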