google-guest-agent after 20240701.00 persists a file that locks systemd-networkd to a specific interface device name #401

Open
@char8

Description

We pulled in a new release of the guest agent (1:20240701.00-g1) incorporating #396 and #386 during a packer build of a new VM image.

This guest agent now writes a file /etc/netplan/20-google-guest-agent-ethernet.yaml with the contents:

network:
    version: 2
    ethernets:
        ens5:
            match:
                name: ens5
            mtu: 1460
            dhcp4: true
            dhcp4-overrides:
                use-domains: true

vs. the previous default, /etc/netplan/90-default.yaml:

network:
    version: 2
    ethernets:
        all-en:
            match:
                name: en*
            dhcp4: true
            dhcp4-overrides:
                use-domains: true
            dhcp6: true
            dhcp6-overrides:
                use-domains: true
        all-eth:
            match:
                name: eth*
            dhcp4: true
            dhcp4-overrides:
                use-domains: true
            dhcp6: true
            dhcp6-overrides:
                use-domains: true

The interface on the build instance is ens4, and the 20-google-guest-agent-ethernet.yaml file hardcodes that interface name into the image.

When the image is run on a new VM that has a different network interface name (e.g. we're seeing ens5 on some VMs), the network interface fails to come up because the config file has no declaration for it. This effectively breaks networking on the box: the ens5 interface is never brought up because /run/systemd/network/10-netplan-all-en.network is missing.

  • We confirmed this by upgrading the guest agent on a running VM and observing that /run/systemd/network/10-netplan-all-en.network and /etc/netplan/90-default.yaml are missing post upgrade.
  • We see no evidence that network device naming is predictable/persistent between reboots; a VM that comes up with a different network interface name after a reboot will now fail to bring up networking due to this change.
  • Our workflow for creating custom machine images is now broken, as the builder machine has a different network interface name (ens4) from the VMs in our managed instance group (which come up with ens5).
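To illustrate the mismatch, here is a minimal Python sketch of how the two configs differ in which interfaces they cover. It is not how netplan or the guest agent is implemented: the dictionaries are simplified stand-ins for the YAML files quoted above, `pinned_config` and `matching_stanzas` are hypothetical helper names, and matching is reduced to a shell-style glob on `match.name` (falling back to the stanza ID when no `match` is present). Other match keys such as MAC address or driver are ignored.

```python
from fnmatch import fnmatchcase

# Simplified stand-in for the `ethernets` section of 90-default.yaml;
# only the fields relevant to interface matching are kept.
DEFAULT_CONFIG = {
    "all-en": {"match": {"name": "en*"}},
    "all-eth": {"match": {"name": "eth*"}},
}


def pinned_config(build_ifname):
    """Mimic the agent-written file: one stanza keyed on a literal name.

    Hypothetical helper for illustration only.
    """
    return {build_ifname: {"match": {"name": build_ifname}}}


def matching_stanzas(ethernets, ifname):
    """Return the IDs of stanzas whose name pattern covers `ifname`."""
    matched = []
    for stanza_id, stanza in ethernets.items():
        # Assumption: with no explicit match, the stanza ID is the name.
        pattern = stanza.get("match", {}).get("name", stanza_id)
        if fnmatchcase(ifname, pattern):
            matched.append(stanza_id)
    return matched


if __name__ == "__main__":
    baked = pinned_config("ens4")  # name seen on the image builder
    for ifname in ("ens4", "ens5"):
        print(
            ifname,
            "default:", matching_stanzas(DEFAULT_CONFIG, ifname),
            "pinned:", matching_stanzas(baked, ifname),
        )
```

Under these assumptions, the wildcard config covers both ens4 and ens5, while a config pinned to the builder's ens4 leaves ens5 with no matching stanza, which is consistent with the interface never being configured on the new VM.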

We're running the debian-cloud/debian-12 image with:

netplan.io                            0.106-2+deb12u1
systemd                               252.26-1~deb12u2
google-guest-agent                    1:20240701.00-g1

Post reboot, the guest agent crashes because it can't reach the metadata API (since ens5 is not up), so presumably it can't regenerate the config for the new interface name.

2024-07-16T03:39:09.740174+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 google_guest_agent[1121]: ERROR instance_setup.go:159 Failed to reach MDS(all retries exhausted): exhausted all (100) retries, last error: request failed with status code: [-1], error: [error connecting to metadata server: Get "http://169.254.169.254/computeMetadata/v1/?alt=json&recursive=true&timeout_sec=60": dial tcp 169.254.169.254:80: connect: network is unreachable]
2024-07-16T03:39:09.740991+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 systemd[1]: google-guest-agent.service: Main process exited, code=exited, status=1/FAILURE
2024-07-16T03:39:09.741072+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 systemd[1]: google-guest-agent.service: Failed with result 'exit-code'.
