Description
We pulled in a new release of the guest agent (1:20240701.00-g1) incorporating #396 and #386 during a packer build of a new VM image.
This guest agent now writes a file /etc/netplan/20-google-guest-agent-ethernet.yaml
with the contents:
network:
  version: 2
  ethernets:
    ens5:
      match:
        name: ens5
      mtu: 1460
      dhcp4: true
      dhcp4-overrides:
        use-domains: true
vs. the previous default /etc/netplan/90-default.yaml
network:
  version: 2
  ethernets:
    all-en:
      match:
        name: en*
      dhcp4: true
      dhcp4-overrides:
        use-domains: true
      dhcp6: true
      dhcp6-overrides:
        use-domains: true
    all-eth:
      match:
        name: eth*
      dhcp4: true
      dhcp4-overrides:
        use-domains: true
      dhcp6: true
      dhcp6-overrides:
        use-domains: true
The interface on the build instance is ens4, and the 20-google-guest-agent-ethernet.yaml file hardcodes that interface name into the image.
When the image is run on a new VM that has a different network interface name (e.g. we're seeing ens5 on some VMs), the network interface fails to come up because the config file contains no declaration for it. This effectively breaks networking on the box: the ens5 interface is never brought up because /run/systemd/network/10-netplan-all-en.network is missing.
- We confirmed this by upgrading the guest agent on a running VM and observing that /run/systemd/network/10-netplan-all-en.network and /etc/netplan/90-default.yaml are missing post upgrade.
- We see no evidence to indicate that network device naming is predictable/persistent between reboots; a VM that comes up with a different network interface name after a reboot will now fail to bring up networking due to this change.
- Our workflow for creating custom machine images is now broken, as the builder machine has a different network interface name (ens4) from the VMs in our managed instance group (which come up with ens5).
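To make the mismatch concrete, here is a minimal Python sketch (not part of the guest agent or netplan) that checks which interface names a netplan `ethernets` section would leave unconfigured. It assumes the YAML has already been parsed into a dict; the helper `unmatched_interfaces` and the `PINNED` config (what a build on ens4 would presumably bake in) are hypothetical, and only `match: name` globs and bare entry IDs are considered.

```python
from fnmatch import fnmatchcase


def unmatched_interfaces(netplan_config, interfaces):
    """Return the interface names that no `ethernets` entry would match.

    Hypothetical helper: entries that match only on other keys
    (e.g. macaddress or driver) are ignored.
    """
    ethernets = netplan_config.get("network", {}).get("ethernets", {})
    patterns = []
    for entry_id, entry in ethernets.items():
        match = (entry or {}).get("match")
        if match is None:
            # Without a match block, the entry ID is the interface name.
            patterns.append(entry_id)
        elif "name" in match:
            # `match: name` accepts shell-style globs such as `en*`.
            patterns.append(match["name"])
    return [
        name for name in interfaces
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Illustrative config pinned to the builder's interface name (assumption).
PINNED = {
    "network": {
        "version": 2,
        "ethernets": {
            "ens4": {"match": {"name": "ens4"}, "mtu": 1460, "dhcp4": True},
        },
    }
}

# Wildcard matches as in the previous 90-default.yaml.
DEFAULT = {
    "network": {
        "version": 2,
        "ethernets": {
            "all-en": {"match": {"name": "en*"}, "dhcp4": True},
            "all-eth": {"match": {"name": "eth*"}, "dhcp4": True},
        },
    }
}

print(unmatched_interfaces(PINNED, ["ens5"]))   # → ['ens5']
print(unmatched_interfaces(DEFAULT, ["ens5"]))  # → []
```

With the pinned config, a VM that boots with ens5 has no matching declaration, whereas the wildcard entries from the previous default cover both ens4 and ens5.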
We're running the debian-cloud/debian-12 image with:
netplan.io 0.106-2+deb12u1
systemd 252.26-1~deb12u2
google-guest-agent 1:20240701.00-g1
After a reboot, the guest agent crashes because it can't reach the metadata API (since ens5 is not up), so it presumably can't regenerate the config for the new interface name.
2024-07-16T03:39:09.740174+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 google_guest_agent[1121]: ERROR instance_setup.go:159 Failed to reach MDS(all retries exhausted): exhausted all (100) retries, last error: request failed with status code: [-1], error: [error connecting to metadata server: Get "http://169.254.169.254/computeMetadata/v1/?alt=json&recursive=true&timeout_sec=60": dial tcp 169.254.169.254:80: connect: network is unreachable]
2024-07-16T03:39:09.740991+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 systemd[1]: google-guest-agent.service: Main process exited, code=exited, status=1/FAILURE
2024-07-16T03:39:09.741072+00:00 packer-6695cb52-72cf-9b08-c0c0-dcfffc97fcf8 systemd[1]: google-guest-agent.service: Failed with result 'exit-code'.