Description
I have a process for building KVM VMs that uses Ansible to set up (a) a DHCP entry, (b) a tailored PXE config and AlmaLinux image, and (c) a Kickstart config file pointing to an NFS share with the AlmaLinux 9.2 image. It then boots the server to do the install.
This is completely reliable with CentOS Stream 9. With AlmaLinux 9.2, roughly 50% of the time, the Anaconda install hangs shortly after the "* when reporting a bug" message, about 16 seconds after the NFS transfer starts. Two minutes later the NFS server closes the TCP session on port 2049 (6 FIN/ACK attempts with no response). I can send Ctrl-Alt-F4 etc. to the VM and it will show me the logs, but otherwise it's unresponsive. I'm not aware of a way to get the logs off the virtual /dev/tty4 except by watching them in virt-viewer, but what they tell me is that Anaconda is deactivating the enp1s0 interface (which would definitely stop NFS). I have no idea why it's doing this.
The other 50% of the time it installs normally. The DHCP, TFTP and NFS servers are all the same machine, 192.168.1.3, which is a VM in the same subnet, running on the same host and connected to the same Linux bridge (br1).
From the
/var/lib/tftpboot/pxelinux.cfg/C0A8017F
# This PXELINUX menu was created by virt_install
#
# Installs AlmaLinux 9 on agrajag at IP address=192.168.1.126 (hexadecimal=C0A8017E)
#
default vesamenu.c32
prompt 0
timeout 10
ONTIMEOUT 1
display boot.msg
menu title ## virt-install PXE Boot Menu for agrajag ##
label 1
menu label virt_install of AlmaLinux 9
menu default
kernel almalinux-9/vmlinuz
append initrd=almalinux-9/initrd.img ip=dhcp inst.ks=nfs:192.168.1.3:/srv/shares/kickstart/agrajag.ks
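For reference, PXELINUX looks for a per-host config file named after the client's IPv4 address written as eight uppercase hexadecimal digits. A minimal Python sketch of that conversion follows; the function name `pxelinux_hex_name` is hypothetical and not part of the virt_install playbook.

```python
import ipaddress


def pxelinux_hex_name(ip: str) -> str:
    """Return the uppercase-hex file name PXELINUX derives from an IPv4 address.

    Hypothetical helper for illustration only.
    """
    # IPv4Address converts to a 32-bit integer; format it as 8 hex digits.
    return format(int(ipaddress.IPv4Address(ip)), "08X")


if __name__ == "__main__":
    print(pxelinux_hex_name("192.168.1.126"))  # prints C0A8017E
```

For 192.168.1.126 this gives C0A8017E, the value quoted in the menu file's comment; C0A8017F would correspond to 192.168.1.127.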
/srv/shares/kickstart/agrajag.ks
# KVM/libvirt Kickstart install file for minimal server NIC enp1s0, DHCP
# dynamically created by virt-install playbook
%packages
@^minimal-environment
@headless-management
@guest-agents
@standard
@system-tools
%end
eula --agreed
# Keyboard layouts
keyboard --xlayouts='gb'
# System language
lang en_GB.UTF-8
# Network information for DHCP
network --bootproto=dhcp --device=enp1s0 --noipv6 --activate
nfs --server=192.168.1.3 --dir=/srv/shares/install_media/almalinux-9 --opts=ro,auto,soft,intr
ignoredisk --only-use=vda
clearpart --none --initlabel
partition /boot --size=1024 --fstype=xfs --ondisk=vda
partition /boot/efi --size=200 --fstype=vfat --ondisk=vda
partition pv.1 --size=1024 --grow --ondisk=vda
volgroup vg_fenchurch pv.1
logvol swap --recommended --vgname=vg_fenchurch --name=swap
logvol / --vgname=vg_fenchurch --name=root --fstype=xfs --size=1024 --grow
bootloader
# System timezone
timezone Europe/London --utc
timesource --ntp-server ntp.REDACTED.com
# Root password
rootpw --iscrypted REDACTED
# Groups
group --name=steve --gid=1000
group --name=ansible --gid=800
# Users
user --name=steve --password=REDACTED --iscrypted --uid 1000 --gid 1000 --groups=wheel --gecos="Steve Hayes"
user --name=ansible --uid 800 --gid 800 --gecos="Ansible Service Account"
# automatically reboot
reboot
%addon com_redhat_kdump --disable --reserve-mb='auto'
%end
%post
# set ssh keys
/bin/mkdir /home/steve/.ssh
/bin/chmod 700 /home/steve/.ssh
/bin/echo -e 'ssh-rsa REDACTED' >> /home/steve/.ssh/authorized_keys

/bin/chmod 600 /home/steve/.ssh/authorized_keys
/bin/chown -R steve:steve /home/steve/.ssh
/bin/mkdir /home/ansible/.ssh
/bin/chmod 700 /home/ansible/.ssh
/bin/echo -e 'ssh-rsa REDACTED' >> /home/ansible/.ssh/authorized_keys
/bin/chmod 600 /home/ansible/.ssh/authorized_keys
/bin/chown -R ansible:ansible /home/ansible/.ssh
# sudoers_entry for ansible user
/bin/echo "%ansible ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/ansible-nopasswd
/bin/chmod 440 /etc/sudoers.d/ansible-nopasswd
/bin/chown root:root /etc/sudoers.d/ansible-nopasswd
%end
Final screen from /dev/tty4 as processed (mangled) by tesseract and manually corrected as far as I can:
DEBUG NetworkManager:<debug> [1697131597.3930] platform: (enp1s0) signal: address 6 removed: fe80::5054:ff:fe01:7e00/64 lft forever pref forever lifetime 10-0[4294967295,4294967295] dev 2 flags permanent ,noprefixroute src kernel
DEBUG NetworkManager:<debug> [1697131597.9930] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie gone (untrack): [04075d756b87d07f, ip6-address, fe80::5054:ff:fe01:7e00/64 lft forever pref forever lifetime 10-0[0,0] dev 2 src ipv6ll], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie pruned during reapply: [a7603b192df6eaf1, ip4-route, type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie pruned during reapply: [5d233ed6cc4dbc66, ip4-route, type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src dhcp pref-src 192.168.1.126], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie pruned during reapply: [0f042e168cd253c7, ip4-address, 192.168.1.126/24 brd* 192.168.1.255 lft 3591sec pref 3591sec lifetime 18-1[3600,3600] dev 2 src dhcp], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) address: deleting IPv4 address 192.168.1.126/24, dev 2
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) signal: address 4 removed: 192.168.1.126/24 brd 192.168.1.255 lft 3591sec pref 3591sec lifetime 10-1[3680,3600] dev 2 flags noprefixroute src kernel
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie gone (untrack): [0f042e168cd253c7, ip4-address, 192.168.1.126/24 brd* 192.168.1.255 lft 3591sec pref 3591sec lifetime 10-1[3600,3600] dev 2 src dhcp], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) signal: route 4 removed: type local table 255 192.168.1.126/32 dev 2 metric 0 mss 0 rt-src rt-kernel scope host pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] platform: (enp1s0) signal: route 4 removed: type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src rt-dhcp scope global pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie gone (untrack): [5d233ed6cc4dbc66, ip4-route, type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src dhcp pref-src 192.168.1.126], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9932] platform: (enp1s0) signal: route 4 removed: type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] l3cfg[f8da0cf05cdbce1d, if index=2]: obj-state: zombie gone (untrack): [a7603b192df6eaf1, ip4-route, type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9932] platform-linux: do-delete-ip4-address[2: 192.168.1.126/24]: success
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync ip4-route
DEBUG NetworkManager:<debug> [1697131597.9932] platform-linux: sysctl: setting '/proc/sys/net/ipv6/conf/enp1s0/use_tempaddr' to '0' (current value is identical)
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync ip6-route
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync mptcp-addr (reapply)
DEBUG NetworkManager:<debug> [1697131597.9933] global-tracker: sync routing-rule
DEBUG NetworkManager:<debug> [1697131597.9933] device[4e8776fec9c7d826] (enp1s0): set metered value 0
DEBUG NetworkManager:<debug> [1697131597.9934] manager: new metered value: 0
DEBUG NetworkManager:<debug> [1697131597.9945] active-connection[3f9dcecZ1192b449]: set state deactivated (was deactivating)
INFO NetworkManager:<info> [1697131597.9949] manager: NetworkManager state is now DISCONNECTED
WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:NetworkManager state changed to <enum NM_STATE_DISCONNECTED of type NM.State>
WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:Connected to network: False
INFO systemd:systemd-hostnamed.service: Deactivated successfully.
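The bracketed values in the NetworkManager lines above are Unix epoch seconds with a four-digit fraction. To line them up with a packet capture or logs on the NFS server, they can be converted to UTC. This is a minimal sketch: the helper name `nm_stamp_to_utc` is hypothetical, and it assumes the stamp always has the `[seconds.ffff]` form seen in the log above.

```python
import re
from datetime import datetime, timezone

# Matches the "[seconds.ffff]" stamp seen in the NetworkManager lines above.
STAMP = re.compile(r"\[(\d{10})\.(\d{4})\]")


def nm_stamp_to_utc(line: str):
    """Return the line's timestamp as an aware UTC datetime, or None if absent.

    Hypothetical helper for illustration only.
    """
    match = STAMP.search(line)
    if match is None:
        return None
    seconds = int(match.group(1))
    # Four fractional digits are units of 100 microseconds.
    micros = int(match.group(2)) * 100
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.replace(microsecond=micros)


if __name__ == "__main__":
    sample = "DEBUG NetworkManager:<debug> [1697131597.9945] active-connection: set state deactivated"
    print(nm_stamp_to_utc(sample).isoformat())
```

For the sample line this prints 2023-10-12T17:26:37.994500+00:00, which is the moment the connection was marked deactivated.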
