Anaconda for AlmaLinux 9.2 hangs sometimes with a kickstart NFS install #178

@tsgsh

I have a process for building KVM VMs that uses Ansible to set up (a) a DHCP entry, (b) a tailored PXE config and AlmaLinux image, and (c) a Kickstart config file pointing to an NFS share with the AlmaLinux 9.2 image, and then boots the server to do the install.

This is completely reliable with CentOS Stream 9. With AlmaLinux 9.2, roughly 50% of the time the install hangs in Anaconda shortly after the "* when reporting a bug" message, about 16 seconds after the NFS transfer starts. Two minutes later the NFS server closes the TCP session on port 2049 (6 FIN/ACK attempts with no response). I can send Ctrl-Alt-F4 etc. to the VM and it will show me the logs, but otherwise it is unresponsive. I'm not aware of a way to get the logs off the virtual /dev/tty4 except by watching them in virt-viewer, but what they tell me is that Anaconda is deactivating the enp1s0 interface (which would definitely stop NFS). I have no idea why it's doing this.

The other 50% of the time it installs normally. The DHCP, TFTP and NFS servers are all the same machine, 192.168.1.3, which is a VM in the same subnet, running on the same host and connected to the same Linux bridge (br1).

From /var/lib/tftpboot/pxelinux.cfg/C0A8017F:

# This PXELINUX menu was created by virt_install
#
# Installs AlmaLinux 9 on agrajag at IP address=192.168.1.126 (hexadecimal=C0A8017E)
#
default vesamenu.c32
prompt 0
timeout 10
ONTIMEOUT 1
display boot.msg

menu title ## virt-install PXE Boot Menu for agrajag ##
 label 1
 menu label virt_install of AlmaLinux 9
 menu default
 kernel almalinux-9/vmlinuz
 append initrd=almalinux-9/initrd.img ip=dhcp inst.ks=nfs:192.168.1.3:/srv/shares/kickstart/agrajag.ks
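
PXELINUX looks up a per-client file named after the client's IPv4 address in uppercase hexadecimal, which is where the C0A8017E in the menu comment comes from. A minimal sketch of that conversion, assuming a POSIX shell; the function name ip_to_pxe_hex is made up for illustration and is not part of the playbook:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# that PXELINUX searches for under pxelinux.cfg/.
# ip_to_pxe_hex is a hypothetical helper name, not part of any tool.
ip_to_pxe_hex() {
    # Split the argument on dots so the four octets become $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Print each octet as two uppercase hex digits.
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxe_hex 192.168.1.126
```

For 192.168.1.126 this prints C0A8017E (192 = C0, 168 = A8, 1 = 01, 126 = 7E).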

From /srv/shares/kickstart/agrajag.ks:

# KVM/libvirt Kickstart install file for minimal server NIC enp1s0, DHCP
# dynamically created by virt-install playbook

%packages
@^minimal-environment
@headless-management
@guest-agents
@standard
@system-tools

%end

eula --agreed

# Keyboard layouts
keyboard --xlayouts='gb'
# System language
lang en_GB.UTF-8

# Network information for DHCP
network --bootproto=dhcp --device=enp1s0 --noipv6 --activate

nfs --server=192.168.1.3 --dir=/srv/shares/install_media/almalinux-9  --opts=ro,auto,soft,intr

ignoredisk --only-use=vda

clearpart --none --initlabel

partition /boot --size=1024 --fstype=xfs --ondisk=vda
partition /boot/efi --size=200 --fstype=vfat --ondisk=vda
partition pv.1 --size=1024 --grow --ondisk=vda

volgroup vg_fenchurch pv.1
logvol swap --recommended --vgname=vg_fenchurch --name=swap
logvol / --vgname=vg_fenchurch --name=root --fstype=xfs --size=1024 --grow

bootloader

# System timezone
timezone Europe/London --utc
timesource --ntp-server ntp.REDACTED.com

# Root password
rootpw --iscrypted REDACTED

# Groups
group --name=steve --gid=1000
group --name=ansible --gid=800

# Users
user --name=steve --password=REDACTED --iscrypted --uid 1000 --gid 1000 --groups=wheel --gecos="Steve Hayes"
user --name=ansible --uid 800 --gid 800 --gecos="Ansible Service Account"

# automatically reboot
reboot

%addon com_redhat_kdump --disable --reserve-mb='auto'

%end

%post

# set ssh keys
/bin/mkdir /home/steve/.ssh
/bin/chmod 700 /home/steve/.ssh
/bin/echo -e 'ssh-rsa REDACTED' >> /home/steve/.ssh/authorized_keys
/bin/chmod 600 /home/steve/.ssh/authorized_keys
/bin/chown -R steve:steve /home/steve/.ssh
/bin/mkdir /home/ansible/.ssh
/bin/chmod 700 /home/ansible/.ssh
/bin/echo -e 'ssh-rsa REDACTED'  >> /home/ansible/.ssh/authorized_keys
/bin/chmod 600 /home/ansible/.ssh/authorized_keys
/bin/chown -R ansible:ansible /home/ansible/.ssh

# sudoers_entry for ansible user
/bin/echo "%ansible ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/ansible-nopasswd
/bin/chmod 440 /etc/sudoers.d/ansible-nopasswd
/bin/chown root:root /etc/sudoers.d/ansible-nopasswd

%end

Final screen from /dev/tty4 as processed (mangled) by tesseract and manually corrected as far as I can:

DEBUG NetworkManager:<debug> [1697131597.3930] platform: (enp1s0) signal: address 6 removed: fe80::5054:ff:fe01:7e00/64 lft forever pref forever lifetime 10-0[4294967295,4294967295] dev 2 flags permanent ,noprefixroute src kernel
DEBUG NetworkManager:<debug> [1697131597.9930] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie gone (untrack): [04075d756b87d07f, ip6-address, fe80::5054:ff:fe01:7e00/64 lft forever pref forever lifetime 10-0[0,0] dev 2 src ipv6ll], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie pruned during reapply: [a7603b192df6eaf1, ip4-route, type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie pruned during reapply: [5d233ed6cc4dbc66, ip4-route, type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src dhcp pref-src 192.168.1.126], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie pruned during reapply: [0f042e168cd253c7, ip4-address, 192.168.1.126/24 brd* 192.168.1.255 lft 3591sec pref 3591sec lifetime 10-1[3600,3600] dev 2 src dhcp], zombie[4], nm-configured, in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) address: deleting IPv4 address 192.168.1.126/24,  dev 2
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) signal: address 4 removed: 192.168.1.126/24 brd 192.168.1.255 lft 3591sec pref 3591sec lifetime 10-1[3600,3600] dev 2 flags noprefixroute src kernel
DEBUG NetworkManager:<debug> [1697131597.9931] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie gone (untrack): [0f042e168cd253c7, ip4-address, 192.168.1.126/24 brd* 192.168.1.255 lft 3591sec pref 3591sec lifetime 10-1[3600,3600] dev 2 src dhcp], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9931] platform: (enp1s0) signal: route    4 removed: type local table 255 192.168.1.126/32 dev 2 metric 0 mss 0 rt-src rt-kernel scope host pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] platform: (enp1s0) signal: route    4 removed: type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src rt-dhcp scope global pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie gone (untrack): [5d233ed6cc4dbc66, ip4-route, type unicast 0.0.0.0/0 via 192.168.1.1 dev 2 metric 100 mss 0 rt-src dhcp pref-src 192.168.1.126], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9932] platform: (enp1s0) signal: route    4 removed: type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126
DEBUG NetworkManager:<debug> [1697131597.9932] l3cfg[f8da0cf05cdbce1d, ifindex=2]: obj-state: zombie gone (untrack): [a7603b192df6eaf1, ip4-route, type unicast 192.168.1.0/24 dev 2 metric 100 mss 0 rt-src rt-kernel scope link pref-src 192.168.1.126], nm-configured, was-in-platform
DEBUG NetworkManager:<debug> [1697131597.9932] platform-linux: do-delete-ip4-address[2: 192.168.1.126/24]: success
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync ip4-route
DEBUG NetworkManager:<debug> [1697131597.9932] platform-linux: sysctl: setting '/proc/sys/net/ipv6/conf/enp1s0/use_tempaddr' to '0' (current value is identical)
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync ip6-route
DEBUG NetworkManager:<debug> [1697131597.9932] global-tracker: sync mptcp-addr (reapply)
DEBUG NetworkManager:<debug> [1697131597.9933] global-tracker: sync routing-rule
DEBUG NetworkManager:<debug> [1697131597.9933] device[4e8776fec9c7d826] (enp1s0): set metered value 0
DEBUG NetworkManager:<debug> [1697131597.9934] manager: new metered value: 0
DEBUG NetworkManager:<debug> [1697131597.9945] active-connection[3f9dcec21192b449]: set state deactivated (was deactivating)
INFO NetworkManager:<info> [1697131597.9949] manager: NetworkManager state is now DISCONNECTED
WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:NetworkManager state changed to <enum NM_STATE_DISCONNECTED of type NM.State>
WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:Connected to network: False
INFO systemd:systemd-hostnamed.service: Deactivated successfully.
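
Most of the screen above is per-address and per-route bookkeeping; the lines that matter are the connection and manager state changes. A rough sketch of a filter for those, assuming the text has been saved to a file; the function name nm_state_changes and the file name in the usage comment are placeholders, and the patterns only cover the message wordings visible in this log:

```shell
#!/bin/sh
# Print only the state-change lines from a saved copy of the tty4 text.
# nm_state_changes is a hypothetical helper; the patterns match the
# wordings seen in this particular log and may miss other messages.
nm_state_changes() {
    grep -E 'set state [a-z]+ \(was|state is now|state changed to|Connected to network' "$1"
}

# Usage (file name is a placeholder):
#   nm_state_changes tty4.txt
```

Run against the text above, this would keep the active-connection "set state deactivated (was deactivating)" line, the "NetworkManager state is now DISCONNECTED" line, and the two Anaconda network module lines, and drop the rest.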

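Original screenshot (Screenshot_agrajag_2023-10-12_18:52:39):
https://github.com/AlmaLinux/almalinux-deploy/assets/53232196/ae16cdf9-8454-4b54-a436-f763300bb6c7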
