Description
AWS ParallelCluster (APC) 3.11.2 and 3.13.2
I've been struggling with lockd failures when NFS-mounting disks at boot time on my ParallelCluster machines. Roughly one in a few hundred machines fails to mount its NFSv3 disks.
I get the following error messages in /var/log/messages:
Aug 25 10:31:49 ip-10-6-12-13 kernel: lockd_up: makesock failed, error=-98
Aug 25 10:31:50 ip-10-6-12-13 92_tsi_mount_disks_withdiags.sh[4215]: mount.nfs: mount(2): Address already in use
Aug 25 10:31:50 ip-10-6-12-13 92_tsi_mount_disks_withdiags.sh[4215]: mount.nfs: Address already in use
Some Chef recipe, which I cannot find on my AMI, is forcing lockd to use port 32768.
You can see that in the file:
cat /etc/nfs.conf
# Generated by Chef for [hostname redacted]
# Local modifications will be overwritten.
#
# This is a general configuration for the
# NFS daemons and tools
[general]
pipefs-directory=/var/lib/nfs/rpc_pipefs
[gssd]
use-gss-proxy=1
[lockd]
port=32768
udp-port=32768
It is also set in /etc/modprobe.d/options_lockd.conf and in two files under /etc/sysctl.d/:
99-chef-fs.nfs.nlm_tcpport.conf:fs.nfs.nlm_tcpport = 32768
99-chef-fs.nfs.nlm_udpport.conf:fs.nfs.nlm_udpport = 32768
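To confirm on a given machine which of these locations still pin the port, something like the following could be used. It is only a sketch: the helper name `find_static_lockd_ports` and its optional root-prefix argument are made up for illustration, and the file locations are simply the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every place a static lockd/NLM port is configured.
# $1 is an optional root prefix (e.g. a mounted or unpacked AMI); empty means "/".
find_static_lockd_ports() {
    local root="${1:-}"

    # Only look at port settings inside the [lockd] section of nfs.conf,
    # so that port= lines in other sections are not reported.
    awk -v f="$root/etc/nfs.conf" '
        /^\[/ { in_lockd = ($0 == "[lockd]") }
        in_lockd && /^[ \t]*(udp-)?port[ \t]*=/ { print f ": " $0 }
    ' "$root/etc/nfs.conf" 2>/dev/null || true

    # Module options and sysctl drop-ins that mention the NLM ports.
    grep -rHE 'nlm_(tcp|udp)port' \
        "$root/etc/modprobe.d" "$root/etc/sysctl.d" 2>/dev/null || true
}
```

Calling `find_static_lockd_ports` with no argument inspects the running system; an empty result would suggest no static assignment is left in those files.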
Red Hat has an article on this problem: https://access.redhat.com/solutions/2200331. It says it applies to RHEL 7, so it is old, but it was updated in 2024 and looks exactly like my problem.
I have not been able to figure out what is actually holding port 32768, so I have modified my machines' boot sequence to switch everything to dynamic ports, which is the default RHEL/Rocky behavior:
# make sure lockd uses a dynamic port to avoid address collisions
# keep a one-time backup of the original module options
[[ ! -e /etc/modprobe.d/options_lockd.conf.fcs ]] && sudo cp /etc/modprobe.d/options_lockd.conf /etc/modprobe.d/options_lockd.conf.fcs
sudo sed -i '/^\[lockd\]/,/^$/s/^port=/#port=/' /etc/nfs.conf
sudo sed -i '/^\[lockd\]/,/^$/s/^udp-port=/#udp-port=/' /etc/nfs.conf
sudo sed -i -e 's/nlm_tcpport=[0-9]*/nlm_tcpport=0/g' -e 's/nlm_udpport=[0-9]*/nlm_udpport=0/g' /etc/modprobe.d/options_lockd.conf
# Apply sysctl changes dynamically
sudo sysctl fs.nfs.nlm_tcpport=0
sudo sysctl fs.nfs.nlm_udpport=0
# Reload lockd so the new module options take effect (must run before any NFS mount)
sudo modprobe -v -r lockd
sudo modprobe -v lockd
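One thing that may be worth checking when hunting for whatever holds the port: 32768 is the lower bound of the default Linux ephemeral port range (`net.ipv4.ip_local_port_range` defaults to `32768 60999`), so an outgoing connection opened early in boot could in principle take that port before lockd binds it. This is an unconfirmed guess, not a diagnosis. The sketch below uses two hypothetical helpers, `port_in_range` and `check_lockd_port`, to test that idea and list any socket currently bound to the port.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not part of ParallelCluster.

# Succeeds when port $1 lies within the inclusive range $2..$3.
port_in_range() {
    local port="$1" low="$2" high="$3"
    [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}

# Report whether the port falls in the kernel's ephemeral range and
# show sockets bound to it (run as root to see the owning process).
check_lockd_port() {
    local port="${1:-32768}" low high
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
    if port_in_range "$port" "$low" "$high"; then
        echo "port $port is inside the ephemeral range $low-$high"
    else
        echo "port $port is outside the ephemeral range $low-$high"
    fi
    ss -tuanp "sport = :$port" 2>/dev/null || true
}
```

Running `check_lockd_port 32768` as early as possible in the boot script, before the mount, would show whether something else already owns the port on a failing machine.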
I would ask that you remove this static port assignment.
I am running with this new setup and will report back on whether it works.
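For checking the result after a mount succeeds, a rough sketch like the one below could show which ports lockd actually registered with rpcbind. The helper name `nlockmgr_ports` is made up here, and it assumes the usual five-column `rpcinfo -p` output (program, version, protocol, port, service).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the
# distinct protocol/port pairs registered for the NFS lock manager.
nlockmgr_ports() {
    awk '$5 == "nlockmgr" { print $3 "/" $4 }' | sort -u
}
```

Used as `rpcinfo -p | nlockmgr_ports`; with the dynamic setup the reported ports should no longer be pinned to 32768, although a dynamically chosen port can still happen to land on any free value.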