
Nomad opening a large number of file descriptors to its logging sub-processes when log collection is enabled #27847

@mtrop-godaddy

Description


Nomad version

# nomad version
Nomad v1.11.3
BuildDate 2026-03-11T09:10:27Z
Revision 173ab08a0210789da531847c4ce3c3518f7fb34b+CHANGES

Operating system and Environment details

# uname -r
6.12.0-124.45.1.el10_1.x86_64

# cat /etc/os-release
NAME="AlmaLinux"
VERSION="10.1 (Heliotrope Lion)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="10.1"
PLATFORM_ID="platform:el10"
[...]

# docker version
Client: Docker Engine - Community
 Version:           28.5.2
 API version:       1.51
 Go version:        go1.25.3
 Git commit:        ecc6942
 Built:             Wed Nov  5 14:46:08 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          28.5.2
  API version:      1.51 (minimum version 1.24)
  Go version:       go1.25.3
  Git commit:       89c5e8f
  Built:            Wed Nov  5 14:43:03 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.29
  GitCommit:        442cb34bda9a6a0fed82a2ca7cade05c5c749582
 runc:
  Version:          1.3.3
  GitCommit:        v1.3.3-0-gd842d771
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Issue

Not 100% sure whether this is a Nomad or a go-plugin issue, but posting it here for now.

If disable_log_collection is set to false and we hot-restart the Nomad agent (SIGINT + leave_on_interrupt: false), Nomad ends up opening a large number of process handles (pidfds) to its logging sub-processes (docker_logger/logmon). They do eventually get cleaned up, so the handles aren't strictly leaked, but they keep getting reopened in cycles. On clients with lots of allocations (100+) this can quickly cause Nomad to hit the open file limit, at which point it's unable to start new allocations, collect disk stats, etc. on that client.

bpftrace shows that the pidfd_open syscalls are coming from the pidAlive function in the go-plugin library:

@[135831,
        internal/runtime/syscall.Syscall6+14
        syscall.Syscall+70
        internal/syscall/unix.PidFDOpen+37
        os.pidfdFind+52
        os.findProcess+28
        github.com/hashicorp/go-plugin/internal/cmdrunner._pidAlive+20
        github.com/hashicorp/go-plugin/internal/cmdrunner.pidWait+152
        github.com/hashicorp/go-plugin/internal/cmdrunner.(*CmdAttachedRunner).Wait+22
        github.com/hashicorp/go-plugin.(*Client).reattach.func1+138
        github.com/hashicorp/go-plugin.(*Client).reattach.gowrap1+40
        runtime.goexit.abi0+1
]: 5

# ps -ax | grep 135831
 135831 ?        Sl    27:12 /usr/bin/nomad logmon
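
For reference, a probe along these lines produces the per-target-pid stack counts shown above (reconstructed from the output format; the exact invocation may differ):

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_pidfd_open /comm == "nomad"/ { @[args->pid, ustack] = count(); }'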

pidAlive opens a new process handle every time it's called, which appears to be once per second per allocation. It should probably reuse the process handle instead. Relevant code:
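
The functions in question live in go-plugin's internal/cmdrunner package and look roughly like this (paraphrased; exact code may differ between versions):

package cmdrunner

import (
	"os"
	"syscall"
	"time"
)

// pidWait blocks until the given process exits, polling once per second.
func pidWait(pid int) error {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		if !pidAlive(pid) {
			break
		}
	}
	return nil
}

// pidAlive checks whether a pid is alive (delegates to the
// platform-specific _pidAlive).
func pidAlive(pid int) bool { return _pidAlive(pid) }

// _pidAlive tests whether a process is alive by sending it signal 0.
// On Go 1.23+ on Linux, os.FindProcess opens a fresh pidfd on every
// call, and nothing here ever calls Release on the returned handle, so
// each fd stays open until the os.Process is garbage-collected.
func _pidAlive(pid int) bool {
	proc, err := os.FindProcess(pid)
	if err == nil {
		err = proc.Signal(syscall.Signal(0))
	}
	return err == nil
}

One way to fix this would be to look the process up once and poll the same handle, e.g. (hypothetical sketch, not tested):

// pidWaitReuse is a hypothetical variant that holds a single pidfd per
// monitored plugin instead of opening a new one on every poll.
func pidWaitReuse(pid int) error {
	proc, err := os.FindProcess(pid)
	if err != nil {
		return err
	}
	defer proc.Release() // close the pidfd once we stop waiting

	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		// Signal 0 probes liveness without delivering a signal; with a
		// pidfd-backed handle this returns os.ErrProcessDone once the
		// process has exited.
		if err := proc.Signal(syscall.Signal(0)); err != nil {
			break
		}
	}
	return nil
}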


Various traces/logs/data from an affected node:

  • Nomad client process reaches tens of thousands to >100k open FDs
# docker ps | grep /pause | wc -l
170

# sudo lsof -p "$(ps -aux | grep "[n]omad agent" | awk '{print $2}')" | wc -l
32624
  • majority are anon_inode:[pidfd]
# sudo lsof -nP -p "$(ps -aux | grep "[n]omad agent" | awk '{print $2}')" 2>/dev/null | awk 'NR>1 {print $5, $9}' | sort | uniq -c | sort -nr | head -50
   1558 unix type=STREAM
    111 a_inode [pidfd:860422]
    111 a_inode [pidfd:860335]
    111 a_inode [pidfd:706542]
    111 a_inode [pidfd:706314]
    111 a_inode [pidfd:706176]
    111 a_inode [pidfd:706065]
[...]

# cat /proc/706065/cmdline
/usr/bin/nomaddocker_logger
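
Note: /proc/<pid>/cmdline is NUL-separated, hence the missing space above; the process is actually /usr/bin/nomad docker_logger, which prints correctly with e.g.:

# tr '\0' ' ' < /proc/706065/cmdline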
  • strace shows a high rate of pidfd_open(...) syscalls
sudo strace -f -tt -p "$(ps -aux | grep "[n]omad agent" | awk '{print $2}')" -e trace=pidfd_open,close -s 1000
[...]
[pid 687022] 12:52:32.592364 pidfd_open(686713, 0) = 90771
[pid 680790] 12:52:32.594956 pidfd_open(690846, 0) = 90772
[pid 683213] 12:52:32.595373 pidfd_open(123738, 0) = 90774
[pid 683213] 12:52:32.595827 pidfd_open(138604, 0) = 90775
[pid 687022] 12:52:32.596275 pidfd_open(700636, 0 <unfinished ...>
[pid 683210] 12:52:32.596367 pidfd_open(675814, 0 <unfinished ...>
[pid 683213] 12:52:32.596474 pidfd_open(173835, 0 <unfinished ...>
[pid 683210] 12:52:32.596541 <... pidfd_open resumed>) = 90777
[pid 687022] 12:52:32.596648 <... pidfd_open resumed>) = 90779
[pid 683213] 12:52:32.596761 <... pidfd_open resumed>) = 90781
[pid 683210] 12:52:32.597124 pidfd_open(637625, 0) = 90782
[pid 687022] 12:52:32.598545 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=675302, si_uid=0} ---
[pid 687022] 12:52:32.603040 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=675302, si_uid=0} ---
[pid 680790] 12:52:32.606614 pidfd_open(653779, 0) = 90783
[pid 680790] 12:52:32.608443 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=675302, si_uid=0} ---
[pid 680794] 12:52:32.608596 pidfd_open(182851, 0) = 90784
[pid 687022] 12:52:32.609003 pidfd_open(637968, 0) = 90785
[pid 683213] 12:52:32.611989 pidfd_open(662675, 0 <unfinished ...>
[pid 683210] 12:52:32.612407 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=675302, si_uid=0} ---
[pid 683213] 12:52:32.612594 <... pidfd_open resumed>) = 90788
[pid 687022] 12:52:32.612926 close(90787) = 0
[pid 683213] 12:52:32.613085 pidfd_open(680696, 0) = 90787
[pid 687022] 12:52:32.613330 pidfd_open(662669, 0) = 90789
[...]

# before raising the open file limit
[pid 518397] 13:12:04.326981 pidfd_open(864047, 0) = -1 EMFILE (Too many open files)
[pid 518397] 13:12:04.327329 pidfd_open(349068, 0) = -1 EMFILE (Too many open files)
[pid 522775] 13:12:04.329106 pidfd_open(793403, 0) = -1 EMFILE (Too many open files)
[pid 518397] 13:12:04.329752 pidfd_open(797488, 0) = -1 EMFILE (Too many open files)
[pid 522775] 13:12:04.331429 pidfd_open(349063, 0) = -1 EMFILE (Too many open files)
  • after raising RLIMIT_NOFILE, the FD count cycles between ~20k and ~130k (presumably as the pidfds are only closed once the corresponding os.Process handles are garbage-collected)
  • there is no obvious allocation churn, heavy logging activity, etc.
  • the affected target processes are mostly /usr/bin/nomad docker_logger

Reproduction steps

  • Start with a drained client
  • Start any number of jobs on the client (ideally 100+)
  • Restart the client without draining/leaving the cluster, e.g. SIGINT + leave_on_interrupt: false
  • Check the number of pidfds (these should start increasing after Nomad reattaches to all the logging processes):
sudo lsof -p "$(ps -aux | grep "[n]omad agent" | awk '{print $2}')" | grep pidfd
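
The cyclic growth is easy to observe by polling the agent's FD count, e.g.:

NOMAD_PID="$(ps -aux | grep "[n]omad agent" | awk '{print $2}')"
sudo watch -n 5 "ls /proc/$NOMAD_PID/fd | wc -l"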

Expected Result

Nomad shouldn't be running out of file descriptors when there are ~200 allocations running on a client (and the open file limit is set to 65k). With log collection disabled it usually has ~3000 files open in the same setup.

Actual Result

Nomad constantly opens new file descriptors to its Docker logger processes to check whether they're still running, until it hits the open file limit, at least when there are >100 allocations running on the client.

Job file (if appropriate)

N/A

Seems to affect every job.

Nomad Server logs (if appropriate)

N/A

Nomad Client logs (if appropriate)

  • After hitting the open file limit
Apr 17 06:02:42 nomad[675302]:     2026-04-17T06:02:42.158Z [ERROR] client: error adding alloc: error="failed to initialize network configurator: failed to load CNI config: failed to detect CNI config file: open /opt/cni/config: too many open files" alloc_id=3d5409a8-f72a-9670-cc01-a470e35bb2ce

Apr 17 06:51:53 nomad[675302]:     2026-04-17T06:51:53.700Z [ERROR] client.alloc_runner: postrun failed: alloc_id=12c34024-3a67-e048-c8bb-fe0641731d60 error="post-run hook \"cpuparts_hook\" failed: cgroupslib: unable to update share cpuset with \"0-15\": open /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus: too many open files"

Apr 17 08:01:24 nomad[675302]:     2026-04-17T08:01:24.056Z [ERROR] client.host_stats: failed to collect disk stats: error="open /proc/self/mounts: too many open files"
Apr 17 08:01:39 nomad[675302]:     2026-04-17T08:01:39.059Z [ERROR] client.host_stats: failed to collect disk stats: error="open /proc/self/mounts: too many open files"
Apr 17 08:01:54 nomad[675302]:     2026-04-17T08:01:54.062Z [ERROR] client.host_stats: failed to collect disk stats: error="open /proc/self/mounts: too many open files"
Apr 17 08:03:01 nomad[675302]:     2026-04-17T08:03:01.426Z [ERROR] agent: pipe2: too many open files
Apr 17 08:03:01 nomad[675302]:     2026-04-17T08:03:01.427Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=82b05112-8c32-cc93-af71-66fb8819d24e task=[REDACTED] type=Killing msg="Template failed: error rendering \"(dynamic)\" => \"[REDACTED]\": template render subprocess failed: pipe2: too many open files" failed=true
Apr 17 08:03:09 nomad[675302]:     2026-04-17T08:03:09.383Z [ERROR] client.host_stats: failed to collect disk stats: error="open /proc/self/mounts: too many open files"
Apr 17 08:03:19 nomad[675302]:     2026-04-17T08:03:19.794Z [INFO]  client: task exec session ended with an error: error="failed to attach to exec: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: socket: too many open files" code=0xc033f91e08
Apr 17 08:03:19 nomad[675302]:     2026-04-17T08:03:19.859Z [INFO]  client: task exec session ended with an error: error="failed to attach to exec: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: socket: too many open files" code=0xc034110e38
Apr 17 08:03:19 nomad[675302]:     2026-04-17T08:03:19.921Z [INFO]  client: task exec session ended with an error: error="failed to attach to exec: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: socket: too many open files" code=0xc0341e4058
Apr 17 08:03:19 nomad[675302]:     2026-04-17T08:03:19.989Z [INFO]  client: task exec session ended with an error: error="failed to attach to exec: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: socket: too many open files" code=0xc0341e40f8
  • On shutdown and startup
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1024406 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025076 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025077 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025078 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025301 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025420 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 1025514 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 306429 (nomad) remains running after unit stopped.
Apr 17 13:17:14 systemd[1]: nomad.service: Unit process 306567 (nomad) remains running after unit stopped.

[...]

Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.256Z [TRACE] plugin.stdio: waiting for stdio data
Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.307Z [TRACE] plugin.stdio: waiting for stdio data
Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.348Z [TRACE] plugin.stdio: waiting for stdio data
Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.410Z [TRACE] plugin.stdio: waiting for stdio data
Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.451Z [TRACE] plugin.stdio: waiting for stdio data
Apr 17 13:17:25 nomad[547527]: 2026-04-17T13:17:25.489Z [TRACE] plugin.stdio: waiting for stdio data
