
Thread count in parallel nut-scanner should scale down in case of "Too many open files" #2576

Open
@jimklimov

Description

As noted in passing in issue #2575 and in the PRs that dealt with parallelized scans in nut-scanner, depending on platform defaults, the particular OS deployment, and third-party library specifics, nut-scanner may run out of file descriptors despite already trying to adapt its maximums to ulimit information where available.
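
For reference, a minimal sketch of that kind of ulimit-driven adaptation (not nut-scanner's actual code; the function name and the reserve margin are arbitrary illustrations) could use getrlimit(RLIMIT_NOFILE) to derive a cap on concurrent scanning threads:

```c
/* Minimal sketch (not nut-scanner's actual code): derive a cap on
 * concurrent scanning threads from the file-descriptor limit that the
 * OS reports, leaving a margin for other FD consumers (stdio, config
 * files, MIB lookups, etc.). The margin of 32 is an arbitrary example.
 */
#include <stdio.h>
#include <sys/resource.h>

static size_t suggest_max_threads(void)
{
    struct rlimit rl;
    size_t cap = 1024;  /* fallback if the limit cannot be queried */

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) {
        if (rl.rlim_cur > 32)
            cap = (size_t)rl.rlim_cur - 32;
        else
            cap = 1;
    }
    return cap;
}

int main(void)
{
    printf("suggested max scanning threads: %zu\n", suggest_max_threads());
    return 0;
}
```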

As seen recently, culminating in commit 2c3a09e of PR #2539 (issue #2511), certain libnetsnmp builds can consume FDs for network sockets, for local filesystem lookups of per-host configuration files or MIB files, for directory scanning during those searches, etc. This is a variable beyond our control; different implementations and versions of third-party code can behave as they please. Below is an example staged with that commit reverted, scanning a large network range:

...
   0.321562     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.254
   0.321597     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1022 thread_count=1022 stwST=-1 stwS=0 pass=1
   0.321573     [D2] Entering try_SysOID_thready for 172.28.67.253
   0.321667     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.255
   0.321703     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1023 thread_count=1023 stwST=-1 stwS=0 pass=1
   0.321677     [D2] Entering try_SysOID_thready for 172.28.67.254
   0.321782     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.68.0
   0.321817     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1024 thread_count=1024 stwST=-1 stwS=-1 pass=0
   0.321851     [D2] nutscan_scan_ip_range_snmp: Running too many scanning threads (1024), waiting until older ones would finish
   0.321796     [D2] Entering try_SysOID_thready for 172.28.67.255
   0.475060     [D2] Failed to open SNMP session for 172.28.67.147
/var/lib/snmp/hosts/172.28.66.252.local.conf: Too many open files
/var/lib/snmp/hosts/172.28.65.208.local.conf: Too many open files

<blocks on "too many threads" anyway, but skips a number of hosts> 

What we can do is not abort the scans upon any such hiccup, but check for errno==EMFILE, delay, and retry later (or maybe even actively decrease the thread-maximum variable of the process). We already have a way to detect the "Running too many scanning threads (NUM), waiting until older ones would finish" situation, so this is mostly about detecting the new condition and extending that criterion.
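
To illustrate the direction (purely a sketch; open_scan_socket_with_retry, max_scan_threads, the attempt count, and the 100 ms delay are all made up for illustration and are not nut-scanner's actual names or values), the retry-on-EMFILE idea could look roughly like this:

```c
/* Illustrative sketch only: retry a failing resource-opening call when
 * errno is EMFILE ("Too many open files") instead of aborting the scan,
 * and optionally shrink a shared thread cap so later passes launch fewer
 * concurrent workers. Names and values here are hypothetical.
 */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical shared limit, analogous to nut-scanner's max-threads value;
 * a real implementation would guard updates with a mutex or atomics. */
static volatile size_t max_scan_threads = 1024;

static int open_scan_socket_with_retry(int domain, int type, int proto)
{
    int fd, attempts = 10;

    while ((fd = socket(domain, type, proto)) < 0) {
        if (errno != EMFILE || --attempts <= 0)
            return -1;  /* unrelated error, or out of patience */

        /* Too many open files: throttle future thread launches a bit and
         * wait for older scan threads to release their descriptors. */
        if (max_scan_threads > 1)
            max_scan_threads--;
        usleep(100000); /* 100 ms back-off; the value is arbitrary here */
    }
    return fd;
}
```

In nut-scanner itself, such a check would presumably hook into the existing throttling loop quoted above, so that EMFILE from the SNMP library (or from our own socket and file calls) is treated like the "too many threads" condition rather than as a fatal per-host failure.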

Labels

Low-hanging fruit (a proposal or issue that is good for newcomers to the codebase, or otherwise a quick win), enhancement, need testing (code looks reasonable, but the feature would better be tested against hardware or OSes), nut-scanner, portability (we want NUT to build and run everywhere possible)
