Investigate issue #3302 driver behavior when upsd aborts #3368
jimklimov wants to merge 8 commits into networkupstools:master from
❌ Build nut 2.8.4.4369-master failed (commit 1c5d56839b by @jimklimov)
fd80697 to 97eca57
✅ Build nut 2.8.4.4370-master completed (commit 0a7e64ee19 by @jimklimov)
…proctag() is called) [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…etworkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…rkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…gs [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ally and flip to specified upsname later [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…sing setproctag() [networkupstools#3302, networkupstools#3368] Did not work for parallel scanning threads where it would be most useful, because they are in same process space... Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…pthreads so far [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
❌ Build nut 2.8.4.4371-master failed (commit 0f2f5925f5 by @jimklimov)
✅ Build nut 2.8.4.4373-master completed (commit dd1c3aa017 by @jimklimov)
NOTE: After #3363 it seems that
UPDATE: Older Windows builds did similarly (tested with 2.8.4.1572-1572+g69e282b3b+v2.8.5+rc5 and a small swarm of 50 drivers, to be under 64 connections):
Older Linux build (2.8.4.1541.9-1550+g7cd79ab73, with 3 dummy devices from NIT):
5ca8690 to cf14d94
…-check before retry Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…-check before retry - also for WIN32 Also revised WaitForSingleObject() result checking - there has to be a chance to succeed ;) Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…(driver, client...) [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…workupstools#3302, networkupstools#1711] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
11ae094 to d0a556b
…networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…upstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…isconnect() [networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Moved some more commits out into side PRs (fast-merged based on successful tests in an earlier iteration here), so the changes proposed by this PR focus on the driver crashing issues again.
…isconnect() [networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…upstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
d0a556b to 458a7bc
…upstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
458a7bc to 0ade5d4
…networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…isconnect() [networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…upstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
0ade5d4 to 7bf84e0
❌ Build nut 2.8.4.4379-master failed (commit 9fbb2f0a1d by @jimklimov)
Start by poking upsdrvctl for both WIN32 and POSIX builds...
Includes code from PR #3367 to try reproducing the issue.
UPDATE: Maybe specific to dummy-ups. Reproduced for standalone starts of the driver program directly, for one driver via upsdrvctl (note: the latter does not seem to propagate the exit-code and returns 0, at least on Windows; it probably should indicate an error), and for a swarm of drivers via upsdrvctl (which also exits with code 0 even if all drivers died abruptly). Sometimes it took several rounds of starting upsd and killing it a few seconds later.
In all these cases the final words were like:
upsd sometimes logs the clean-up:
On the dummy-ups side it seems to always end with the same entering parse_data_file() call (and exit-code 127) after failing to write to the server:
I don't think I've reproduced nor ruled out the problem on non-Windows builds yet.
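For comparison on the POSIX side, here is a minimal sketch of what "failing to write to the server" can look like when the peer has gone away. It is not NUT code and does not explain the Windows crash: the helper names `send_all_or_fail()` and `demo_write_after_peer_abort()` are hypothetical, a `socketpair()` stands in for the driver-to-upsd connection, and the payload is arbitrary. The assumption illustrated is only the general one that a process which ignores `SIGPIPE` sees a failed `write()` with `EPIPE` instead of being terminated by the signal.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a NUT function): write the whole buffer,
 * retrying on EINTR. Returns 0 on success, -1 with errno set on failure. */
static int send_all_or_fail(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;      /* e.g. EPIPE when the peer is gone */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Simulate a server abort: close the "server" end of a socket pair, then
 * write from the "driver" end. Returns the errno seen by the writer,
 * 0 if the write unexpectedly succeeded, or -1 if setup failed. */
static int demo_write_after_peer_abort(void)
{
    int sv[2];
    int err = 0;

    /* Process-wide: with the default disposition, SIGPIPE raised by the
     * write below would terminate the process. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);   /* the "server" side goes away */

    if (send_all_or_fail(sv[0], "hello\n", 6) != 0)
        err = errno;    /* capture before close() can overwrite it */

    close(sv[0]);
    return err;
}
```

A caller would compare the result of `demo_write_after_peer_abort()` against `EPIPE` and treat that as "server gone, clean up and reconnect or exit with an error" rather than as a fatal condition.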
Per GDB and added debug-logging traces, it seems to crash around malloc() calls, whether in PCONF context init or in vupslog() a bit before it gets there: