Hi,
Could there be a regression between 5.1.0 and 5.1.2 in submitting jobs to gearmand?
I'm using a clean Debian 11 install with the packages from OBS (as you recommend). The following is from /var/log/mod-gearman/mod_gearman_neb.log with debug=4:
[...]
[2023-08-09 01:15:37][26182][TRACE] move_results_to_core()
[2023-08-09 01:15:37][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:37][26182][TRACE] handle_perfdata(6)
[2023-08-09 01:15:37][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:37][26182][TRACE] got target queue from service custom variable: servicegroup_somegroup
[2023-08-09 01:15:37][26182][DEBUG] received job for queue servicegroup_somegroup: some.host.tld - MEMUSED, check_options: 0 latency so far: 0.286s
[2023-08-09 01:15:37][26182][TRACE] cmd_line: /usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070" -c check-memused -a "-" "-" "-" "95" "97"
[2023-08-09 01:15:37][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 1, 1, 0, 60)
[2023-08-09 01:15:37][26182][TRACE] 280 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=MEMUSED
core_time=1691536537.000288
timeout=120
command_line=/usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070" -c check-memused -a "-" "-" "-" "95" "97"
<---
[2023-08-09 01:15:37][26182][TRACE] 384 +++>
12Py+XD5K<REDACTED>iW
<+++
[2023-08-09 01:15:42][26182][TRACE] create_client()
[2023-08-09 01:15:42][26182][INFO ] add_job_to_queue() retrying... 0
[2023-08-09 01:15:42][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 0, 1, 0, 60)
[2023-08-09 01:15:42][26182][TRACE] 280 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=MEMUSED
core_time=1691536537.000288
timeout=120
command_line=/usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070" -c check-memused -a "-" "-" "-" "95" "97"
<---
[2023-08-09 01:15:42][26182][TRACE] 384 +++>
12Py+XD5K<REDACTED>iW
<+++
[2023-08-09 01:15:47][26182][ERROR] sending job to gearmand failed: gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false -> libgearman/universal.cc:346: pid(26182)
[2023-08-09 01:15:47][26182][TRACE] create_client()
[2023-08-09 01:15:47][26182][TRACE] add_job_to_queue() finished with errors: 0
[2023-08-09 01:15:47][26182][ERROR] failed to send service check to gearmand
[2023-08-09 01:15:47][26182][TRACE] move_results_to_core()
[2023-08-09 01:15:47][26182][TRACE] handle_progam_status_data_events(11, data)
[2023-08-09 01:15:47][26182][TRACE] log file rotated: 0
[2023-08-09 01:15:47][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:47][26182][TRACE] handle_perfdata(6)
[...]
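Going by the timestamps above, the first submit at 01:15:37 is retried at 01:15:42 and finally fails at 01:15:47, so each attempt seems to block for about five seconds. For anyone who wants to check this on their own log, here is a rough Python sketch that pulls the submit-related lines out of mod_gearman_neb.log and prints their offset from the first attempt. It is not part of Mod-Gearman; the names `LINE_RE`, `SUBMIT_RE`, `SAMPLE` and `submit_timeline` are made up for illustration, and the line format is assumed from the excerpt above.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt: "[timestamp][pid][LEVEL] message".
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\] (.*)$"
)
# Matches the call trace "add_job_to_queue(<queue>, ...)" but not
# status messages such as "add_job_to_queue() retrying... 0".
SUBMIT_RE = re.compile(r"add_job_to_queue\(\w")

# Lines copied from the failing 5.1.2 log above.
SAMPLE = """\
[2023-08-09 01:15:37][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 1, 1, 0, 60)
[2023-08-09 01:15:42][26182][INFO ] add_job_to_queue() retrying... 0
[2023-08-09 01:15:42][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 0, 1, 0, 60)
[2023-08-09 01:15:47][26182][ERROR] sending job to gearmand failed: gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false -> libgearman/universal.cc:346: pid(26182)
[2023-08-09 01:15:47][26182][ERROR] failed to send service check to gearmand
"""


def submit_timeline(lines):
    """Return (seconds_since_first_submit, level, message) tuples.

    Only submit calls plus INFO/ERROR lines are kept; everything before
    the first submit call and all payload lines without a timestamp are
    skipped.
    """
    start = None
    timeline = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation or payload line, no timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        level, msg = match.group(3), match.group(4)
        is_submit = SUBMIT_RE.match(msg) is not None
        if start is None:
            if not is_submit:
                continue
            start = ts
        if is_submit or level in ("INFO", "ERROR"):
            timeline.append((int((ts - start).total_seconds()), level, msg))
    return timeline


if __name__ == "__main__":
    for offset, level, msg in submit_timeline(SAMPLE.splitlines()):
        print(f"+{offset:>3}s {level:<5} {msg[:70]}")
```

To run it against a real log, replace `SAMPLE.splitlines()` with an open file handle on mod_gearman_neb.log.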
By chance I still had the 5.1.0 .deb in /var/cache/apt on a different testing VM. (Note: this .deb is from the labs.consol.de repository, as I only recently switched repositories; not sure if that matters.)
If I install the 5.1.0 package with dpkg -i, without changing my conffiles, it works again:
[...]
[2023-08-09 15:30:03][36755][TRACE] move_results_to_core()
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check(6, data)
[2023-08-09 15:30:03][36755][TRACE] handle_perfdata(6)
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check(6, data)
[2023-08-09 15:30:03][36755][TRACE] got target queue from service custom variable: servicegroup_somegroup
[2023-08-09 15:30:03][36755][DEBUG] received job for queue servicegroup_somegroup: some.host.tld - SSH, check_options: 0 latency so far: 0.513s
[2023-08-09 15:30:03][36755][TRACE] cmd_line: /usr/local/mon/libexec/check_ssh -t "10" -p "22" 1.2.3.4
[2023-08-09 15:30:03][36755][TRACE] add_job_to_queue(servicegroup_somegroup, 0D169733BEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFE68, 1, 1, 1, 1, 60)
[2023-08-09 15:30:03][36755][TRACE] 227 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=SSH
core_time=1691587803.000045
timeout=120
command_line=/usr/local/mon/libexec/check_ssh -t "10" -p "22" 1.2.3.4
<---
[2023-08-09 15:30:03][36755][TRACE] 320 +++>
12Py+XD5Kr8<REDACTED>kPddZBa552c6cpucUT0
<+++
[2023-08-09 15:30:03][36755][INFO ] gearmand submission statistics: jobs: 3 errors: 0 submit_rate: 0.0/s avg_submit_duration: 0.000098s max_submit_duration: 0.000145s
[2023-08-09 15:30:03][36755][TRACE] add_job_to_queue() finished successfully
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check() finished successfully
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check() finished successfully -> 206
[2023-08-09 15:30:03][36755][TRACE] got result H:com-dmcore001:47
[2023-08-09 15:30:03][36755][TRACE] 492 +++>
O8eJdV+43h55e<REDACTED>zXhzaSc=
<+++
[2023-08-09 15:30:03][36755][TRACE] 368 --->
type=active
host_name=some.host.tld
service_description=SSH
core_start_time=1691587803.000045
start_time=1691587804.909941
finish_time=1691587804.964980
return_code=0
exited_ok=1
source=Mod-Gearman Worker @ some-worker.tld
output=SSH OK - OpenSSH_8.4p1 Debian-5+deb11u1 (protocol 2.0) | time=0.053162s;;;0.000000;10.000000
<---
[2023-08-09 15:30:03][36755][DEBUG] service job completed: some.host.tld SSH: exit 0, latency: 0.518, exec_time: 0.055
[2023-08-09 15:30:04][36755][TRACE] move_results_to_core()
[...]