sending job to gearmand failed since 5.1.2 (still OK in 5.1.0) #171

@pvdputte

Hi,

Could there be a regression between 5.1.0 and 5.1.2 in submitting jobs to gearmand?

I'm using a clean Debian 11 install with the packages from OBS (as you recommend). The following is from /var/log/mod-gearman/mod_gearman_neb.log with debug=4:

[...]
[2023-08-09 01:15:37][26182][TRACE] move_results_to_core()
[2023-08-09 01:15:37][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:37][26182][TRACE] handle_perfdata(6)
[2023-08-09 01:15:37][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:37][26182][TRACE] got target queue from service custom variable: servicegroup_somegroup
[2023-08-09 01:15:37][26182][DEBUG] received job for queue servicegroup_somegroup: some.host.tld - MEMUSED, check_options: 0   latency so far: 0.286s
[2023-08-09 01:15:37][26182][TRACE] cmd_line: /usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070"  -c check-memused -a "-" "-" "-" "95" "97"
[2023-08-09 01:15:37][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 1, 1, 0, 60)
[2023-08-09 01:15:37][26182][TRACE] 280 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=MEMUSED
core_time=1691536537.000288
timeout=120
command_line=/usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070"  -c check-memused -a "-" "-" "-" "95" "97"


<---
[2023-08-09 01:15:37][26182][TRACE] 384 +++>
12Py+XD5K<REDACTED>iW
<+++


[2023-08-09 01:15:42][26182][TRACE] create_client()
[2023-08-09 01:15:42][26182][INFO ] add_job_to_queue() retrying... 0
[2023-08-09 01:15:42][26182][TRACE] add_job_to_queue(servicegroup_somegroup, F5699DEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF, 1, 0, 1, 0, 60)
[2023-08-09 01:15:42][26182][TRACE] 280 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=MEMUSED
core_time=1691536537.000288
timeout=120
command_line=/usr/local/mon/libexec/check_nrpe -H 1.2.3.4 -t "15" -p "7070"  -c check-memused -a "-" "-" "-" "95" "97"


<---
[2023-08-09 01:15:42][26182][TRACE] 384 +++>
12Py+XD5K<REDACTED>iW
<+++
[2023-08-09 01:15:47][26182][ERROR] sending job to gearmand failed: gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false -> libgearman/universal.cc:346: pid(26182)
[2023-08-09 01:15:47][26182][TRACE] create_client()
[2023-08-09 01:15:47][26182][TRACE] add_job_to_queue() finished with errors: 0
[2023-08-09 01:15:47][26182][ERROR] failed to send service check to gearmand
[2023-08-09 01:15:47][26182][TRACE] move_results_to_core()
[2023-08-09 01:15:47][26182][TRACE] handle_progam_status_data_events(11, data)
[2023-08-09 01:15:47][26182][TRACE] log file rotated: 0
[2023-08-09 01:15:47][26182][TRACE] handle_svc_check(6, data)
[2023-08-09 01:15:47][26182][TRACE] handle_perfdata(6)
[...]
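
Since the error says "no servers were available", one way to rule out gearmand itself is to query it directly, bypassing the NEB module. The following is only a rough sketch: it assumes gearmand listens on localhost:4730 (the usual default, not confirmed by my config above) and uses gearmand's plain-text admin command "status", which returns one tab-separated line per queue and a final "." line. The function names are made up for illustration.

```python
import socket


def parse_status(text):
    """Parse the reply to gearmand's text admin command 'status'.

    Each line holds: queue name, total jobs, running jobs, available
    workers (tab-separated). A line containing only '.' ends the reply.
    """
    queues = {}
    for line in text.splitlines():
        if line == ".":
            break
        name, total, running, workers = line.split("\t")
        queues[name] = {
            "total": int(total),
            "running": int(running),
            "workers": int(workers),
        }
    return queues


def query_status(host="localhost", port=4730, timeout=5.0):
    """Connect to gearmand and return its parsed queue status.

    host and port are assumptions; adjust them to the server entry in
    your own configuration.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        # Read until the terminating '.' line arrives or the peer closes.
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode())


if __name__ == "__main__":
    for name, counts in sorted(query_status().items()):
        print(name, counts)
```

If this connects and lists servicegroup_somegroup while the NEB module still times out, the problem is more likely in the module than in gearmand.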

By chance, I still had the 5.1.0 .deb in /var/cache/apt on a different testing VM. (Note: this .deb is from the labs.consol.de repository, as I only recently switched; not sure if that matters.)

If I dpkg -i the 5.1.0 one without changing my conffiles, it works again:

[...]
[2023-08-09 15:30:03][36755][TRACE] move_results_to_core()
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check(6, data)
[2023-08-09 15:30:03][36755][TRACE] handle_perfdata(6)
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check(6, data)
[2023-08-09 15:30:03][36755][TRACE] got target queue from service custom variable: servicegroup_somegroup
[2023-08-09 15:30:03][36755][DEBUG] received job for queue servicegroup_somegroup: some.host.tld - SSH, check_options: 0   latency so far: 0.513s
[2023-08-09 15:30:03][36755][TRACE] cmd_line: /usr/local/mon/libexec/check_ssh -t "10" -p "22" 1.2.3.4
[2023-08-09 15:30:03][36755][TRACE] add_job_to_queue(servicegroup_somegroup, 0D169733BEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEFE68, 1, 1, 1, 1, 60)
[2023-08-09 15:30:03][36755][TRACE] 227 --->type=service
result_queue=check_results
host_name=some.host.tld
service_description=SSH
core_time=1691587803.000045
timeout=120
command_line=/usr/local/mon/libexec/check_ssh -t "10" -p "22" 1.2.3.4


<---
[2023-08-09 15:30:03][36755][TRACE] 320 +++>
12Py+XD5Kr8<REDACTED>kPddZBa552c6cpucUT0
<+++
[2023-08-09 15:30:03][36755][INFO ] gearmand submission statistics: jobs:      3   errors:       0   submit_rate:    0.0/s   avg_submit_duration: 0.000098s   max_submit_duration: 0.000145s
[2023-08-09 15:30:03][36755][TRACE] add_job_to_queue() finished successfully
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check() finished successfully
[2023-08-09 15:30:03][36755][TRACE] handle_svc_check() finished successfully -> 206
[2023-08-09 15:30:03][36755][TRACE] got result H:com-dmcore001:47
[2023-08-09 15:30:03][36755][TRACE] 492 +++>
O8eJdV+43h55e<REDACTED>zXhzaSc=
<+++
[2023-08-09 15:30:03][36755][TRACE] 368 --->
type=active
host_name=some.host.tld
service_description=SSH
core_start_time=1691587803.000045
start_time=1691587804.909941
finish_time=1691587804.964980
return_code=0
exited_ok=1
source=Mod-Gearman Worker @ some-worker.tld
output=SSH OK - OpenSSH_8.4p1 Debian-5+deb11u1 (protocol 2.0) | time=0.053162s;;;0.000000;10.000000





<---
[2023-08-09 15:30:03][36755][DEBUG] service job completed: some.host.tld SSH: exit 0, latency: 0.518, exec_time: 0.055
[2023-08-09 15:30:04][36755][TRACE] move_results_to_core()
[...]
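
To compare the two excerpts more easily, the submit-related lines can be pulled out with their time offsets. In the failing 5.1.2 log, the retry comes 5 s after the first add_job_to_queue() call and the error another 5 s later, whereas 5.1.0 finishes within the same second. This is a small sketch only; the regular expression is inferred from the log lines above and the helper names are hypothetical.

```python
import re
import sys
from datetime import datetime

# Matches lines such as:
# [2023-08-09 01:15:42][26182][INFO ] add_job_to_queue() retrying... 0
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, pid, level, message), or None for lines that
    do not start with a log prefix (e.g. payload continuation lines)."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp, pid, level, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), int(pid), level, message


def submit_timeline(lines):
    """Keep ERROR lines and lines mentioning add_job_to_queue, and return
    (seconds since the first kept line, level, message) tuples."""
    kept = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _pid, level, message = parsed
        if level == "ERROR" or "add_job_to_queue" in message:
            kept.append((stamp, level, message))
    if not kept:
        return []
    start = kept[0][0]
    return [((stamp - start).total_seconds(), level, message)
            for stamp, level, message in kept]


if __name__ == "__main__":
    # Path taken from the report above; pass another file as first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/mod-gearman/mod_gearman_neb.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for offset, level, message in submit_timeline(handle):
            print(f"+{offset:5.0f}s {level:5} {message[:100]}")
```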
