Closed
Changes from 2 commits
35 changes: 33 additions & 2 deletions lib/Thruk/Controller/extinfo.pm
@@ -1050,17 +1050,48 @@ sub _check_stale_check {

return(0) unless $obj->{'in_check_period'};

# do dependencies exist
return(0) unless(scalar @{$obj->{'depends_exec'}||[]} > 0 || scalar @{$obj->{'parents'}||[]} > 0);
# staleness requirement:
# starting from the last check time, a next check is scheduled
# from that next check, a hypothetical second next check is scheduled as well
# if the second next check lies in the past, the service is marked stale, as it missed two planned checks

my $peer_key = $obj->{'peer_key'};
my $check_interval = $obj->{'check_interval'} * $c->stash->{'pi_detail'}->{$peer_key}->{'interval_length'};
my $retry_interval = $obj->{'retry_interval'} * $c->stash->{'pi_detail'}->{$peer_key}->{'interval_length'};
my $max_check_attempts = $obj->{'max_check_attempts'};
my $current_attempt = $obj->{'current_attempt'};
my $state = $obj->{'state'};
my $last_check = $obj->{'last_check'}; # Last time the check got an answer
# obj.next_check is refreshed even when there have not been any responses for a while.
# Staleness detection is therefore based on last_check; next_check does not help

my $next_planned_check = 0;
if ($state == 0) {
$next_planned_check = $last_check + $check_interval;
}
elsif ($state != 0 && $current_attempt != $max_check_attempts) {
$next_planned_check = $last_check + $check_interval;
Owner

shouldn't this be the retry_interval here?

Contributor Author

Yeah actually, I confused the logic.

}
else{
$next_planned_check = $last_check + $retry_interval;
}
my $second_next_planned_check = 0;
Contributor Author

I basically applied the same logic as for next_planned_check, since we cannot be sure which interval was used for the following step.

Whether the check at next_planned_check left the state OK or not-OK, or increased current_attempt, is unknown, since Thruk never heard about it.
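
The rule discussed in this thread can be sketched in isolation. This is a minimal sketch, not the code in the PR: `planned_interval` and `missed_two_checks` are hypothetical helper names, and it assumes, following the review comment above, that soft non-OK states (`current_attempt` below `max_check_attempts`) are rechecked at `retry_interval`, while OK and hard states use `check_interval`. Because the outcome of the missed check is unknown, the same interval is applied to both steps.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Thruk): interval in seconds that the core
# would presumably use for the next check of $obj.
# Assumption: soft problem states use retry_interval, everything else check_interval.
sub planned_interval {
    my($obj, $interval_length) = @_;
    my $soft_problem = $obj->{'state'} != 0
                    && $obj->{'current_attempt'} < $obj->{'max_check_attempts'};
    my $key = $soft_problem ? 'retry_interval' : 'check_interval';
    return($obj->{$key} * $interval_length);
}

# Hypothetical helper: returns 1 if the second planned check after last_check
# already lies in the past at time $now, 0 otherwise.
sub missed_two_checks {
    my($obj, $interval_length, $now) = @_;
    my $interval    = planned_interval($obj, $interval_length);
    my $next        = $obj->{'last_check'} + $interval;
    my $second_next = $next + $interval;
    return($second_next <= $now ? 1 : 0);
}

# Usage: an OK service with a 5 minute check interval, last checked 700s ago.
my $now = time();
my $svc = {
    state              => 0,
    current_attempt    => 1,
    max_check_attempts => 3,
    check_interval     => 5,
    retry_interval     => 1,
    last_check         => $now - 700,
};
print missed_two_checks($svc, 60, $now) ? "stale\n" : "fresh\n";
```

Passing `$now` in explicitly, instead of calling `time()` inside the helper, keeps the rule easy to exercise with fixed timestamps.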

if ($state == 0) {
$second_next_planned_check = $next_planned_check + $check_interval;
}
elsif ($state != 0 && $current_attempt != $max_check_attempts) {
$second_next_planned_check = $next_planned_check + $check_interval;
inqrphl marked this conversation as resolved.
Outdated
}
else{
$second_next_planned_check = $next_planned_check + $retry_interval;

# wait at least twice of the normal check interval
if($obj->{'last_check'} > time() - $check_interval * 2) {
return(0);
}

return(0) if $second_next_planned_check > time();

# did any of the parents fail?
my $worst = 0;
for my $parent (values %{$nodes}) {
17 changes: 13 additions & 4 deletions templates/_extinfo_host_service_details.tt
@@ -111,10 +111,19 @@ END
[% IF stale_hint %]
<div class="card alert w-auto red relative shadow-none flexcol gap-1 flex-nowrap justify-center p-2 m-2">
<h3>Stale [% type | html %] detected</h3>
<div>
This [% type | html %] has not been checked recently. Have a look at the
<a href="[% uri_with(c, { type => 'dtree' }) %]" class="link font-bold">dependency tree <i class="uil uil-share-alt align-middle"></i></a>
to get a hint.
<div class="whitespace-pre-line">
<p class="mb-2">
This [% type | html %] has likely missed two scheduled checks in a row according to its last check time: [% date.format(obj.last_check) %]
</p>
<p class="mb-2">
This could be a dependency issue, you can check the <a href="[% uri_with(c, { type => 'dtree' }) %]" class="link font-bold">dependency tree</a> to confirm this.
</p>
<p>
If rescheduling a manual check works and last update is fresh, it is likely that the core is scheduling checks properly but not getting results.
</p>
<p>
If Gearman is used, it could be that the Gearman workers are down or overloaded beyond the point where they can follow the schedule.
</p>
</div>
</div>
[% END %]