Introduce Checkable#scheduler_shuffle_cap #7718

Open
wants to merge 1 commit into
base: master
2 changes: 2 additions & 0 deletions doc/09-object-types.md
@@ -353,6 +353,7 @@ Configuration Attributes:
check\_timeout | Duration | **Optional.** Check command timeout in seconds. Overrides the CheckCommand's `timeout` attribute.
check\_interval | Duration | **Optional.** The check interval (in seconds). This interval is used for checks when the host is in a `HARD` state. Defaults to `5m`.
retry\_interval | Duration | **Optional.** The retry interval (in seconds). This interval is used for checks when the host is in a `SOFT` state. Defaults to `1m`. Note: This does not affect the scheduling [after a passive check result](08-advanced-topics.md#check-result-freshness).
+scheduler\_shuffle\_cap | Number | **Optional.** Maximum percentage by which Icinga may randomly shorten or lengthen the check interval in order to reduce load spikes. Defaults to 0.
enable\_notifications | Boolean | **Optional.** Whether notifications are enabled. Defaults to true.
enable\_active\_checks | Boolean | **Optional.** Whether active checks are enabled. Defaults to true.
enable\_passive\_checks | Boolean | **Optional.** Whether passive checks are enabled. Defaults to true.
@@ -719,6 +720,7 @@ Configuration Attributes:
check\_timeout | Duration | **Optional.** Check command timeout in seconds. Overrides the CheckCommand's `timeout` attribute.
check\_interval | Duration | **Optional.** The check interval (in seconds). This interval is used for checks when the service is in a `HARD` state. Defaults to `5m`.
retry\_interval | Duration | **Optional.** The retry interval (in seconds). This interval is used for checks when the service is in a `SOFT` state. Defaults to `1m`. Note: This does not affect the scheduling [after a passive check result](08-advanced-topics.md#check-result-freshness).
+scheduler\_shuffle\_cap | Number | **Optional.** Maximum percentage by which Icinga may randomly shorten or lengthen the check interval in order to reduce load spikes. Defaults to 0.
enable\_notifications | Boolean | **Optional.** Whether notifications are enabled. Defaults to `true`.
enable\_active\_checks | Boolean | **Optional.** Whether active checks are enabled. Defaults to `true`.
enable\_passive\_checks | Boolean | **Optional.** Whether passive checks are enabled. Defaults to `true`.
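
To make the attribute description concrete, here is a minimal sketch of the range an effective interval can fall into for a given cap. `ShuffledIntervalBounds` and its parameter names are hypothetical and not part of this patch; the arithmetic follows the factor range documented in `GetIntervalShuffleFactor()` (1 ± cap/100).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, not part of Icinga 2: returns the {shortest, longest}
// effective interval in seconds for a given scheduler_shuffle_cap (in percent).
std::pair<double, double> ShuffledIntervalBounds(double intervalSeconds, double capPercent)
{
	assert(intervalSeconds >= 0.0 && capPercent >= 0.0);

	// The shuffle factor lies within 1 +/- capPercent/100, so the interval
	// can move by at most this many seconds in either direction.
	double spread = intervalSeconds * capPercent / 100.0;

	return {intervalSeconds - spread, intervalSeconds + spread};
}
```

Under these assumptions, a `check_interval` of `5m` with `scheduler_shuffle_cap = 20` yields effective intervals between 240 and 360 seconds, and a cap of 0 leaves the interval unchanged.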
24 changes: 21 additions & 3 deletions lib/icinga/checkable-check.cpp
@@ -14,6 +14,7 @@
#include "base/convert.hpp"
#include "base/utility.hpp"
#include "base/context.hpp"
+#include <cstdlib>

using namespace icinga;

@@ -67,7 +68,7 @@ void Checkable::UpdateNextCheck(const MessageOrigin::Ptr& origin)
if (adj != 0.0)
adj = std::min(0.5 + fmod(GetSchedulingOffset(), interval * 5) / 100.0, adj);

-double nextCheck = now - adj + interval;
+double nextCheck = now - adj + interval * GetIntervalShuffleFactor();
double lastCheck = GetLastCheck();

Log(LogDebug, "Checkable")
@@ -384,7 +385,7 @@ Checkable::ProcessingResult Checkable::ProcessCheckResult(const CheckResult::Ptr
if (ttl > 0)
offset = ttl;
else
-offset = GetCheckInterval();
+offset = GetCheckInterval() * GetIntervalShuffleFactor();

SetNextCheck(Utility::GetTime() + offset);
}
@@ -425,7 +426,7 @@ Checkable::ProcessingResult Checkable::ProcessCheckResult(const CheckResult::Ptr
if (!parent->GetEnableActiveChecks())
continue;

-if (parent->GetNextCheck() >= now + parent->GetRetryInterval()) {
+if (parent->GetNextCheck() >= now + parent->GetRetryInterval() * parent->GetIntervalShuffleFactor()) {
ObjectLock olock(parent);
parent->SetNextCheck(now);
}
@@ -720,3 +721,20 @@ void Checkable::AquirePendingCheckSlot(int maxPendingChecks)

m_PendingChecks++;
}

+/**
+ * Returns a random factor, derived from scheduler_shuffle_cap, to multiply the check interval by.
+ *
+ * E.g. if scheduler_shuffle_cap is 20 (%), this function returns a value in [0.8, 1.2].
+ */
+double Checkable::GetIntervalShuffleFactor()
+{
+	if (!GetEnableActiveChecks()) {
+		// scheduler_shuffle_cap doesn't influence external checkers.
+		return 1;
+	}
+
+	return (GetSchedulerShuffleCap() / 100) // scheduler_shuffle_cap as a fraction, i.e. 10 => 0.1
+		* (std::rand() / (double)RAND_MAX * 2 - 1) // random number in [-1, 1]
+		+ 1;
+}
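
For readers who want to experiment with the formula outside of Icinga, the following is a minimal standalone sketch of the same calculation. It swaps `rand()` for `std::mt19937` with `std::uniform_real_distribution` so that runs can be seeded and reproduced; that substitution, the free function `IntervalShuffleFactor`, and its parameter names are assumptions for illustration and not what the patch does.

```cpp
#include <cassert>
#include <random>

// Hypothetical standalone counterpart of Checkable::GetIntervalShuffleFactor().
// capPercent stands in for scheduler_shuffle_cap, activeChecksEnabled for
// GetEnableActiveChecks(); both names are assumptions made for this sketch.
double IntervalShuffleFactor(double capPercent, bool activeChecksEnabled, std::mt19937& rng)
{
	assert(capPercent >= 0.0);

	// As in the patch: externally driven (passive) checks are not shuffled.
	if (!activeChecksEnabled)
		return 1.0;

	// Uniform random number in [-1, 1), scaled by the cap as a fraction.
	std::uniform_real_distribution<double> unit(-1.0, 1.0);

	return 1.0 + (capPercent / 100.0) * unit(rng);
}
```

With a cap of 20 the returned factor stays between roughly 0.8 and 1.2, and with a cap of 0 it is always exactly 1, which matches the documented default of leaving scheduling untouched.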
5 changes: 3 additions & 2 deletions lib/icinga/checkable.cpp
@@ -9,6 +9,7 @@
#include "base/exception.hpp"
#include "base/timer.hpp"
#include <boost/thread/once.hpp>
+#include <cmath>

using namespace icinga;

@@ -93,9 +94,9 @@ void Checkable::Start(bool runtimeCreated)
}

if (GetNextCheck() < now + 60) {
-double delta = std::min(GetCheckInterval(), 60.0);
+double delta = std::min(GetCheckInterval() * GetIntervalShuffleFactor(), 60.0);
delta *= (double)std::rand() / RAND_MAX;
-SetNextCheck(now + delta);
+SetNextCheck(now + delta + GetCheckInterval() * fabs(GetIntervalShuffleFactor() - 1));
}

ObjectImpl<Checkable>::Start(runtimeCreated);
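
The startup arithmetic in this hunk can be checked in isolation with the sketch below. Note that the patched `Start()` calls `GetIntervalShuffleFactor()` twice, so the two factors are independent random draws; the sketch therefore takes them as separate inputs. `InitialNextCheck` and all of its parameter names are hypothetical, and `unitRandom` stands in for `(double)std::rand() / RAND_MAX`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical pure-function version of the next-check computation in
// Checkable::Start(). factorA and factorB model the two separate calls to
// GetIntervalShuffleFactor(); unitRandom is a random number in [0, 1].
double InitialNextCheck(double now, double checkInterval,
	double factorA, double factorB, double unitRandom)
{
	assert(unitRandom >= 0.0 && unitRandom <= 1.0);

	// Spread the first check over at most 60 seconds, as before the patch.
	double delta = std::min(checkInterval * factorA, 60.0);
	delta *= unitRandom;

	// New in the patch: push the first check further out by the absolute
	// deviation of a second shuffle factor from 1.
	return now + delta + checkInterval * std::fabs(factorB - 1.0);
}
```

With both factors at 1 (a cap of 0) this reduces to the previous `now + delta` behaviour. For example, with a 300 second interval, `unitRandom = 0.5` and `factorB = 1.5`, the first check is scheduled 30 + 150 = 180 seconds after `now`.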
1 change: 1 addition & 0 deletions lib/icinga/checkable.hpp
@@ -204,6 +204,7 @@ class Checkable : public ObjectImpl<Checkable>
bool NotificationReasonApplies(NotificationType type);
bool NotificationReasonSuppressed(NotificationType type);
bool IsLikelyToBeCheckedSoon();
+double GetIntervalShuffleFactor();

void FireSuppressedNotifications();

3 changes: 3 additions & 0 deletions lib/icinga/checkable.ti
@@ -47,6 +47,9 @@ abstract class Checkable : CustomVarObject
[config] double retry_interval {
default {{{ return 60; }}}
};
+[config] double scheduler_shuffle_cap {
+default {{{ return 0.0; }}}
+};
[config, navigation] name(EventCommand) event_command (EventCommandRaw) {
navigate {{{
return EventCommand::GetByName(GetEventCommandRaw());