Description
Describe the bug
In the following situation, the cluster and cluster-zone checks fail to detect a misconfigured zone on the agent. onboarding-5 is a satellite that wants to connect to an agent, onboarding-2, that has a misconfigured zone. onboarding-5 tries to connect to onboarding-2, which rejects the attempt because it doesn't know onboarding-5 as an endpoint:
[2025-04-08 10:08:49 +0000] information/ApiListener: New client connection for identity 'onboarding-5' from [::ffff:10.27.2.199]:60250 (no Endpoint object found for identity)
However, onboarding-5 does not seem to realize this and continues to send commands that onboarding-2 rejects due to an invalid endpoint origin:
[2025-04-08 10:08:49 +0000] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity 'onboarding-5'.
[2025-04-08 10:08:49 +0000] notice/ClusterEvents: Discarding 'execute command' message from 'onboarding-5': Invalid endpoint origin (client not allowed).
As a result, the checks that run on onboarding-2 show up as "Overdue".
To Reproduce
onboarding-5's zone configuration:
object Endpoint "onboarding-1" {
}
object Endpoint "onboarding-4" {
}
object Zone "master" {
  endpoints = [ "onboarding-1", "onboarding-4" ]
}
object Endpoint "onboarding-5" {
}
object Zone "onboarding-5" {
  endpoints = [ "onboarding-5" ]
  parent = "master"
}
object Zone "global-templates" {
  global = true
}
object Zone "director-global" {
  global = true
}
object Endpoint "onboarding-2" {
  host = "10.27.1.225"
  log_duration = 0s
}
object Zone "onboarding-2" {
  parent = "onboarding-5"
  endpoints = [ "onboarding-2" ]
}
On the agent onboarding-2, the following (misconfigured) setup is left over from when the agent was connected directly to the masters:
object Endpoint "onboarding-1" {
}
object Endpoint "onboarding-4" {
}
object Zone "master" {
  endpoints = [ "onboarding-1", "onboarding-4" ]
}
object Endpoint "onboarding-2" {
}
object Zone "onboarding-2" {
  endpoints = [ "onboarding-2" ]
  parent = "master"
}
object Zone "global-templates" {
  global = true
}
object Zone "director-global" {
  global = true
}
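For comparison, a corrected agent configuration might look like the sketch below. It is not part of the tested setup. It assumes that onboarding-2 is meant to be a child of the onboarding-5 zone (as the satellite's configuration declares) and that the satellite initiates the connection, so the agent needs no host attribute for its parent endpoint.

```
// Hypothetical corrected zones.conf for the agent onboarding-2.
// Assumption: the agent only needs to know its direct parent zone.

// Parent satellite endpoint and zone (replaces the leftover "master" objects).
object Endpoint "onboarding-5" {
}

object Zone "onboarding-5" {
  endpoints = [ "onboarding-5" ]
}

// The agent's own endpoint and zone, now parented to the satellite zone.
object Endpoint "onboarding-2" {
}

object Zone "onboarding-2" {
  endpoints = [ "onboarding-2" ]
  parent = "onboarding-5"
}

// Global zones carried over unchanged from the existing configuration.
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}
```

The differences from the leftover configuration are the missing Endpoint object for onboarding-5 and the parent attribute of the onboarding-2 zone, which still points to "master".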
The following services then return OK even though the connection fails and no check results are returned from onboarding-5 (see the attached log):
template Service "Generic Service" {
  max_check_attempts = "5"
  check_interval = 1m
  retry_interval = 30s
}
template Service "Icinga Service" {
  import "Generic Service"
  command_endpoint = host_name
}
object Service "cluster-zone-onboarding-2" {
  host_name = "onboarding-5"
  import "Icinga Service"
  check_command = "cluster-zone"
  vars.cluster_zone = "onboarding-2"
}
Expected behavior
I would expect the connection to fail, the cluster check on onboarding-5 and the above cluster-zone check to come up as CRITICAL, and the service checks to be in state UNKNOWN rather than "Overdue".
Your Environment
- Version used (icinga2 --version): r2.14.5-1
- Operating System and version: Debian 12
- Enabled features (icinga2 feature list): api checker mainlog
- Icinga Web 2 version and modules (System - About): 2.12.4
Additional context
Debug log from the agent onboarding-2:
debug.log