
Connection from satellite to agent with misconfigured zone does not fail #10405

Open
@jschmidt-icinga

Describe the bug

In the following situation, the cluster and cluster-zone checks fail to detect a misconfigured zone on the agent.

onboarding-5 is a satellite that wants to connect to an agent onboarding-2 that has a misconfigured zone.

onboarding-5 tries to connect to onboarding-2 which rejects the attempt because it doesn't know onboarding-5 as an endpoint.

[2025-04-08 10:08:49 +0000] information/ApiListener: New client connection for identity 'onboarding-5' from [::ffff:10.27.2.199]:60250 (no Endpoint object found for identity)

but onboarding-5 does not seem to realize this and continues to send commands that onboarding-2 rejects due to an invalid endpoint origin:

[2025-04-08 10:08:49 +0000] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity 'onboarding-5'.
[2025-04-08 10:08:49 +0000] notice/ClusterEvents: Discarding 'execute command' message from 'onboarding-5': Invalid endpoint origin (client not allowed).

This causes the checks that run on onboarding-2 to show up as "Overdue":
Image

To Reproduce

onboarding-5's zone configuration:

object Endpoint "onboarding-1" {
}

object Endpoint "onboarding-4" {
}

object Zone "master" {
      endpoints = [ "onboarding-1", "onboarding-4" ]
}

object Endpoint "onboarding-5" {
}

object Zone "onboarding-5" {
      endpoints = [ "onboarding-5" ]
      parent = "master"
}

object Zone "global-templates" {
      global = true
}

object Zone "director-global" {
      global = true
}

object Endpoint "onboarding-2" {
    host = "10.27.1.225"
    log_duration = 0s
}

object Zone "onboarding-2" {
    parent = "onboarding-5"
    endpoints = [ "onboarding-2" ]
}

On the agent onboarding-2, the following (misconfigured) setup is left over from when the agent was connected directly to the masters:

object Endpoint "onboarding-1" {
}

object Endpoint "onboarding-4" {
}

object Zone "master" {
      endpoints = [ "onboarding-1", "onboarding-4" ]
}

object Endpoint "onboarding-2" {
}

object Zone "onboarding-2" {
      endpoints = [ "onboarding-2" ]
      parent = "master"
}

object Zone "global-templates" {
      global = true
}

object Zone "director-global" {
      global = true
}
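
For comparison, here is a sketch of what the agent's zone configuration would presumably need to look like to match the satellite's view. It is not taken from the actual setup. It assumes the agent only needs to know its own zone, its parent zone onboarding-5, and the two global zones, and that the satellite initiates the connection (so no host attribute is set on the agent side):

```
// Assumed zone configuration on onboarding-2 (illustrative only)
object Endpoint "onboarding-5" {
    // no host attribute: the satellite connects to the agent
}

object Zone "onboarding-5" {
    endpoints = [ "onboarding-5" ]
}

object Endpoint "onboarding-2" {
}

object Zone "onboarding-2" {
    endpoints = [ "onboarding-2" ]
    parent = "onboarding-5"    // parent is the satellite zone, not "master"
}

object Zone "global-templates" {
    global = true
}

object Zone "director-global" {
    global = true
}
```

The relevant difference from the leftover configuration is that onboarding-5 exists as an Endpoint object and that the agent's zone names the satellite zone as its parent.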

The following services then return OK even though the connection fails and no check results are returned from onboarding-5 (see attached log):

template Service "Generic Service" {
    max_check_attempts = "5"
    check_interval = 1m
    retry_interval = 30s
}

template Service "Icinga Service" {
    import "Generic Service"

    command_endpoint = host_name
}

object Service "cluster-zone-onboarding-2" {
    host_name = "onboarding-5"
    import "Icinga Service"

    check_command = "cluster-zone"
    vars.cluster_zone = "onboarding-2"
}

Expected behavior

I would expect the connection to fail, the cluster check on onboarding-5 and the cluster-zone check above to come up as CRITICAL, and the service checks to be in state UNKNOWN rather than "Overdue".

Your Environment

  • Version used (icinga2 --version): r2.14.5-1
  • Operating System and version: Debian 12
  • Enabled features (icinga2 feature list): api checker mainlog
  • Icinga Web 2 version and modules (System - About): 2.12.4

Additional context

Debug.log from the agent onboarding-2:
debug.log

Labels: area/distributed, bug
