
Abort connections with no valid endpoint #10415

Open: jschmidt-icinga wants to merge 1 commit into master from abort-no-endpoint-conns

jschmidt-icinga
Contributor

@jschmidt-icinga jschmidt-icinga commented Apr 17, 2025

Problem

The problem in #10405 is that incoming connections with valid certificates, but from endpoints that are not defined locally, are added as anonymous clients (via ApiListener::AddAnonymousClient()). They then hang around essentially forever, because the check in JsonRpcConnection::CheckLiveness() only puts anonymous connections on a timeout if they are unauthenticated.
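
To make the gap concrete, here is a minimal model of the condition described above. It is an illustration only, not the actual Icinga 2 source: the struct and function names are hypothetical, and the two predicates simply restate the sentences above (anonymous unless both authenticated and matched to an endpoint; timeout only when unauthenticated).

```cpp
// Simplified model of the behaviour described above; NOT the actual Icinga 2 source.
// All names below are hypothetical and exist only for illustration.

struct ConnectionState {
	bool authenticated; // the peer presented a certificate that validated
	bool hasEndpoint;   // an Endpoint object for the peer is configured locally
};

// A connection is treated as an anonymous client unless it is both
// authenticated and matched to a locally configured endpoint.
constexpr bool IsAnonymous(const ConnectionState& c)
{
	return !(c.authenticated && c.hasEndpoint);
}

// Per the description, the liveness check only arms a timeout for
// unauthenticated connections.
constexpr bool GetsTimeout(const ConnectionState& c)
{
	return !c.authenticated;
}

// The problematic case: handled as anonymous, yet never timed out.
constexpr bool LingersForever(const ConnectionState& c)
{
	return IsAnonymous(c) && !GetsTimeout(c);
}
```

In this model, only the combination "authenticated, but no endpoint" lingers, which is exactly the case reported in #10405.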

To Reproduce

#10405 describes a more complicated setup in detail. The simplest way to reproduce the issue is to take a working, authenticated master/agent or master/satellite setup, comment out the master endpoint in the zones.conf of the agent/satellite, and restart.

Solution

Abort connections early when no endpoint is defined for the incoming connection. This is done by returning early from ApiListener::NewClientHandlerInternal when the certificate is validated, but no endpoint is configured for the remote.

Caveats

Since the client closes the connection very early, the other side may still try to read from or write to the socket, which then fails. For example, this message and stack trace can appear in the log:

[2025-04-17 10:19:12 +0000] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 'agent-1': Error: End of file

Stacktrace:
 0# __cxa_throw in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
 1# 0x000061869C8B23C6 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
 2# icinga::JsonRpcConnection::HandleIncomingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
 3# 0x000061869CB9D813 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
 4# make_fcontext in /lib/x86_64-linux-gnu/libboost_context.so.1.74.0

A more complex solution that does not close the connection so abruptly for the remote would involve both sides of the JsonRpcConnection confirming the connection via an exchange of messages and should be considered in a future refactoring of the NewClientHandler code.

For now, this closes #10405 by making the cluster checks fail reliably and by keeping the parent from blindly sending requests to clients that silently discard them.

@julianbrost
Contributor

Can you please update the PR description with a brief summary of the problem? In particular, #10405 describes a much more complex scenario than what should be necessary to reproduce this.

A more complex solution that does not close the connection so abruptly for the remote (causing various errors in the log)

Also, please share examples of how this behaves now.

@jschmidt-icinga
Contributor Author

@julianbrost Please see the updated description. I hope this makes things clearer.

@julianbrost
Contributor

Please verify whether ApiUsers authenticated using a TLS client certificate still work, see https://icinga.com/docs/icinga-2/latest/doc/12-icinga2-api/#icinga2-api-authentication

@jschmidt-icinga jschmidt-icinga force-pushed the abort-no-endpoint-conns branch from 4407a89 to f942abc Compare April 22, 2025 07:18
@jschmidt-icinga
Contributor Author

@julianbrost You are right, a certificate-based ApiUser could no longer connect with this PR. I've just pushed an updated version that should fix this. It now returns a bit later, and only for verified JSON-RPC connections.

Now both verified connections with no endpoint (i.e. ApiUser with client_cn) and unverified connections with an endpoint (going into AddAnonymousClient) are working.
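
As a rough sketch of the intended outcomes after this update (an illustration only, not the code in this PR), the cases can be written as a small decision function. The enum and function names are hypothetical. The sketch also assumes that HTTP connections are never aborted at this point and that unverified JSON-RPC connections without an endpoint still go the anonymous route; neither is spelled out above.

```cpp
// Hypothetical decision model for the updated behaviour; names are illustrative
// and do not correspond to identifiers in the Icinga 2 code base.

enum class Protocol { JsonRpc, Http };

enum class Outcome {
	Abort,           // close the connection early
	EndpointClient,  // regular cluster connection for a configured endpoint
	AnonymousClient, // goes into AddAnonymousClient()
	HandleHttp       // e.g. an ApiUser authenticated via client_cn
};

constexpr Outcome Classify(Protocol protocol, bool verified, bool hasEndpoint)
{
	// Assumption: HTTP requests are left alone, so a certificate-based
	// ApiUser without an endpoint keeps working.
	if (protocol == Protocol::Http)
		return Outcome::HandleHttp;

	// The new early return: verified JSON-RPC peer, but no endpoint configured.
	if (verified && !hasEndpoint)
		return Outcome::Abort;

	if (verified)
		return Outcome::EndpointClient;

	// Unverified JSON-RPC connections remain anonymous clients.
	return Outcome::AnonymousClient;
}
```

For example, `Classify(Protocol::JsonRpc, true, false)` yields `Outcome::Abort`, while `Classify(Protocol::Http, true, false)` yields `Outcome::HandleHttp`.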

Are there any other corner cases I'm not thinking about that need further testing?

@jschmidt-icinga jschmidt-icinga force-pushed the abort-no-endpoint-conns branch from f942abc to 1b3a0a8 Compare April 22, 2025 07:30
@jschmidt-icinga jschmidt-icinga force-pushed the abort-no-endpoint-conns branch from 1b3a0a8 to 353386f Compare April 23, 2025 14:55
@jschmidt-icinga jschmidt-icinga requested a review from Al2Klimov May 5, 2025 12:32
Development

Successfully merging this pull request may close these issues.

Connection from satellite to agent with misconfigured zone does not fail
3 participants