Description
Current Behavior
| return executeConnectedTask(target, task, failedTimeout); |
| Duration failedTimeout, |
| public static final String CONNECTIONS_FAILED_TIMEOUT = "cryostat.connections.failed-timeout"; |
| cryostat.connections.failed-timeout=30s |
The TargetConnectionManager is responsible for handling all of Cryostat's outgoing connections to discovered targets (JMX and Agent HTTP), including connection caching, failure handling, and timeout handling.
The current timeout logic is too simplistic: any connection attempt or task that exceeds the timeout duration is failed. For long-running operations such as heap dumps, or pulling particularly large JFR files across a slow network, healthy connections that are still making progress can be aborted for taking too long. An alternate TargetConnectionManager method lets the caller specify a custom timeout duration, and since #1133 it is used for heap dumps, for example. However, this still requires the author of the calling code to decide the maximum amount of time such an operation is allowed to take, even when it is making continuous progress and has not actually failed.
On the other hand, connections that genuinely fail - due to network problems or simple misconfiguration (a bad connection URL, or a target JVM not configured to accept JMX connections) - should be detected more quickly, rather than waiting out a timeout of 30 seconds or longer. This long timeout clogs the connection worker thread pool when many attempts against unconnectable targets are made in succession.
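One possible way to fail fast on unreachable targets is a cheap TCP pre-flight check with a short connect timeout, run before handing the attempt to a worker thread. The following is only a sketch under that assumption: `FastFailProbeSketch` and `isReachable` are hypothetical names, not part of Cryostat, and a successful TCP connect does not prove that a JMX or Agent HTTP endpoint is actually listening behind the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

// Hypothetical helper for illustration; not Cryostat's actual API.
final class FastFailProbeSketch {

    /**
     * Returns true if a plain TCP connection to host:port can be opened
     * within connectTimeout, false on refusal, timeout, or any other I/O error.
     */
    static boolean isReachable(String host, int port, Duration connectTimeout) {
        // Socket.connect treats 0 as "wait forever", so clamp to at least 1 ms.
        int timeoutMillis = (int) Math.max(1L, connectTimeout.toMillis());
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers ConnectException (refused), SocketTimeoutException,
            // and UnknownHostException (bad host in the connection URL).
            return false;
        }
    }
}
```

A refused connection typically returns almost immediately, so a misconfigured target would occupy a worker thread for far less than the full 30 seconds.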
The TargetConnectionManager needs to do a better job of distinguishing failed or dropped connections from connections that are still open, and should apply the "connection failure timeout" only to initial connection establishment. The current 30-second default should also be significantly reduced.
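The split described above could look roughly like the following: a short timeout bounds only connection establishment, while the task itself has no fixed cap and is failed only if it stops reporting progress. This is a minimal sketch, not a proposed implementation. `TwoPhaseTimeoutSketch`, `ProgressTracker`, `execute`, and the `idleTimeout` parameter are all hypothetical names, and it assumes the task is able to report its own progress.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Cryostat's actual API.
final class TwoPhaseTimeoutSketch {

    /** Handed to the task so it can signal that it is still making progress. */
    static final class ProgressTracker {
        private final AtomicLong lastProgressNanos = new AtomicLong(System.nanoTime());

        void reportProgress() {
            lastProgressNanos.set(System.nanoTime());
        }

        Duration idleTime() {
            return Duration.ofNanos(System.nanoTime() - lastProgressNanos.get());
        }
    }

    static <C, T> T execute(
            Supplier<C> connect,
            BiFunction<C, ProgressTracker, T> task,
            Duration connectTimeout,
            Duration idleTimeout)
            throws Exception {
        // Phase 1: the (short) connect timeout bounds only connection establishment.
        CompletableFuture<C> connecting = CompletableFuture.supplyAsync(connect);
        C connection;
        try {
            connection = connecting.get(connectTimeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Note: cancel() does not interrupt the underlying thread; real code
            // would also need to close the half-open connection.
            connecting.cancel(true);
            throw new TimeoutException("connection not established within " + connectTimeout);
        }

        // Phase 2: no fixed cap on total duration; fail only when the task
        // has reported no progress for longer than idleTimeout.
        ProgressTracker tracker = new ProgressTracker();
        CompletableFuture<T> running =
                CompletableFuture.supplyAsync(() -> task.apply(connection, tracker));
        long pollMillis = Math.max(1L, idleTimeout.toMillis() / 4);
        while (true) {
            try {
                return running.get(pollMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                if (tracker.idleTime().compareTo(idleTimeout) > 0) {
                    running.cancel(true);
                    throw new TimeoutException("no progress for " + idleTimeout);
                }
                // Otherwise the task is still progressing; keep waiting.
            }
        }
    }
}
```

With this shape, a heap dump or large JFR transfer that calls `reportProgress()` after each chunk can run for as long as it needs, while a hung connection attempt or a stalled transfer is still failed promptly.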
Expected Behavior
No response
Steps To Reproduce
No response
Environment
- OS:
- Environment:
- Version:
Anything else?
No response