[Bug] Target connection failure timeout vs long operation #1130

@andrewazores

Description

Current Behavior

return executeConnectedTask(target, task, failedTimeout);

public static final String CONNECTIONS_FAILED_TIMEOUT = "cryostat.connections.failed-timeout";

cryostat.connections.failed-timeout=30s

The TargetConnectionManager is responsible for handling all of Cryostat's outgoing connections to discovered targets (JMX and Agent HTTP), including connection caching, failure handling, and timeout handling.

The current timeout logic is too simplistic: any connection attempt or task that exceeds the timeout duration is failed. For long-running operations such as heap dumps, or pulling particularly large JFR files across a slow network, a healthy connection that is still making progress can be aborted simply for taking too long. There is an alternate TargetConnectionManager method that lets the caller specify a custom timeout duration, which is now (since #1133) used for heap dumps, for example. However, this still requires the author of the calling code to decide the maximum amount of time such an operation may take, even if it is making continuous progress and does not actually fail.
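One possible direction, sketched here only as an illustration, is to replace the fixed overall deadline with an idle timeout: the task fails only if it reports no progress for a given window. Everything in this sketch (`IdleTimeoutSketch`, `Progress`, `ProgressTask`, `runWithIdleTimeout`) is hypothetical and not part of Cryostat's TargetConnectionManager. It also assumes the task can report progress, for example after each chunk of a JFR file is read.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not Cryostat API): fails a task only when it stops making progress. */
class IdleTimeoutSketch {

    /** Callback the task invokes whenever it makes progress, e.g. after reading a chunk. */
    interface Progress {
        void tick();
    }

    /** A task that reports progress while it runs. */
    interface ProgressTask<T> {
        T run(Progress progress) throws Exception;
    }

    static <T> CompletableFuture<T> runWithIdleTimeout(ProgressTask<T> task, Duration idleTimeout) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicLong lastProgress = new AtomicLong(System.nanoTime());

        // Worker thread runs the task; each tick() resets the idle clock.
        Thread worker = new Thread(() -> {
            try {
                result.complete(task.run(() -> lastProgress.set(System.nanoTime())));
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
        worker.setDaemon(true);

        // Watchdog periodically checks how long it has been since the last tick.
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        long checkMs = Math.max(1, idleTimeout.toMillis() / 4);
        watchdog.scheduleAtFixedRate(() -> {
            long idleNanos = System.nanoTime() - lastProgress.get();
            if (idleNanos > idleTimeout.toNanos()
                    && result.completeExceptionally(
                            new TimeoutException("no progress for " + idleTimeout))) {
                worker.interrupt(); // ask the stalled task to stop
            }
        }, checkMs, checkMs, TimeUnit.MILLISECONDS);

        // Stop the watchdog once the task finishes, successfully or not.
        result.whenComplete((value, error) -> watchdog.shutdownNow());
        worker.start();
        return result;
    }
}
```

With this shape, a heap dump that takes several minutes but keeps calling `tick()` would not be aborted, while a stalled transfer would fail roughly one idle window after its last reported progress. The caller still chooses the idle window, but no longer has to guess the total duration of the operation.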

On the other hand, connections that genuinely fail - due to network problems or simple misconfiguration (a bad connection URL, or a target JVM not configured to accept JMX connections) - should be detected more quickly rather than waiting out a 30-second (or longer) timeout. This long timeout causes the connection worker thread pool to become clogged when many attempts to connect to unreachable targets are made in succession.

The TargetConnectionManager needs to do a better job of distinguishing failed or dropped connections from connections that are still open, and should apply the "connection failure timeout" only to the initial connection establishment. The current 30-second default should also be significantly reduced.
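As a rough sketch of that split, the example below applies a deadline to the connection attempt only and leaves the subsequent task without one. The names `ConnectPhaseTimeoutSketch` and `connectThenRun` are hypothetical and do not reflect the actual TargetConnectionManager implementation; the sketch assumes Java 9 or later for `CompletableFuture.orTimeout`, and that the connection attempt responds to thread interruption.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical helper (not Cryostat API): times out connection setup, not the task. */
class ConnectPhaseTimeoutSketch {

    static <C, T> CompletableFuture<T> connectThenRun(
            Callable<C> connect,
            Function<C, T> task,
            Duration connectTimeout,
            ExecutorService pool) {
        CompletableFuture<C> connection = new CompletableFuture<>();

        // Run the (possibly blocking) connection attempt on a worker thread.
        Future<?> attempt = pool.submit(() -> {
            try {
                connection.complete(connect.call());
            } catch (Exception e) {
                connection.completeExceptionally(e);
            }
        });

        return connection
                // The deadline covers connection establishment only.
                .orTimeout(connectTimeout.toMillis(), TimeUnit.MILLISECONDS)
                // On failure or timeout, interrupt the attempt so its worker is freed.
                .whenComplete((conn, error) -> {
                    if (error != null) {
                        attempt.cancel(true);
                    }
                })
                // Once connected, the task runs with no deadline of its own.
                .thenApplyAsync(task, pool);
    }
}
```

Under these assumptions, an unreachable target fails after `connectTimeout` and releases its worker thread, while a task on an established connection may run for as long as it needs. Detecting a connection that drops partway through the task is a separate concern that this sketch does not address.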

Expected Behavior

No response

Labels

bug