
Better reporting when endpoint fails #579

@yadudoc

Is your feature request related to a problem? Please describe.

This is from a debugging conversation with @ravescovi. When an endpoint fails with a ZMQ error, it initially appears to start: a series of log messages announce connection steps, and funcx-endpoint list reports that the endpoint started, only for it to fail silently later. There are two problems here: the delay before the failure, and the fact that funcx-endpoint list then says only disconnected rather than failed.

Describe the solution you'd like

Ideally the endpoint would fail right away. However, this might be difficult, since the failure happens in the endpoint interchange, which is a daemonized process. The next best option would be for funcx-endpoint list to be more descriptive about what failed.
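To make the second option concrete, here is a minimal sketch of how a list command could tell "failed" apart from "disconnected". It assumes the daemon's stderr is captured in a file named interchange.stderr inside the endpoint's directory; the function name `describe_status` and the `is_running` flag are hypothetical and not part of the funcX code base.

```python
from pathlib import Path


def describe_status(endpoint_dir, is_running):
    """Return a status string for a hypothetical `list` command.

    Assumption: the daemonized interchange writes its stderr to
    `<endpoint_dir>/interchange.stderr`.
    """
    if is_running:
        return "Running"

    stderr_file = Path(endpoint_dir) / "interchange.stderr"
    if stderr_file.is_file():
        # Keep only non-blank lines; the last one is usually the
        # exception message at the bottom of a traceback.
        text = stderr_file.read_text(errors="replace")
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if lines:
            return f"Failed: {lines[-1]}"

    # Nothing on stderr, so there is no evidence of a crash.
    return "Disconnected"
```

Reporting only the last non-blank line keeps the listing compact while still pointing at the cause; the full traceback stays in the file for anyone who needs it.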

Describe alternatives you've considered

The failure message shows up in interchange.stderr and isn't reported at the end of EndpointInterchange.log. Having this error go to EndpointInterchange.log would have been ideal. One option would be to squash EndpointInterchange.log, interchange.stderr, and interchange.stdout into a single interchange.log. Having three places to check is pretty bad.
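One way the three files could be squashed into one is to route the process's stdout and stderr through the same logger that writes the main log. The sketch below uses only the standard library; the names `StreamToLogger`, `setup_single_log`, and `redirect_std_streams` are illustrative assumptions, not the endpoint's actual code.

```python
import logging
import sys


class StreamToLogger:
    """File-like object that forwards each complete line to a logger."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        # Buffer partial writes and emit one log record per full line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever is left without a trailing newline.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""


def setup_single_log(path, name="interchange"):
    """Create a logger that writes everything to one file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def redirect_std_streams(logger):
    """Send Python-level stdout/stderr writes to the logger."""
    sys.stdout = StreamToLogger(logger, logging.INFO)
    sys.stderr = StreamToLogger(logger, logging.ERROR)
```

Calling `redirect_std_streams(setup_single_log("interchange.log"))` early in the daemon would put uncaught tracebacks in the same file as regular log messages. Note that replacing `sys.stderr` only captures writes made from Python; output written directly to the process's file descriptor 2 by native code would still need descriptor-level redirection.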

Additional context
Following the instructions in #393 fixed the ZMQ issue.

Labels

enhancement (New feature or request)
