Description
Is your feature request related to a problem? Please describe.
This came out of a debugging conversation with @ravescovi. When an endpoint fails with a ZMQ error, it first appears to start: a series of log messages announce connection steps, and funcx-endpoint list reports the endpoint as started. Only later does the endpoint fail, and it does so silently. Both the delay before the failure and the fact that funcx-endpoint list says only "disconnected" rather than "failed" are problems.
Describe the solution you'd like
Ideally the endpoint would fail right away. This might be difficult, however, since the failure happens in the endpoint interchange, which is a daemonized process. The next best option would be for funcx-endpoint list to be more descriptive about what failed.
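To make the second option concrete, here is a minimal sketch of how a status line could distinguish a crash from a plain disconnect. It is not funcX code: `describe_endpoint_status`, its parameters, and the "Running" and "Failed" labels are hypothetical names chosen for illustration, and it assumes the caller already knows whether the interchange process is alive and has read the contents of interchange.stderr.

```python
def describe_endpoint_status(process_alive: bool, stderr_text: str) -> str:
    """Return a human-readable status for an endpoint (hypothetical helper).

    process_alive: whether the daemonized interchange process still exists.
    stderr_text:   whatever the interchange wrote to its stderr file.
    """
    if process_alive:
        return "Running"

    # Keep only non-blank lines; the last one of a Python traceback is
    # normally the exception type and message.
    lines = [line.strip() for line in stderr_text.splitlines() if line.strip()]
    if lines:
        return f"Failed: {lines[-1]}"

    # The process is gone but left no error output behind.
    return "Disconnected"
```

With something like this, a dead interchange that left a traceback behind would be listed as "Failed" together with the final line of the error, instead of just "Disconnected".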
Describe alternatives you've considered
The failure message shows up in interchange.stderr and isn't reported at the end of EndpointInterchange.log. Having this error go to EndpointInterchange.log would have been ideal. One option would be to squash EndpointInterchange.log, interchange.stderr, and interchange.stdout into a single interchange.log. Having three places to check is pretty bad.
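As a rough illustration of the single-log idea, the sketch below forwards anything written to stderr into the same log file that ordinary log messages use, relying only on the Python standard library. It does not reflect how the interchange is actually implemented: `StreamToLogger`, `make_single_log`, `capture_stderr`, and the logger name are all hypothetical, and a real daemon would also have to handle output from child processes and C extensions that write to the file descriptor directly.

```python
import logging
from contextlib import redirect_stderr


class StreamToLogger:
    """File-like object that forwards each complete line to a logger."""

    def __init__(self, logger: logging.Logger, level: int):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text: str) -> int:
        # Buffer partial writes and emit one log record per finished line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self) -> None:
        # Emit whatever is left if the writer never sent a newline.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""


def make_single_log(path: str) -> logging.Logger:
    """Create a logger that writes every record to one file (hypothetical)."""
    logger = logging.getLogger("interchange_sketch")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def capture_stderr(logger: logging.Logger):
    """Context manager sending Python-level stderr writes to the logger."""
    return redirect_stderr(StreamToLogger(logger, logging.ERROR))
```

Code that runs inside `with capture_stderr(logger):` would then have its stderr output recorded at ERROR level in the same file as the regular log lines, so there is one place to look.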
Additional context
Following the instructions in #393 fixed the ZMQ issue.