Description
When the config flag "native-sidecar=true" is added to worker machines, the coordinator can keep trying to send queries to a worker even after that worker is no longer part of the cluster. The issue is not easy to reproduce, but it can happen when a worker node is moved out of the cluster. The coordinator should ignore the old worker because it is not in the worker tier. However, because the worker also announces itself as a coordinatorSidecar, the nodeStatusService check is bypassed and the coordinator does not ignore that worker.
The cause appears to be the logic that filters the relevant nodes in DiscoveryNodeManager.
With this logic, worker nodes with sidecar enabled announce to the coordinator that they are available and part of the cluster. If such a worker node is no longer part of the cluster, queries routed to it can fail.
The logic in DiscoveryNodeManager should be revisited so that a sidecar-enabled worker that has left the worker tier is no longer treated as active.
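To illustrate the suspected bypass, here is a minimal, self-contained sketch. It is not the actual DiscoveryNodeManager code: the class SidecarFilterSketch, the Announcement type, the workerTier set (standing in for whatever nodeStatusService consults), and both filter methods are hypothetical names made up for this example. It only models the behavior described above, where the coordinatorSidecar flag lets a node skip the tier check, and contrasts it with one possible stricter filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the node filtering described in this issue.
// None of these names come from the Presto code base.
final class SidecarFilterSketch {
    // Stand-in for a node's discovery announcement.
    static final class Announcement {
        final String nodeId;
        final boolean coordinator;
        final boolean coordinatorSidecar;

        Announcement(String nodeId, boolean coordinator, boolean coordinatorSidecar) {
            this.nodeId = nodeId;
            this.coordinator = coordinator;
            this.coordinatorSidecar = coordinatorSidecar;
        }
    }

    // Suspected behavior: a node that announces itself as a sidecar
    // skips the worker-tier membership check entirely.
    static List<String> filterWithBypass(List<Announcement> announcements, Set<String> workerTier) {
        List<String> active = new ArrayList<>();
        for (Announcement a : announcements) {
            if (a.coordinator || a.coordinatorSidecar || workerTier.contains(a.nodeId)) {
                active.add(a.nodeId);
            }
        }
        return active;
    }

    // One possible fix (an assumption, not a confirmed design): only
    // coordinators skip the check; sidecar workers must be in the tier.
    static List<String> filterStrict(List<Announcement> announcements, Set<String> workerTier) {
        List<String> active = new ArrayList<>();
        for (Announcement a : announcements) {
            if (a.coordinator || workerTier.contains(a.nodeId)) {
                active.add(a.nodeId);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // "old-worker" has left the tier but still announces itself as a sidecar.
        List<Announcement> announcements = Arrays.asList(
                new Announcement("coordinator-1", true, false),
                new Announcement("worker-1", false, true),
                new Announcement("old-worker", false, true));
        Set<String> workerTier = new HashSet<>(Arrays.asList("worker-1"));

        System.out.println("with bypass: " + filterWithBypass(announcements, workerTier));
        System.out.println("strict:      " + filterStrict(announcements, workerTier));
    }
}
```

In this model the bypassing filter keeps old-worker in the active set while the strict filter drops it. Whether the real fix should remove the sidecar exemption entirely or apply the nodeStatusService check in some other way is a design decision for the maintainers.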
Your Environment
- Presto version used:
- Storage (HDFS/S3/GCS..):
- Data source and connector used:
- Deployment (Cloud or On-prem):
- Pastebin link to the complete debug logs:
Expected Behavior
Current Behavior
Possible Solution
Steps to Reproduce
Screenshots (if appropriate)
Context