Queries are routed to unhealthy backends #199

@ahackett

Hi

Thank you for making the presto-gateway available to the community, it's a really useful piece of software.

I've recently been evaluating and testing the gateway, and have found that queries are being routed to backend clusters even when the coordinator is stopped, which of course results in query failures.

My config includes:

    modules:
      - com.lyft.data.gateway.ha.module.HaGatewayProviderModule
      - com.lyft.data.gateway.ha.module.ClusterStateListenerModule

    managedApps:
      - com.lyft.data.gateway.ha.GatewayManagedApp
      - com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor

I can see that #183 "filters out unhealthy clusters from queue based routing logic".

However, since our clusters have authentication enabled I get the following errors in the logs:

    WARN  [2023-05-04 08:55:42,052] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received non 200 response, response code: 401
    ERROR [2023-05-04 08:55:42,053] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received null/empty response for http://trino.example.net:8080/ui/api/stats

So the ActiveClusterMonitor cannot fetch the metrics, which I assume are needed for queue-based routing to work.

I was hoping that someone might be able to confirm that the behaviour I'm seeing (queries routed to unhealthy backends) is definitely caused by the failure to fetch the metrics. If confirmed, I could look at adding support for password authentication with the /ui/api/stats endpoint.
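
To make the idea concrete, here is a rough sketch of the kind of change I have in mind. It is not the gateway's actual code: the class and method names are hypothetical, and it assumes the backend accepts HTTP Basic credentials on /ui/api/stats, which I have not verified (the endpoint may instead require the web UI's login session). It only shows how the stats request could carry credentials.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

// Hypothetical helper, not part of presto-gateway.
final class StatsAuthSketch {

    // Builds the value of an HTTP Basic "Authorization" header (RFC 7617).
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a GET request for the stats endpoint with credentials attached.
    // Assumption: the backend honours Basic auth on this path.
    static HttpRequest statsRequest(String backendUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(backendUrl + "/ui/api/stats"))
                .timeout(Duration.ofSeconds(5))
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }
}
```

The request could then be sent with `java.net.http.HttpClient`, with any non-200 response treated as "cluster unhealthy". Credentials would presumably come from the gateway config rather than being hard-coded, and would only be safe to send over HTTPS.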

Thanks

Austin
