Hi
Thank you for making the presto-gateway available to the community. It's a really useful piece of software.
I've recently been evaluating and testing the gateway, and have found that queries are being routed to backend clusters even when the coordinator is stopped, which of course results in query failures.
My config includes:
modules:
- com.lyft.data.gateway.ha.module.HaGatewayProviderModule
- com.lyft.data.gateway.ha.module.ClusterStateListenerModule
managedApps:
- com.lyft.data.gateway.ha.GatewayManagedApp
- com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor
I can see that #183 "filters out unhealthy clusters from queue based routing logic".
However, since our clusters have authentication enabled, I get the following warning and error in the logs:
WARN [2023-05-04 08:55:42,052] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received non 200 response, response code: 401
ERROR [2023-05-04 08:55:42,053] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received null/empty response for http://trino.example.net:8080/ui/api/stats
So the ActiveClusterMonitor cannot fetch the metrics, which I assume are needed for queue based routing to work.
I was hoping that someone might be able to confirm that the behaviour I'm seeing (queries routed to unhealthy backends) is definitely caused by the failure to fetch the metrics. If confirmed, I could look at adding support for password authentication with the /ui/api/stats endpoint.
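To make the proposal concrete, here is a minimal sketch of how the stats request could carry credentials. It assumes the coordinator accepts HTTP Basic credentials on /ui/api/stats, which I have not verified; the web UI may require a form login and session cookie instead. The class name, method names, and the 5 second timeout are hypothetical and not part of the gateway.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of presto-gateway.
final class StatsRequestSketch {

    // Builds the value of an HTTP Basic "Authorization" header:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds (but does not send) a GET request for the backend's stats endpoint.
    // ASSUMPTION: the backend honours Basic auth on /ui/api/stats.
    static HttpRequest statsRequest(String backendUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(backendUrl + "/ui/api/stats"))
                .timeout(Duration.ofSeconds(5)) // arbitrary value for the sketch
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }
}
```

The request would then be sent with `java.net.http.HttpClient` (Java 11+), and the monitor could keep treating any non-200 response as a failed fetch, as it does today according to the log lines above.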
Thanks
Austin