Is your feature request related to a problem? Please describe.
We are sometimes seeing S3 throttling errors on the UI on some of our dashboards. This even happens for queries whose results are already cached via Athena's query result reuse feature. I understand these errors may be caused by sub-optimal partitioning/data layout in our Athena setup (which we are unable to change at the moment). However, I think throttling can also happen naturally when a query has to scan a lot of data.
As I understand it, S3 throttles while it is scaling up to the number of concurrent requests it needs to handle, and the throttling response is its way of signalling the client to slow down its request rate. The Grafana Athena datasource currently does not handle this situation gracefully.
Describe the solution you'd like
If my understanding of S3 throttling is correct, there should be a client-side retry-with-backoff mechanism for queries that fail because of S3 throttling. I understand that introducing a rate limit might not be straightforward, as it would likely require tracking global state on the Grafana server.
Describe alternatives you've considered
Sadly, I've found no alternatives yet. In a perfect world, Athena itself would handle this situation more gracefully, but we have found no configuration option for that.
Additional context
We have some automation in place that tests all of our dashboards' Athena queries against Grafana's /api/ds/query endpoint. These tests hit the same throttling issues, which we were able to overcome by adding a retry mechanism and stepwise lowering the request rate.