Retry / rate-limit queries that failed due to S3 throttling #320

@skuzzle

Description

Is your feature request related to a problem? Please describe.
We sometimes see S3 throttling errors in the UI on some of our dashboards. This happens even for queries that are already cached via Athena's query result reuse feature. I understand these may be root-caused by sub-optimal partitioning/data layout in our Athena setup (which we are unable to change at the moment). However, I think throttling can naturally happen when there is a lot of data to crawl through.
My understanding of S3 is that throttling happens while S3 scales up to the number of concurrent requests it needs to handle; it signals the client to slow down its request rate. This situation is currently not handled gracefully by the Grafana Athena datasource.
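For reference, S3 signals this situation with HTTP 503 and the "SlowDown" error code. A minimal sketch of classifying such a failure (the helper name and the exact code set are illustrative, not part of the datasource):

```python
# Hypothetical sketch: recognising an S3 throttling response.
# "SlowDown" is the error code S3 documents for throttling; the other
# codes and the helper itself are illustrative assumptions.
THROTTLING_CODES = {"SlowDown", "ThrottlingException", "RequestLimitExceeded"}

def is_throttling_error(error_code: str, status_code: int) -> bool:
    """Return True if the failure looks like S3 asking us to back off."""
    return status_code == 503 or error_code in THROTTLING_CODES
```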

Describe the solution you'd like
If my understanding of S3 throttling is correct, there should be some client-side retry-with-backoff mechanism for queries that fail because of S3 throttling. I understand that introducing a rate limit might not be straightforward, as it likely requires tracking some global state on the Grafana server.
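To make the suggestion concrete, here is a minimal sketch of such a retry using capped exponential backoff with full jitter. Everything here is a simplifying assumption (in particular `run_with_backoff` and the string-based "SlowDown" check), not the datasource's actual API:

```python
import random
import time

def run_with_backoff(run_query, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Retry run_query on throttling, with capped exponential backoff.

    Assumption for this sketch: run_query raises an exception whose
    message contains "SlowDown" when S3 throttles the request.
    """
    for attempt in range(max_attempts):
        try:
            return run_query()
        except Exception as exc:
            # Re-raise non-throttling errors, and give up on the last attempt.
            if "SlowDown" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```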

Describe alternatives you've considered
Sadly, I have found no alternatives yet. In a perfect world Athena would perhaps handle this situation more gracefully itself, but we have found no configuration options for that.

Additional context
We have some automation in place that tests all of our dashboards' Athena queries against Grafana's /api/ds/query endpoint. In these tests we faced the same throttling issues and were able to overcome them by adding a retry mechanism and stepwise lowering the request rate.
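For illustration, the stepwise rate-limit lowering from our test automation can be sketched like this (`execute` and the "SlowDown" string check are hypothetical stand-ins for calls to /api/ds/query):

```python
import time

def run_all(queries, execute, initial_rps=10.0, min_rps=1.0):
    """Run each query, halving the request rate whenever throttling occurs.

    Assumption for this sketch: execute(query) raises an exception whose
    message contains "SlowDown" when the backend is throttled.
    """
    rps = initial_rps
    results = []
    for query in queries:
        while True:
            try:
                results.append(execute(query))
                break
            except Exception as exc:
                if "SlowDown" not in str(exc):
                    raise
                rps = max(min_rps, rps / 2)  # step the rate limit down
                time.sleep(1.0 / rps)        # wait before retrying
        time.sleep(1.0 / rps)  # pace requests at the current rate limit
    return results
```

In our automation the rate limit stays lowered for subsequent queries rather than resetting, which is what eventually let the full dashboard test suite pass.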

Metadata

Assignees: no one assigned
Type: none
Projects: Backlog
Milestone: none
Relationships: none
Development: no branches or pull requests