Investigate and tweak S3 timeout/retry behavior

Requests to S3 sometimes fail with `ECONNRESET` or `EHOSTUNREACH`. We should measure how long it takes us to fail with those errors. If we wait on S3 for 25 seconds, for example, we are close to Heroku's 30-second router timeout, at which point the connection is cut off, and we could likely have generated the tile from scratch in that time. It might be better to set shorter timeouts on the S3 requests and possibly retry once or twice.
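As a rough sketch of the "shorter timeout plus a retry or two" idea, something like the following could work. It is not tied to any S3 client: `fetch` is a hypothetical callable that takes a timeout in seconds, returns the tile bytes, and raises `TimeoutError` or an `OSError` carrying the errno on failure. The default values (2 seconds per attempt, 3 attempts, 6 seconds total) are placeholders, not measured numbers.

```python
import errno
import time
from typing import Callable, Optional

# The two errors named in this issue; assumed to be transient and worth a retry.
RETRYABLE_ERRNOS = {errno.ECONNRESET, errno.EHOSTUNREACH}


def fetch_with_retry(
    fetch: Callable[[float], bytes],
    attempt_timeout: float = 2.0,   # placeholder, should come from measurements
    max_attempts: int = 3,          # placeholder
    total_budget: float = 6.0,      # placeholder, keep well under the platform cutoff
    clock: Callable[[], float] = time.monotonic,
) -> Optional[bytes]:
    """Call fetch(timeout) up to max_attempts times within total_budget seconds.

    Returns the bytes on success, or None when every attempt failed or the
    budget ran out, so the caller can fall back to generating the tile.
    """
    deadline = clock() + total_budget
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            # Never let a single attempt outlive the overall budget.
            return fetch(min(attempt_timeout, remaining))
        except TimeoutError:
            continue  # timed out, try again if attempts and budget allow
        except OSError as err:
            if err.errno not in RETRYABLE_ERRNOS:
                raise  # anything else is unexpected, surface it
    return None
```

Returning `None` instead of raising keeps the fallback simple: the caller treats it like a cache miss and renders the tile.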

Separating the cache metadata from the cached data could also help. We could store the small metadata in a fast, reliable database (Redis, for example) and the larger tile data in S3. The miss latency should decrease substantially, since a miss could be detected without a round trip to S3. We could also record the tile generation time in the metadata, so we would know how long is too long to wait for S3. The hit latency would likely go up slightly, since a hit needs two lookups, but we might be able to mitigate that by issuing the request for the data before we know whether the data exists. There is also a chance the data was not properly saved to S3, so the metadata and data could be slightly out of sync. That adds another, hopefully rare, component to the miss latency: the metadata exists and indicates that the data should be fresh, but the data is not in S3.
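A minimal sketch of that lookup flow, with stand-ins for both stores, might look like this. Everything here is hypothetical: `meta_store` is a plain mapping standing in for the metadata database, `fetch_blob` and `store_blob` stand in for the S3 read and write, and `TileMeta`, `get_tile`, and the `ttl` default are made-up names and values for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, MutableMapping, Optional


@dataclass
class TileMeta:
    expires_at: float          # wall-clock time after which the tile is stale
    generation_seconds: float  # how long the last render of this tile took


def get_tile(
    key: str,
    meta_store: MutableMapping[str, TileMeta],
    fetch_blob: Callable[[str], Optional[bytes]],
    store_blob: Callable[[str, bytes], None],
    generate: Callable[[str], bytes],
    pool: ThreadPoolExecutor,
    ttl: float = 3600.0,  # placeholder freshness window
    now: Callable[[], float] = time.time,
) -> bytes:
    # Start the data request before we know whether the data exists,
    # so a hit does not pay for the two lookups back to back.
    pending = pool.submit(fetch_blob, key)

    meta = meta_store.get(key)
    if meta is not None and meta.expires_at > now():
        try:
            # Wait on the blob no longer than a fresh render would take.
            data = pending.result(timeout=meta.generation_seconds)
        except Exception:
            data = None  # timed out or the fetch failed
        if data is not None:
            return data
        # Metadata says fresh but the blob is missing: fall through and render.
    else:
        pending.cancel()  # best effort; a request already running is not stopped

    started = time.monotonic()
    data = generate(key)
    elapsed = time.monotonic() - started

    # Write the blob first, then the metadata, so that a failed blob write
    # does not leave metadata pointing at data that was never stored.
    store_blob(key, data)
    meta_store[key] = TileMeta(expires_at=now() + ttl, generation_seconds=elapsed)
    return data
```

One caveat with using the raw generation time as the wait limit: for tiles that render very quickly, the limit may be shorter than a normal S3 round trip, so a floor on the wait would probably be needed in practice.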


Investigate and tweak S3 timeout/retry behavior #160
