Investigate and tweak S3 timeout/retry behavior #160

@prashtx

Requests to S3 sometimes fail with ECONNRESET or EHOSTUNREACH. We should measure how long it takes for those requests to fail. If we wait on S3 for 25 seconds, for example, we are close to the point where Heroku cuts off the connection, and we likely could have generated the tile from scratch in that time. It might be better to set shorter timeouts on the S3 requests and possibly retry once or twice.
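
To make the "shorter timeout plus a retry or two" idea concrete, here is a minimal sketch. It is not this project's code: `fetch_with_retries`, `S3Unavailable`, and the default numbers are hypothetical, and it assumes the S3 request is wrapped in a callable that accepts a `timeout` keyword and raises an `OSError` subclass (connection reset, host unreachable, timeout) on failure. The overall `budget` keeps the total wait well under the point where regenerating the tile would have been cheaper.

```python
import time


class S3Unavailable(Exception):
    """Raised when every attempt failed or the time budget ran out (hypothetical name)."""


def fetch_with_retries(fetch, per_try_timeout=3.0, max_tries=3, budget=10.0,
                       clock=time.monotonic):
    """Call fetch(timeout=...) up to max_tries times within a total time budget.

    All defaults are illustrative assumptions, not measured values.
    """
    deadline = clock() + budget
    last_error = None
    for _ in range(max_tries):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # out of budget: the caller should regenerate the tile instead
        try:
            # Never let a single attempt outlive the overall budget.
            return fetch(timeout=min(per_try_timeout, remaining))
        except OSError as err:
            # ConnectionResetError and TimeoutError are OSError subclasses;
            # an unreachable host also surfaces as an OSError.
            last_error = err
    raise S3Unavailable("giving up on S3") from last_error
```

A caller would catch `S3Unavailable` and fall back to generating the tile from scratch. Whether a short pause between attempts helps is something the timing investigation would have to show.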

Separating the cache metadata from the cached data could also help. We could store the smaller metadata in a fast and reliable database (possibly Redis, for example) and the larger tile data in S3. The miss latency should decrease substantially. We could also store the tile generation time in the metadata, so we would know how long is too long to wait for S3. The hit latency would likely go up slightly, but we might be able to mitigate that by issuing the request for the data before we know whether the data exists. There's also a chance the data was not properly saved to S3, so the metadata and data could be slightly out of sync. That adds another, hopefully rare, source of miss latency: the metadata exists and indicates that the data should be fresh, but the data is not in S3.
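
A rough sketch of that lookup flow, under stated assumptions: `get_tile`, the `expires_at` and `generation_seconds` metadata fields, and the store objects are all hypothetical stand-ins (any object with a `get(key)` method works, so plain dicts are enough to try it). The data fetch is started speculatively before the metadata answer is known, and the wait on it is capped by the recorded generation time.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def get_tile(key, metadata_store, data_store, generate, pool, now):
    """Return (tile_bytes, outcome); outcome is 'hit', 'miss', or 'out-of-sync'."""
    # Start the slow data fetch before we know whether the data exists.
    pending = pool.submit(data_store.get, key)

    # Fast metadata lookup (a Redis-like store in the proposal above).
    meta = metadata_store.get(key)
    if meta is None or meta["expires_at"] <= now:
        pending.cancel()  # best effort; a fetch that already started is just ignored
        return generate(key), "miss"

    try:
        # Waiting longer than the tile took to generate is never worthwhile.
        data = pending.result(timeout=meta["generation_seconds"])
    except (FutureTimeout, OSError):
        data = None

    if data is None:
        # Metadata says the tile should be fresh, but the data is missing or too slow.
        return generate(key), "out-of-sync"
    return data, "hit"
```

With a `ThreadPoolExecutor` as `pool` and dicts for both stores, the three outcomes can be exercised directly. A real version would also write the regenerated tile and its metadata back, which is omitted here to keep the sketch short.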
