feat: optimize keys for S3 performance #105

@mikeal

Description

When I was working on the Dumbo Drop project I hit most of the performance bottlenecks you can find in S3 and DynamoDB. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.

From the AWS documentation:

your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket

This statement isn’t 100% honest: most of the time you won’t actually see this performance against every prefix. But it’s a good window into how S3 is architected and what the performance constraints are.

We’re in a very lucky situation: we can really optimize for this, because every block already has a randomized key we can use as a prefix. I’ve recently built two block storage backends for IPLD, and in both cases I used the CID as a prefix rather than as the final key, so something like `{cid.toString()}/data`, and the performance I was able to get was tremendous.
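
To make that concrete, here’s a minimal sketch of the key layout (the region, bucket name, and helper names are placeholders of mine, not anything from this library; the S3 calls themselves are the standard @aws-sdk/client-s3 v3 API):

```js
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' }) // region is a placeholder

// A fixed prefix like `blocks/${cid}` funnels every request through one
// partition. Putting the CID first, `${cid}/data`, gives every block its
// own prefix, so S3 can partition the keyspace and scale out throughput.
const keyFor = (cid) => `${cid.toString()}/data`

// Hypothetical helpers for writing/reading a single IPLD block.
const putBlock = async (bucket, cid, bytes) => {
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: keyFor(cid), Body: bytes }))
}

const getBlock = async (bucket, cid) => {
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: keyFor(cid) }))
  return res.Body // a readable stream of the block bytes
}
```

Because a CID is derived from a hash of the block, these keys are effectively uniformly distributed, which is exactly what lets S3 spread its per-prefix request limits across many partitions.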

If you really hammer a bucket with writes this way, you’ll see moments in which it’s re-balancing in order to gain more throughput. Once I had a few billion blocks in a single bucket, I aimed 2,000+ Lambda functions at it, all writing 1MB blocks; Lambda started having issues before I could saturate the bucket, which was reliably sustaining about 40GB/s of write throughput.

This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.
