feat: optimize keys for S3 performance #105

@mikeal

Description

When I was working on the Dumbo Drop project I hit most of the performance bottlenecks you can find in S3 and DynamoDB. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.

From the AWS documentation:

your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket

This statement isn’t 100% honest: most of the time you won’t actually see this performance against every prefix. But it’s a good window into how S3 is architected and what the performance constraints are.

We’re in a very lucky situation: we can really optimize for this, because every block already has a randomized key we can use as a prefix. I’ve recently built two block storage backends for IPLD, and in both cases I used the CID as a prefix rather than as the final key, so something like `{cid.toString()}/data`, and the performance I was able to get was tremendous.
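
To make that concrete, here’s a minimal sketch of the key layout (the region, bucket name, and helper names are placeholders of mine, not anything from this library; the S3 calls themselves are the standard @aws-sdk/client-s3 v3 API):

```js
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' }) // region is a placeholder

// A fixed prefix like `blocks/${cid}` funnels every request through one
// partition. Putting the CID first, `${cid}/data`, gives every block its
// own prefix, so S3 can partition the keyspace and scale out throughput.
const keyFor = (cid) => `${cid.toString()}/data`

// Hypothetical helpers for writing/reading a single IPLD block.
const putBlock = async (bucket, cid, bytes) => {
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: keyFor(cid), Body: bytes }))
}

const getBlock = async (bucket, cid) => {
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: keyFor(cid) }))
  return res.Body // a readable stream of the block bytes
}
```

Because a CID is derived from a hash of the block, these keys are effectively uniformly distributed, which is exactly what lets S3 spread its per-prefix request limits across many partitions.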

If you really hammer a bucket with writes this way, you’ll see moments in which it’s re-balancing in order to gain more throughput. Once I had a few billion blocks in a single bucket, I aimed 2,000+ Lambda functions at it, all writing 1MB blocks; Lambda started having issues before I could saturate the bucket, which was reliably sustaining about 40GB/s of write throughput.

This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.
