When I was doing the Dumbo Drop project, I hit most of the performance bottlenecks you can find in S3 and Dynamo. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.
From the AWS documentation:

> your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket
This statement isn’t entirely accurate: most of the time you won’t actually see that throughput against every prefix, but it’s a good window into how S3 is architected and where the performance constraints are.
We’re in a very lucky situation here: we can really optimize for this, because every block already has a randomized key that can be used as a prefix. I’ve recently built two block storage backends for IPLD, and in both cases I used the CID as a prefix rather than as the final key (something like `{cid.toString()}/data`, as sketched below), and the performance I was able to get was tremendous.
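
Here is a minimal sketch of that key layout using the v3 AWS SDK; the bucket name and `putBlock` helper are hypothetical illustrations, not this library's actual API:

```js
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({}) // region/credentials picked up from the environment

// `cid` is a multiformats CID instance, `bytes` is a Uint8Array of block data.
async function putBlock (cid, bytes) {
  // CIDs are effectively random, so using the CID as the key prefix spreads
  // writes across S3 partitions instead of piling them onto a single prefix.
  await s3.send(new PutObjectCommand({
    Bucket: 'my-block-bucket',        // hypothetical bucket name
    Key: `${cid.toString()}/data`,    // CID as the prefix, fixed suffix for the payload
    Body: bytes
  }))
}
```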
If you really hammer a bucket with writes this way, you’ll see moments where it re-balances in order to gain more throughput. At one point, with a few billion blocks already in a single bucket, I aimed 2,000+ Lambda functions at it writing 1MB blocks; Lambda started having issues before I could saturate the bucket, which was reliably doing about 40GB/s of write throughput.
This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.