A high-level API for building and working with AWS S3 multipart uploads using the official SDK for Rust.
Making an AWS S3 multipart upload is a fairly involved multi-stage process:
- Send a request to create the multipart upload; preserve the ID of the upload from the response.
- Build parts for the upload, which generally follows this pattern:
  - Repeatedly write to some buffer of bytes, keeping track of how many bytes have been written. AWS imposes a minimum and a maximum size for a part.
  - Also keep track of the part number. When a part should be sent, collect the bytes along with the part number, upload ID, and object URI into the request object. A successful response contains the entity tag (ETag) of the part, which must be stored with the exact part number that was used.
- Repeat until the upload should be completed, which typically involves tracking another counter of bytes written. The request to complete the upload needs the upload ID, the object URI, and the complete collection of (part number, entity tag) pairs. Send the request; a 200 response contains the entity tag of the object (along with other metadata).
The official AWS Rust SDK is generated code: it exposes request builders that can be constructed from and sent by a client, including the several requests mentioned above, but it offers little beyond that.
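To make that concrete, the raw flow with `aws-sdk-s3` looks roughly like the sketch below. The function name, parameters, and chunking are illustrative only; a real implementation would also enforce the part-size limits and abort the upload on failure.

```rust
use aws_sdk_s3::Client;
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::types::{CompletedMultipartUpload, CompletedPart};

/// Illustrative sketch: upload `chunks` to `bucket`/`key` as a multipart upload.
async fn upload_chunks(
    client: &Client,
    bucket: &str,
    key: &str,
    chunks: Vec<Vec<u8>>,
) -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create the multipart upload and keep its ID.
    let created = client
        .create_multipart_upload()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;
    let upload_id = created.upload_id().expect("missing upload id");

    // 2. Upload each part, recording (part number, entity tag) pairs.
    //    Part numbers start at 1; every part except the last must respect
    //    the minimum part size.
    let mut parts = Vec::new();
    for (i, chunk) in chunks.into_iter().enumerate() {
        let part_number = (i + 1) as i32;
        let uploaded = client
            .upload_part()
            .bucket(bucket)
            .key(key)
            .upload_id(upload_id)
            .part_number(part_number)
            .body(ByteStream::from(chunk))
            .send()
            .await?;
        parts.push(
            CompletedPart::builder()
                .part_number(part_number)
                .e_tag(uploaded.e_tag().unwrap_or_default())
                .build(),
        );
    }

    // 3. Complete the upload with the collected part numbers and ETags.
    client
        .complete_multipart_upload()
        .bucket(bucket)
        .key(key)
        .upload_id(upload_id)
        .multipart_upload(
            CompletedMultipartUpload::builder()
                .set_parts(Some(parts))
                .build(),
        )
        .send()
        .await?;
    Ok(())
}
```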
The aws-multipart-upload crate aims to simplify this process and do so with abstractions that
integrate cleanly with the parts of the Rust ecosystem one is likely to be using, or that one would
like to be using, when performing multipart uploads.
Add the crate to your Cargo.toml:
```toml
aws-multipart-upload = "0.1.0-rc5"
```

The feature flag "csv" enables a "part encoder"--the component responsible for writing items to a part--built from a csv writer. Part encoders for writing jsonlines and for writing arbitrary lines of text are available as well.
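For example, to pull in the CSV part encoder used below, enable the `csv` feature on the dependency (standard Cargo feature syntax):

```toml
[dependencies]
aws-multipart-upload = { version = "0.1.0-rc5", features = ["csv"] }
```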
This example shows a stream of `serde_json::Value`s being written as comma-separated values to a multipart upload. The upload is driven as a future: awaiting it runs the stream to completion, writing and uploading parts behind the scenes and completing the upload when the stream is exhausted.
See more examples here.
```rust
use aws_multipart_upload::{ByteSize, SdkClient, UploadBuilder};
use aws_multipart_upload::codec::CsvEncoder;
use aws_multipart_upload::write::UploadStreamExt as _;
use futures::stream::{self, StreamExt as _};
use serde_json::{Value, json};

#[tokio::main]
async fn main() {
    // Default aws-sdk-s3 client:
    let client = SdkClient::defaults().await;

    // Use `UploadBuilder` to build a multipart uploader:
    let upl = UploadBuilder::new(client)
        .with_encoder(CsvEncoder::default().with_header())
        .with_part_size(ByteSize::mib(10))
        .with_uri(("example-bucket-us-east-1", "destination/key.csv"))
        .build();

    // Consume a stream of `Value`s by forwarding it to `upl`,
    // and poll for completion:
    let values = stream::iter(0..).map(|n| json!({"n": n, "n_sq": n * n}));
    let completed = values
        .take(100_000)
        .collect_upload(upl)
        .await
        .unwrap();

    println!("object uploaded: {}", completed.uri);
}
```