quasi-coherent/aws-multipart-upload


aws-multipart-upload


Description

A high-level API for building and working with AWS S3 multipart uploads using the official SDK for Rust.

Motivation

Making an AWS S3 multipart upload is a fairly involved multi-stage process:

  1. Send a request to create the multipart upload and preserve the upload ID from the response.
  2. Build parts for the upload, which generally follows this pattern:
    • Repeatedly write to some buffer of bytes, keeping track of how many bytes have been written. AWS imposes a minimum and a maximum size for a part.
    • Also keep track of the part number. When a part should be sent, collect the bytes along with the part number, upload ID, and object URI into the request object. A successful response contains the entity tag of the part, which must be stored with the exact part number that was used.
  3. Repeat until the upload should be completed, which typically means tracking another counter of total bytes written. The request to complete the upload needs the upload ID, the object URI, and the complete collection of part numbers paired with entity tags. Send the request; a 200 response contains the entity tag of the object (along with other metadata).
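The bookkeeping in steps 2 and 3 can be sketched in plain Rust. This is a minimal illustration, not the crate's API: the `PartTracker` type and the `upload_part` stub are hypothetical stand-ins, and the only S3-specific fact used is that every part except the last must be at least 5 MiB.

```rust
// Hypothetical sketch of the part bookkeeping described above.
const MIN_PART_SIZE: usize = 5 * 1024 * 1024; // AWS minimum for all parts but the last

struct PartTracker {
    buf: Vec<u8>,
    next_part_number: i32,         // part numbers start at 1
    completed: Vec<(i32, String)>, // (part number, entity tag) pairs
}

impl PartTracker {
    fn new() -> Self {
        Self { buf: Vec::new(), next_part_number: 1, completed: Vec::new() }
    }

    // Buffer bytes, flushing a part whenever the minimum size is reached.
    fn write(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
        if self.buf.len() >= MIN_PART_SIZE {
            self.flush();
        }
    }

    // Send the buffered bytes as one part and record its entity tag
    // against the part number that was used.
    fn flush(&mut self) {
        let part = std::mem::take(&mut self.buf);
        let etag = upload_part(self.next_part_number, &part);
        self.completed.push((self.next_part_number, etag));
        self.next_part_number += 1;
    }

    // Flush any remainder (the last part may be under the minimum) and
    // hand back the (part number, etag) pairs the completion request needs.
    fn finish(mut self) -> Vec<(i32, String)> {
        if !self.buf.is_empty() {
            self.flush();
        }
        self.completed
    }
}

// Stand-in for the UploadPart request; a real implementation would send
// the bytes to S3 and return the entity tag from the response.
fn upload_part(part_number: i32, _bytes: &[u8]) -> String {
    format!("etag-{part_number}")
}

fn main() {
    let mut tracker = PartTracker::new();
    for _ in 0..6 {
        tracker.write(&[0u8; 1024 * 1024]); // six 1 MiB chunks
    }
    // 5 MiB flushed as part 1; the 1 MiB remainder becomes part 2.
    let parts = tracker.finish();
    assert_eq!(parts, vec![(1, "etag-1".to_string()), (2, "etag-2".to_string())]);
}
```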

The official AWS Rust SDK is generated code exposing request builders that can be initialized and sent from a client, including the several mentioned above, but it offers little beyond that.

The aws-multipart-upload crate aims to simplify this process, with abstractions that integrate cleanly with the parts of the Rust ecosystem one is likely to be using (or would like to be using) when performing multipart uploads.

Example

Add the crate to your Cargo.toml:

aws-multipart-upload = "0.1.0-rc5"

The feature flag "csv" enables a "part encoder" (the component responsible for writing items to a part) built from a csv writer. Part encoders for writing jsonlines and arbitrary lines of text are available as well.
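To use the csv part encoder, enable the feature in Cargo.toml (shown for the "csv" flag named above; the other encoders are assumed to sit behind their own feature flags):

```toml
[dependencies]
aws-multipart-upload = { version = "0.1.0-rc5", features = ["csv"] }
```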

This example shows a stream of serde_json::Values being written as comma-separated values to a multipart upload. The result is a future; awaiting it runs the stream to completion, writing and uploading parts behind the scenes and completing the upload when the stream is exhausted.

See more examples here.

use aws_multipart_upload::{ByteSize, SdkClient, UploadBuilder};
use aws_multipart_upload::codec::CsvEncoder;
use aws_multipart_upload::write::UploadStreamExt as _;
use futures::stream::{self, StreamExt as _};
use serde_json::{Value, json};

#[tokio::main]
async fn main() {
    // Default aws-sdk-s3 client:
    let client = SdkClient::defaults().await;

    // Use `UploadBuilder` to build a multipart uploader:
    let upl = UploadBuilder::new(client)
        .with_encoder(CsvEncoder::default().with_header())
        .with_part_size(ByteSize::mib(10))
        .with_uri(("example-bucket-us-east-1", "destination/key.csv"))
        .build();

    // Consume a stream of `Value`s by forwarding it to `upl`,
    // and poll for completion:
    let values = stream::iter(0..).map(|n| json!({"n": n, "n_sq": n * n}));
    let completed = values
        .take(100_000)
        .collect_upload(upl)
        .await
        .unwrap();

    println!("object uploaded: {}", completed.uri);
}

License

Licensed under either of Apache-2.0 (LICENSE-APACHE) or MIT (LICENSE-MIT).
