
Support multipart downloads when downloading large ranges via TransferManager.download() #248

Open
@forrestfwilliams

Description


This issue references boto3 issue #1215 and its duplicate, #3466. The problem has also been discussed in this Stack Overflow post.

Issue

s3transfer supports both ranged download requests and multipart downloads. However, it cannot perform a multipart download over a specific byte range. As a result, downloading a 1 GB range of data from a 4 GB object in S3 is slow, because the range cannot be split into parts that are fetched in parallel.
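To make the difference concrete: a multipart download splits the requested bytes into fixed-size parts, each fetched with its own ranged GET, so the parts can be transferred concurrently. The sketch below only computes the HTTP `Range` header values for such parts. The function name `split_range` is hypothetical, and the 8 MiB default chunk size mirrors boto3's `TransferConfig` default for `multipart_chunksize`.

```python
from typing import List

MiB = 1024 * 1024


def split_range(start: int, end: int, chunksize: int = 8 * MiB) -> List[str]:
    """Split the inclusive byte range [start, end] into HTTP Range header values.

    Hypothetical helper for illustration. Each value covers at most `chunksize`
    bytes; the last part may be shorter.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    parts = []
    for part_start in range(start, end + 1, chunksize):
        # HTTP Range ends are inclusive, so the last byte is start + size - 1.
        part_end = min(part_start + chunksize - 1, end)
        parts.append(f"bytes={part_start}-{part_end}")
    return parts
```

For example, `split_range(0, 9, chunksize=4)` returns `['bytes=0-3', 'bytes=4-7', 'bytes=8-9']`, and a 1 GiB range at the default chunk size yields 128 parts that could be fetched in parallel instead of one long sequential GET.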

Use Case

I work at the Alaska Satellite Facility, where we distribute large amounts of remote sensing data to users across the globe via AWS. Many of these datasets come in legacy formats, such as zip files, that are not cloud-friendly. Because these datasets are highly structured, we can identify byte ranges containing subsets of data that our users would want to download directly. However, these subsets are still large (~1 GB within a larger 4 GB zip file), and since multipart downloads are not supported for range requests, we cannot offer low-latency extraction of these datasets. I know of many other groups that have run into this issue while distributing large remote sensing datasets.
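As an illustration of how such byte ranges can be identified, assume the subset of interest is a single member of a zip archive. The central directory records each member's local-header offset and compressed size, and the member's data begins right after its variable-length local file header. The helper `member_byte_range` below is hypothetical and reads a local copy of the archive for simplicity; against S3, the same headers could instead be fetched with small ranged GETs.

```python
import struct
import zipfile
from typing import Tuple

LOCAL_HEADER_SIZE = 30  # fixed-size portion of a ZIP local file header


def member_byte_range(zip_path: str, member: str) -> Tuple[int, int]:
    """Return the inclusive (first, last) byte offsets of a member's stored data.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)  # metadata from the central directory
    with open(zip_path, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(LOCAL_HEADER_SIZE)
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    # File-name and extra-field lengths sit at offsets 26 and 28 of the local
    # header; the local extra field can differ from the central-directory copy.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    first = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    return first, first + info.compress_size - 1
```

For compressed (e.g. deflated) members, the returned range covers the compressed bytes, which still need to be decompressed after download; for stored members it is the raw member content.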

Proposed Solution

It would be useful to add a range argument to TransferConfig that could then be passed to a TransferManager.download() call. Ranges larger than multipart_threshold would then be downloaded via a multipart download.
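Until something like this exists in s3transfer, the proposed behavior can be approximated outside the library. The following is a minimal sketch, not s3transfer's implementation: the function `download_range` and its parameters are hypothetical, named to mirror the `TransferConfig` settings `multipart_threshold`, `multipart_chunksize`, and `max_concurrency`. It assumes `client` behaves like a boto3 S3 client, i.e. `get_object(Bucket=..., Key=..., Range=...)` returns a dict whose `"Body"` has a `read()` method. There are no retries or error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, List, Tuple

MiB = 1024 * 1024


def _part_ranges(start: int, end: int, chunksize: int) -> List[Tuple[int, int]]:
    # Inclusive (first, last) byte offsets for each part of [start, end].
    return [(s, min(s + chunksize - 1, end)) for s in range(start, end + 1, chunksize)]


def download_range(client, bucket: str, key: str, start: int, end: int,
                   fileobj: BinaryIO, multipart_threshold: int = 8 * MiB,
                   multipart_chunksize: int = 8 * MiB,
                   max_concurrency: int = 10) -> int:
    """Hypothetical sketch: write bytes [start, end] of s3://bucket/key to fileobj.

    Ranges no larger than multipart_threshold use a single ranged GET; larger
    ranges are split into multipart_chunksize parts fetched concurrently.
    Returns the number of bytes written.
    """
    def fetch(part: Tuple[int, int]) -> bytes:
        first, last = part
        resp = client.get_object(Bucket=bucket, Key=key, Range=f"bytes={first}-{last}")
        return resp["Body"].read()

    if end - start + 1 <= multipart_threshold:
        parts = [(start, end)]
    else:
        parts = _part_ranges(start, end, multipart_chunksize)

    written = 0
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() yields results in submission order, so parts are written in sequence.
        for data in pool.map(fetch, parts):
            fileobj.write(data)
            written += len(data)
    return written
```

With boto3 installed, `client` could be `boto3.client("s3")`. Because `pool.map` submits every part up front, finished parts can pile up in memory while an earlier part is still downloading; a fuller version would bound the number of in-flight parts or write each part at its own offset under a lock.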

I am willing to participate in developing this solution.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
