Description
This issue references issue #1215 and its duplicate #3466 from the boto3
repository. It has also been discussed in this Stack Overflow post.
Issue
s3transfer supports ranged download requests and multipart downloads, but it is not possible to perform a multipart download over a specific byte range. This results in slow download times when attempting to download a 1GB range of data from a 4GB file in S3.
Use Case
I work at the Alaska Satellite Facility, where we distribute large amounts of remote sensing data to users across the globe via AWS. Many of these datasets come in legacy formats, such as zip files, that are not cloud-friendly. Because these datasets are highly structured, we can identify byte ranges containing subsets of data that our users would want to download directly. However, these subsets are still large (~1GB within a larger 4GB zip file), and because multipart downloads are not supported for range requests, we cannot offer low-latency extraction of these subsets. I know of many other groups that have encountered this issue while trying to distribute large remote sensing datasets.
Proposed Solution
It would be great if a range argument were added to TransferConfig that could then be passed to a TransferManager.download() call, which would then download data ranges with sizes greater than the multipart_threshold via a multipart download.
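To illustrate the requested behavior, here is a minimal sketch of a ranged download that is split into parts and fetched concurrently. It is not how s3transfer works internally and not a proposed implementation: `split_range` and `download_range` are hypothetical helper names, `client` is assumed to be a boto3 S3 client (only its `get_object` call with the `Range` parameter is used), and the part size and worker count are arbitrary illustrative defaults.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, end, part_size):
    """Split the inclusive byte range [start, end] into inclusive (first, last) parts."""
    if start < 0 or end < start or part_size <= 0:
        raise ValueError("invalid range or part size")
    return [
        (first, min(first + part_size - 1, end))
        for first in range(start, end + 1, part_size)
    ]


def download_range(client, bucket, key, start, end,
                   part_size=8 * 1024 * 1024, max_workers=10):
    """Fetch bytes [start, end] of an object using concurrent ranged GETs.

    Hypothetical helper: `client` is assumed to behave like a boto3 S3 client.
    """

    def fetch(part):
        first, last = part
        # HTTP Range headers use inclusive byte offsets.
        response = client.get_object(
            Bucket=bucket, Key=key, Range=f"bytes={first}-{last}"
        )
        return response["Body"].read()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so parts are joined correctly.
        return b"".join(pool.map(fetch, split_range(start, end, part_size)))
```

Note that this sketch holds the whole range in memory and has no retry or error handling, which is part of why support inside s3transfer would be preferable to a hand-rolled workaround.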
I am willing to participate in developing this solution.