Description
Problem Description
The HeadObject call that aws-c-s3 issues for CopyObject can, and should, be skipped when the ContentLength is already known, as is already done for PutObject.
Use-case (why)
Reason 1: comparison with existing approaches
This follows the design of the boto3 s3Transfer.TransferManager, which also skips its HeadObject call when transfer_future.meta.size is set.
This is also the practice in aws-c-s3 PutObject calls, which check has_content_length; it is true if the ContentLength header has been provided.
Reason 2: work around multi-part copy limitations
Users will very likely have to do their own HeadObject call anyway, because multi-part copy does not honor MetadataDirective=COPY or TaggingDirective=COPY: metadata (e.g. ContentLength, ContentEncoding, Metadata) and S3 storage tags (Tagging) are not preserved across multi-part copies.
To work around this, users need to first do their own HeadObject in order to determine the ContentLength:
- if this is below s_multipart_copy_minimum_object_size, no further action is required;
- otherwise, copy the metadata, and do a GetObjectTagging call to fetch the tagging data.
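The user-side workaround described above can be sketched roughly as follows. This is only an illustration in Python, not aws-c-s3 code: `prepare_copy` and `multipart_threshold` are hypothetical names, `s3_client` is assumed to behave like a boto3 S3 client (`head_object`, `get_object_tagging`), and the threshold is passed in by the caller to stand in for s_multipart_copy_minimum_object_size.

```python
from typing import Any, Dict


def prepare_copy(s3_client: Any, bucket: str, key: str,
                 multipart_threshold: int) -> Dict[str, Any]:
    """Collect what a caller needs to know before copying one object.

    Hypothetical helper: s3_client is assumed to expose head_object and
    get_object_tagging the way a boto3 S3 client does.
    """
    # The caller's own HeadObject: needed to learn the object size.
    head = s3_client.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]
    plan: Dict[str, Any] = {
        "content_length": size,
        "multipart": size >= multipart_threshold,
    }
    if not plan["multipart"]:
        # Below the threshold: no further action required.
        return plan
    # Multi-part copy: metadata and tags have to be carried over by hand.
    plan["metadata"] = head.get("Metadata", {})
    plan["content_encoding"] = head.get("ContentEncoding")
    tagging = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    plan["tag_set"] = tagging["TagSet"]
    return plan
```

Either way, `plan["content_length"]` is available before the copy is started, which is the value the request could pass along.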
In any case, a prepared user knows the ContentLength ahead of the CopyObject call, and might as well provide it.
Reason 3: fewer API calls
If users copy many small files, the extra HeadObject calls quickly add up: even a small blob now takes 3 API calls (2 x HeadObject, 1 x CopyObject), which is quite a force multiplier.
Proposed Solution
Analogous to the existing PutObject implementation: if the user provided a ContentLength header, use it and skip the HeadObject request.
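To make the intended behaviour concrete, here is a minimal Python model of the decision. It is not the aws-c-s3 implementation: `requests_for_copy` and its parameters are hypothetical, and the multi-part branch is simplified to the three S3 operations involved.

```python
from typing import List, Optional


def requests_for_copy(actual_size: int,
                      provided_content_length: Optional[int],
                      multipart_threshold: int) -> List[str]:
    """Model which requests the client issues for one copy (illustrative)."""
    requests: List[str] = []
    size = provided_content_length
    if size is None:
        # No ContentLength supplied: the size has to be discovered first.
        requests.append("HeadObject")
        size = actual_size
    if size < multipart_threshold:
        requests.append("CopyObject")
    else:
        # UploadPartCopy would repeat once per part; listed once here.
        requests += ["CreateMultipartUpload", "UploadPartCopy",
                     "CompleteMultipartUpload"]
    return requests
```

In this model a small object costs `["HeadObject", "CopyObject"]` without a ContentLength and just `["CopyObject"]` with one, so a caller who already did their own HeadObject ends up with 2 requests in total instead of 3.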
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change