[CopyObject]: skip HeadObject if user provided ContentLength #618

@grrtrr

Description

Problem Description

The HeadObject call that aws-c-s3 issues for CopyObject can, and should, be skipped when the ContentLength is already known, as is done for PutObject.

Use-case (why)

Reason 1: comparison with existing approaches

This follows the design of the boto3 s3Transfer.TransferManager, which also skips its HeadObject call when transfer_future.meta.size is set.

This is also the practice in aws-c-s3 PutObject calls, which use the has_content_length flag; it is true if the ContentLength header has been provided.

Reason 2: work around multi-part copy limitations

Users will very likely have to do their own HeadObject call anyway, because multi-part copy does not honor MetadataDirective=COPY or TaggingDirective=COPY. That is, metadata (e.g. ContentLength, ContentEncoding, Metadata) and S3 storage tags (Tagging) are not preserved across multi-part copies.

To avoid this, users need to first do their own HeadObject in order to determine the ContentLength:

  • if this is below s_multipart_copy_minimum_object_size, no further action required;
  • otherwise, copy the metadata, and do a GetObjectTagging call to fetch the tagging data.

In any case, a prepared user knows the ContentLength ahead of the CopyObject call, and might as well provide it.
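The user-side preparation described above can be sketched roughly as follows. This is a minimal Python illustration, not aws-c-s3 code: `prepare_copy`, the shape of the returned plan, and the 1 GiB default threshold are assumptions made for the example (the actual value of s_multipart_copy_minimum_object_size should be checked in the aws-c-s3 source). The `client` is expected to behave like a boto3 S3 client, of which only `head_object` and `get_object_tagging` are used.

```python
from typing import Any, Dict

# ASSUMPTION: placeholder for s_multipart_copy_minimum_object_size;
# the real value lives in aws-c-s3 and may differ.
ASSUMED_MULTIPART_COPY_MIN_SIZE = 1024 ** 3


def prepare_copy(
    client: Any,
    bucket: str,
    key: str,
    multipart_min_size: int = ASSUMED_MULTIPART_COPY_MIN_SIZE,
) -> Dict[str, Any]:
    """Hypothetical helper: collect what a caller needs before a CopyObject."""
    # The user's own HeadObject call, needed to learn the ContentLength.
    head = client.head_object(Bucket=bucket, Key=key)
    plan: Dict[str, Any] = {
        "ContentLength": head["ContentLength"],
        "multipart": False,
    }

    # Below the threshold a single CopyObject is used: no further action.
    if plan["ContentLength"] < multipart_min_size:
        return plan

    # Otherwise, carry metadata and tags over by hand, since multi-part
    # copy does not preserve them.
    plan["multipart"] = True
    plan["Metadata"] = head.get("Metadata", {})
    if "ContentEncoding" in head:
        plan["ContentEncoding"] = head["ContentEncoding"]
    tagging = client.get_object_tagging(Bucket=bucket, Key=key)
    plan["TagSet"] = tagging["TagSet"]
    return plan
```

Either branch leaves the caller holding the ContentLength before CopyObject is issued, which is the value this request proposes to pass along.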

Reason 3: fewer API calls

If users copy many small files, the extra HeadObject calls soon add up: even for small blobs, 3 API calls are now used (2 x HeadObject, one by the user and one by aws-c-s3, plus 1 x CopyObject), which is quite a force multiplier.

Proposed Solution

Analogous to the existing PutObject implementation: if the user provided a ContentLength header, use it and skip the HeadObject request.
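As a rough illustration of the proposed behavior, the decision could look like the Python sketch below. It is not the aws-c-s3 implementation (which is written in C); `resolve_source_size` and the `head_object` callback are hypothetical names introduced only for this example.

```python
from typing import Callable, Mapping


def resolve_source_size(
    headers: Mapping[str, str],
    head_object: Callable[[], int],
) -> int:
    """Hypothetical sketch: return the object size for a CopyObject request.

    `headers` are the user-supplied request headers; `head_object` stands in
    for the HeadObject round trip and returns the size it reports.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "content-length":
            # The user already knows the size: skip HeadObject entirely.
            return int(value)
    # No ContentLength provided: fall back to the existing behavior.
    return head_object()
```

With this shape, a caller that supplies the header causes no HeadObject request from the library, while callers that omit it see no change.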

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

Metadata


    Labels

    feature-request: A feature should be added or improved.
    needs-triage: This issue or PR still needs to be triaged.
