Skip to content

Revamp checksum - retry will reuse the checksum#532

Merged
TingDaoK merged 20 commits into
mainfrom
revamp-checksum
Jul 21, 2025
Merged

Revamp checksum - retry will reuse the checksum#532
TingDaoK merged 20 commits into
mainfrom
revamp-checksum

Conversation

@TingDaoK
Copy link
Copy Markdown
Contributor

@TingDaoK TingDaoK commented Jul 2, 2025

Issue #, if available:

  • We previously clean up the checksum after prepare.
  • It results in the checksum to be recalculated during retry
  • It's violating the durability guarantee
  • However, since we already buffer the input in memory during retry, it won't read the customer input again, which mitigated the impact.

Description of changes:

  • Revamp the code to allow the passed in checksum to the input stream wrappers. So that it won't recalculate the checksum

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 2, 2025

Codecov Report

❌ Patch coverage is 92.46575% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.75%. Comparing base (081eee8) to head (8304215).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
source/s3_auto_ranged_put.c 86.95% 3 Missing ⚠️
source/s3_checksum_context.c 95.08% 3 Missing ⚠️
source/s3_request_messages.c 88.46% 3 Missing ⚠️
source/s3_chunk_stream.c 92.30% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   88.66%   88.75%   +0.08%     
==========================================
  Files          21       22       +1     
  Lines        6652     6722      +70     
==========================================
+ Hits         5898     5966      +68     
- Misses        754      756       +2     
Files with missing lines Coverage Δ
source/s3_checksum_stream.c 75.00% <100.00%> (+0.92%) ⬆️
source/s3_checksums.c 86.06% <100.00%> (ø)
source/s3_default_meta_request.c 95.37% <100.00%> (+0.05%) ⬆️
source/s3_meta_request.c 91.29% <100.00%> (ø)
source/s3_chunk_stream.c 80.82% <92.30%> (+3.95%) ⬆️
source/s3_auto_ranged_put.c 92.07% <86.95%> (-0.29%) ⬇️
source/s3_checksum_context.c 95.08% <95.08%> (ø)
source/s3_request_messages.c 73.54% <88.46%> (-0.82%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TingDaoK TingDaoK changed the title Revamp checksum Revamp checksum - retry will reuse the checksum Jul 2, 2025
@TingDaoK TingDaoK marked this pull request as ready for review July 2, 2025 22:45
#include <aws/common/byte_buf.h>
#include <aws/common/ref_count.h>

struct aws_s3_meta_request_checksum_config_storage;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to be in a separate header?
with all the cross header forward declarations it seems the 2 headers are tightly coupled

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't need to be separate, but I'd think it be more clear to have them separate.

Comment thread include/aws/s3/private/s3_checksum_context.h Outdated
Comment thread include/aws/s3/private/s3_checksum_context.h Outdated
Comment thread include/aws/s3/private/s3_checksum_context.h Outdated
Comment thread include/aws/s3/private/s3_checksums.h Outdated
Comment thread source/s3_auto_ranged_put.c Outdated
struct aws_byte_buf *buffer,
uint32_t part_number,
const struct aws_string *upload_id,
bool should_compute_content_md5,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for now, but we should really unify md5 with the rest of checksum at some point

@DmitriyMusatkin
Copy link
Copy Markdown
Contributor

should we sanity check that the size of data pushed through the checksum matches expected payload size? we can even check response size here?

@TingDaoK
Copy link
Copy Markdown
Contributor Author

should we sanity check that the size of data pushed through the checksum matches expected payload size? we can even check response size here?

In the case of the checksum was calculated for some portion of the part, and client somehow reused the checksum, the server side will be able to catch the error since the checksum won't match the full part of data client sent. So, it's less a concern here.

@TingDaoK TingDaoK merged commit 36e2c37 into main Jul 21, 2025
38 checks passed
@TingDaoK TingDaoK deleted the revamp-checksum branch July 21, 2025 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants