Skip to content

Conversation

@TingDaoK
Copy link
Contributor

@TingDaoK TingDaoK commented Apr 4, 2025

Issue #, if available:

  • Our current pause doesn't support GetObject
  • The pause is currently just grabbing the status while background threads still running
  • When CRT control the write to the file, it means the file write can happen after the pause is called (race between those two threads.)
  • When downstream wants to gather the file last modified time, it makes the API hard to use. User needs to call the pause, and then wait for the finish callback to gather more data from user's side.
  • Add an async API for pause to notify caller when the write/events completed.

Description of changes:

  • Schedule an event for pause, when the event invokes, the other events should have finished the callbacks.
  • If people invoke the pause and wait for it to complete from one of the callbacks, they may end up with dead lock...
    • warnings??

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2025

Codecov Report

Attention: Patch coverage is 20.58824% with 54 lines in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (5eac79f) to head (799b98c).

Files with missing lines Patch % Lines
source/s3_meta_request.c 13.15% 33 Missing ⚠️
source/s3_auto_ranged_get.c 33.33% 16 Missing ⚠️
source/s3_client.c 16.66% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #506      +/-   ##
==========================================
- Coverage   89.70%   88.97%   -0.74%     
==========================================
  Files          20       20              
  Lines        6381     6447      +66     
==========================================
+ Hits         5724     5736      +12     
- Misses        657      711      +54     
Files with missing lines Coverage Δ
source/s3_auto_ranged_put.c 92.36% <ø> (ø)
source/s3_util.c 96.85% <ø> (ø)
source/s3_client.c 90.86% <16.66%> (-0.40%) ⬇️
source/s3_auto_ranged_get.c 93.29% <33.33%> (-3.97%) ⬇️
source/s3_meta_request.c 89.39% <13.15%> (-2.68%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TingDaoK TingDaoK changed the title WIP async pause for get Apr 4, 2025
* - telemetry callback
* - body callback
* So that the status collected in the callback is the final status of the meta request.
* Notes: there is no guarantee that of the service status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes: there is no guarantee that of the service status.

I am not sure what this line means.

struct aws_s3_meta_request_resume_token *resume_token);

/*
* Upload id associated with operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy paste errors in the docs but ignoring since this is just a draft design review for now.

Comment on lines 1241 to 1243
AWS_S3_API
struct aws_byte_cursor aws_s3_meta_request_resume_object_last_modified(
struct aws_s3_meta_request_resume_token *resume_token);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't an ETAG or versionID be a better option here than last modified time? If someone is trying to resume, they either want the same version or error out. I am not sure if object_last_modified time is supposed to be unique for different versions? What if two uploads complete at the same time?

Another question is that why is the pause token giving out object_modfiied_time from server? We have already delivered this in the on headers callback. This seems like something that doesn't belong in the pause token.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between aws_s3_meta_request_resume_object_last_modified and aws_s3_meta_request_resume_object_last_modified_time?

Comment on lines +48 to +49
.pause = s_s3_auto_ranged_get_pause,
.pause_async = aws_s3_meta_request_pause_async_default,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both pause and pause_async seems bad. I think complexity is stemming from trying to not break the existing interface. If Java_SDK is the only user of this interface, then it is probably fine to break this interface internally. If we do break it, can we simplify it like pause is always async or have different types of events callbacks (see third paragraph)?

I also think that for now, the workaround on the Java SDK side to just wait for the on_finish callback seems reasonable and might be less complicated if we want to recommend that they should wait for the on_finish callback in all cases, because in the future we might have code that does something between pause callback and finish callback?

Another option is instead of having a separate callback, just make it another type of progress event. If you pause the meta_request, you will get a final event callback to let you know it was paused, but this doesn't fit into our current design where the single purpose of the progress callback is to just let you know how many bytes are completed.

I'm interested to see what other people think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants