[RFR] S3 bugfixes #2329
Are there any testing scripts I can use to reproduce the issues this PR is trying to address? It's okay if they're clunky. It would also be good to run the battery of S3 tests for these changes. Are you able to set up the S3 tests against your environment?
We run the S3 tests on our end for every commit and have been able to reproduce the issue there. I also have a flow that reproduces the issue by reading and writing data to/from S3 using hundreds of workers, but the data isn't public.
Testing[4287105] @ 00b184d
Testing for OSS PR [1011] @ commit 00b184d had 12 FAILUREs.
Failures are from the card tests and the tag mutation test, which have been flaky.
The change in logic looks correct.
def convert_to_client_error(e):
    match = BOTOCORE_MSG_TEMPLATE_MATCH.search(str(e))
    if not match:
        raise e
is this correct?
do you see a problem here?
Please re-run black but otherwise LGTM.
This PR fixes several issues which caused s3op to be stuck:
queue.cancel_join_thread() so that the workers can exit without flushing the queue; otherwise there is a deadlock
ClientError
InternalError
SSLError
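The deadlock behind the queue.cancel_join_thread() item can be reproduced in miniature. The sketch below is not the s3op code: the names `worker`, `worker_exits_in_time`, and `PAYLOAD` are hypothetical, and it assumes a POSIX system where the "fork" start method is available. A child process puts a payload larger than the pipe buffer onto a `multiprocessing.Queue` that the parent never reads. Without `cancel_join_thread()` the child waits at exit for its feeder thread to flush the data and never finishes. With it, the child exits and the unflushed data is discarded.

```python
import multiprocessing as mp

# Assumption: POSIX with the "fork" start method (not available on Windows).
ctx = mp.get_context("fork")

# Much larger than a typical pipe buffer, so the feeder thread blocks
# until some other process reads from the queue.
PAYLOAD = b"x" * (4 * 1024 * 1024)


def worker(queue, cancel_join):
    """Hypothetical worker: enqueue one large result, then return."""
    if cancel_join:
        # Do not wait for the feeder thread at process exit.
        # Any data still buffered in the queue may be lost.
        queue.cancel_join_thread()
    queue.put(PAYLOAD)


def worker_exits_in_time(cancel_join, timeout=3.0):
    """Return True if the worker process exits within `timeout` seconds."""
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(queue, cancel_join))
    proc.start()
    proc.join(timeout)  # the parent deliberately never reads the queue
    exited = not proc.is_alive()
    if not exited:
        # Clean up the stuck worker so the demo itself can finish.
        proc.terminate()
        proc.join()
    return exited
```

Under these assumptions, `worker_exits_in_time(cancel_join=True)` should report that the worker exited, while `worker_exits_in_time(cancel_join=False)` should report that it was still alive at the timeout. The trade-off is that cancelling the join can drop queued items, so it only suits cases where the worker's remaining output is no longer needed.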
Additional improvements:
jitter_sleep()
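The body of jitter_sleep() is not shown in this conversation, so the following is only a generic sketch of sleeping with randomized backoff between retries, not the PR's implementation. The names `backoff_delay` and `sleep_with_jitter` and the `base` and `cap` parameters are hypothetical. The idea is that when hundreds of workers retry failed S3 requests, random delays keep them from retrying at the same moment.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Pick a delay uniformly in [0, min(cap, base * 2**attempt)] seconds."""
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, upper)


def sleep_with_jitter(attempt, sleep=time.sleep, **kwargs):
    """Sleep for a jittered delay and return the delay that was used.

    `sleep` is injectable so callers (and tests) can avoid real waiting.
    """
    delay = backoff_delay(attempt, **kwargs)
    sleep(delay)
    return delay
```

A retry loop would call `sleep_with_jitter(attempt)` after each failed attempt, with `attempt` counting up from 0.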