[Python] Fix flaky test test_tls_with_self_signed_certificate_succeeds #4946
#5054
base: main
Conversation
e575f32 to 8f22339
…S certificate test Signed-off-by: hank95179 <[email protected]>
8f22339 to 226131b
In your description, I'm unsure if that is what was intended, but exponential backoff is usually the norm for retries. Ideally the base should not be too high, because with more retries the total wait would become very long. But three retries with a base-2 exponential is fine here. Please update your description or adjust your implementation. Otherwise, it generally looks good.
@xShinnRyuu Thank you! I've fixed the description.
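To make the backoff trade-off discussed above concrete, here is a small sketch that lists the delays a base-2 schedule produces. The function name `backoff_delays` is hypothetical and is not part of this PR; it only mirrors the `2**i` sleep in the diff, where no sleep follows the final attempt.

```python
def backoff_delays(attempts: int, base: int = 2) -> list[int]:
    """Seconds slept between attempts; the last attempt is not followed by a sleep."""
    return [base**i for i in range(attempts - 1)]


# Three attempts, as in this PR: sleep 1 s, then 2 s.
three_attempt_delays = backoff_delays(3)
three_attempt_total = sum(three_attempt_delays)

# Ten attempts with the same base: the total wait grows quickly.
ten_attempt_total = sum(backoff_delays(10))
```

With three attempts the worst-case added wait is 3 seconds, while ten attempts at the same base would wait 511 seconds in total, which is why a small retry count suits a base-2 schedule.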
xShinnRyuu left a comment
LGTM
currantw left a comment
Approved with one suggestion. Thanks for your help!
    for i in range(3):
        try:
            client = await GlideClusterClient.create(cluster_config)
            break
        except Exception:
            if i == 2:
                raise
            time.sleep(2**i)
Nit: similar to the other pull request, I think it might make sense to extract this into a helper and add a comment explaining why it is needed.
@hank95179, I agree with @currantw here, especially given this is happening across multiple tests. Would you be able to add the helpers in one of your PRs and use them in the other?
Thanks.
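One possible shape for the helper suggested above is sketched below. Everything in it is an assumption for illustration: the name `create_with_retry`, its parameters, and the `flaky` stand-in factory are hypothetical and do not come from this PR or from the client library. It also uses `asyncio.sleep` rather than the `time.sleep` shown in the diff, so the event loop is not blocked while waiting; that substitution is a choice made for this sketch, not the author's implementation.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def create_with_retry(
    factory: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Hypothetical helper: retry an async factory with base-2 exponential backoff.

    Intended for tests where client creation can time out transiently under
    load (for example, a slow TLS handshake on a busy CI runner).
    """
    for i in range(attempts):
        try:
            return await factory()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**i)
    raise ValueError("attempts must be at least 1")


async def _demo() -> tuple[str, int]:
    """Simulate two transient timeouts followed by a successful connection."""
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise TimeoutError("simulated handshake timeout")
        return "connected"

    result = await create_with_retry(flaky, base_delay=0.01)
    return result, calls


demo_result = asyncio.run(_demo())
```

In a test, the call site might then read `client = await create_with_retry(lambda: GlideClusterClient.create(cluster_config))`, keeping the retry policy and its explanatory comment in one place.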
Issue link
This Pull Request is linked to issue (URL): #4946
Description
This PR addresses the flakiness observed in test_tls_with_self_signed_certificate_succeeds. In high-load environments (such as CI/CD), the TLS handshake process occasionally times out due to resource contention, causing ClosingError: ... timed out failures.
Solution
Implemented a retry mechanism for client creation in tests/async_tests/test_tls_certificates.py. The test now attempts to establish the connection up to 3 times with an exponential backoff (2^i seconds, that is, 1 second and then 2 seconds between attempts) before failing.
Verification
Verified locally by simulating connection latency. The retry logic successfully allows the test to pass even when initial connection attempts fail due to transient timeouts.