[Python] Fix flaky test test_tls_with_multiple_certificates_succeeds #4950
#5055
base: main
f09406e to 004a136
…ation Signed-off-by: hank95179 <[email protected]>
004a136 to 7c8eb24
In your description under [...], I'm unsure if that is what was intended, but exponential backoff is usually the norm for retries. Ideally the base should not be too high, because with more retries the wait would become very long. Three retries with a base-2 exponential is fine here, though. Please update your description or adjust your implementation. Otherwise, it generally looks good.
@xShinnRyuu Thank you! I've updated the description.
xShinnRyuu left a comment
LGTM
Approved with one suggestion. Thanks for helping with this!
        advanced_config=cluster_advanced_config,
    )
    client = await GlideClusterClient.create(cluster_config)
    for i in range(3):
Nit. Might be worth extracting to a helper and adding a comment explaining why this retry mechanism is needed.
Issue link
This Pull Request is linked to issue (URL): #4950
Description
This PR addresses the flakiness observed in test_tls_with_multiple_certificates_succeeds. Similar to issue #4946, the TLS handshake occasionally times out in high-load environments (such as CI/CD) due to resource contention, leading to ClosingError: ... timed out failures.
Solution
Implemented a retry mechanism for client creation in tests/async_tests/test_tls_certificates.py. The test now attempts to establish the connection up to 3 times with an exponential backoff (2^i seconds) before failing.
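For illustration, the retry described above, together with the reviewer's suggestion to extract a helper, could look roughly like the sketch below. This is not the code from this PR: the name create_with_retry and the parameters factory, attempts, base_delay, and retry_on are hypothetical, and the choice not to sleep after the final failed attempt is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def create_with_retry(
    factory: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Hypothetical helper: await factory() up to `attempts` times.

    Why retry at all: under heavy CI load the TLS handshake can time out
    transiently, so a single failed attempt does not mean the test is broken.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return await factory()
        except retry_on:
            # Assumption: re-raise the last failure instead of sleeping again.
            if i == attempts - 1:
                raise
            # Exponential backoff: base_delay * 2**i seconds (1s, 2s, ...).
            await asyncio.sleep(base_delay * 2**i)
    raise AssertionError("unreachable")
```

In the test, such a helper might be called as `client = await create_with_retry(lambda: GlideClusterClient.create(cluster_config))`, ideally with retry_on narrowed to the timeout-related exception so that genuine configuration errors still fail on the first attempt.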
Verification
Verified that the retry mechanism correctly handles transient connection failures, allowing the test to pass in environments with varying latency.