Stabilize test suite: fix flaky timing tests, add rerun support, and handle Windows resource leaks#11992
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11992 +/- ##
==========================================
+ Coverage 98.73% 98.76% +0.02%
==========================================
Files 127 127
Lines 44667 44675 +8
Branches 2372 2371 -1
==========================================
+ Hits 44102 44122 +20
+ Misses 402 393 -9
+ Partials 163 160 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
I think the expectation was that import time would improve again in later versions. I think flaky failures today are mostly caused by regressions (e.g. idna: kjd/idna#188). |
6fae85d to
2cda92f
Compare
|
The CI failure on I've updated it. Let me know if this looks good |
be1fa7f to
553f63e
Compare
webknjaz
left a comment
There was a problem hiding this comment.
Can we try using https://pypi.org/p/pytest-rerunfailures? It should be more generic.
I think we can just double that to 20ms. It's meant to highlight an issue as an order of magnitude, so I clearly misjudged the performance of Mac OS. |
|
Thanks @webknjaz for the suggesting pytest-rerunfailures. |
|
As I mentioned before, there are two more tests evaluating the 10ms threshold. One of them just failed in the last commit. Job Test (3.14, windows, false) At this point it is easy to mark them to rerun. |
856f3f8 to
7c338ff
Compare
|
While trying to address the unclosed socket ResourceWarnings that raised in Windows test jobs, I noticed the |
webknjaz
left a comment
There was a problem hiding this comment.
I think it's good but let me know if you'd like to drop that helper function before merging.
Urgh.. looks like my browser cached the previous diff and I commented on the wrong thing.
- Remove RerunThresholdParams and rerun_adjusted_threshold fixture - Set fixed thresholds: 400ms/300ms for import test, 80ms for regex tests - Remove pytest-rerunfailures dependency - Remove @pytest.mark.flaky decorators from performance tests The new fixed thresholds are set to the previous maximum values (after 3 reruns), providing generous headroom for CI variability while eliminating complexity.
…0/3.11 and refactor `asyncio.gather` calls to use generator expressions.
507b274 to
f781696
Compare
Co-authored-by: Sam Bull <aa6bs0@sambull.org>
… example.com The test was connecting to real https://example.com through a local proxy, but client_ssl_ctx (built from ssl.create_default_context + trustme) depends on the system trust store for public CA verification. On macOS CI, the trust store may not include the required public CAs, causing CERTIFICATE_VERIFY_FAILED. Replace the external request with a local TestServer using trustme-issued certificates, matching the pattern of test_secure_https_proxy_absolute_path.
|
The Fixed by replacing the external request with a local TestServer using trustme-issued certificates, matching the pattern of |
… in import test.
|
Analysis of 20 PRs shows that 16 of them had failures that are addressed here. Failure counts addressed: test_uvloop_secure_https_proxy: 5 failures #12069: test_uvloop_secure_https_proxy |
That's not flaky, either the certificate changed or the runners changed. |
|
I'm happy with this, so I'm going to merge this to get the CI unblocked again. |
Backport to 3.13: 💔 cherry-picking failed — conflicts found❌ Failed to cleanly apply 32924e8 on top of patchback/backports/3.13/32924e8fa4c9085dcbaf41b3b3c27fc773bfa213/pr-11992 Backporting merged PR #11992 into master
🤖 @patchback |
Backport to 3.14: 💔 cherry-picking failed — conflicts found❌ Failed to cleanly apply 32924e8 on top of patchback/backports/3.14/32924e8fa4c9085dcbaf41b3b3c27fc773bfa213/pr-11992 Backporting merged PR #11992 into master
🤖 @patchback |
|
Thank you 👏 |
Description
What do these changes do?
Timing-based performance tests:
test_import_time(test_imports.py): Simplify threshold logic — replace per-version/xdist/CI lookup with a singlesys.version_info >= (3, 12)check (200ms / 300ms) to reduce flakiness.test_regex_performance(test_client_middleware_digest_auth.py): Raise threshold from 10ms to 80ms, add descriptive assertion message with elapsed time.test_forwarded_re_performance(test_web_request.py): Raise threshold from 10ms to 80ms, add descriptive assertion message with elapsed time.test_cookie_pattern_performance(test_cookie_helpers.py): Raise threshold from 10ms to 80ms, add descriptive assertion message with elapsed time.Flaky test fix:
test_uvloop_secure_https_proxy(test_proxy_functional.py): Replace connection to real example.com with a local TestServer using trustme certificates. Fixes CERTIFICATE_VERIFY_FAILED on macOS CI where the system trust store may lack public CAs.Test infrastructure:
pytest_configure(tests/conftest.py): Broaden Windows socket warning filter in Python < 3.12.make_client_requestfixture (tests/conftest.py): Fix session leak by tracking multiple requests/sessions in lists, useasyncio.gatherfor parallel cleanup.Are there changes in behavior for the user?
No user-facing changes. This only affects CI test behavior.
Is it a substantial burden for the maintainers to support this?
No — this actually reduces maintenance burden by making the test future-proof.
Related issue number
Fixes flaky test failures in PR #11990.
Checklist
CONTRIBUTORS.txtCHANGES/folder