
feat: implement flake processing using timeseries models #1140

Merged: 1 commit merged into main from joseph/flake-proc on Mar 25, 2025

Conversation

joseph-sentry
Contributor

  • remove code related to old upload states
  • create implementation of flake processing that consumes data from timescale
  • process flakes uses impl_type

The idea is that we'll start with impl_type set to both for a while, which will persist flakes to both the current db table and the new one; then at some point we will detect flaky tests in the finisher by checking the new flakes table.
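The rollout described above can be sketched as a dispatch on the setting. This is a minimal sketch with stubbed-out helpers; the function names and the exact `impl_type` values ("old", "new", "both") are assumptions based on the description, not the actual task code:

```python
calls: list[str] = []  # records which paths ran, for illustration

def process_flakes_from_uploads(repo_id: int) -> None:
    # legacy path: persists flakes to the current db table (stubbed here)
    calls.append("old")

def process_flakes_for_repo(repo_id: int) -> None:
    # new path: consumes test data from timescale (stubbed here)
    calls.append("new")

def process_flakes(repo_id: int, impl_type: str) -> None:
    # "both" runs both paths, so during the transition flakes land in
    # the current table and the new one
    if impl_type in ("old", "both"):
        process_flakes_from_uploads(repo_id)
    if impl_type in ("new", "both"):
        process_flakes_for_repo(repo_id)
```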


sentry-autofix bot commented Mar 13, 2025

🚨 Sentry detected 1 potential issue in your recent changes 🚨


@joseph-sentry joseph-sentry requested a review from a team March 13, 2025 19:04
def handle_pass(curr_flakes: dict[bytes, Flake], test_id: bytes):
    curr_flakes[test_id].recent_passes_count += 1
    curr_flakes[test_id].count += 1
    if curr_flakes[test_id].recent_passes_count == 30:
Contributor

Moving this magic constant to the top level would make sense, along with an explanation: after X passes in a row, the test is not marked as flaky anymore.
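One way the suggestion could look, as a hypothetical refactor (the constant name `FLAKE_EXPIRY_COUNT` and the `Flake` fields shown are illustrative, not taken from the PR):

```python
from dataclasses import dataclass

# After this many passes in a row, the test is not marked as flaky anymore.
FLAKE_EXPIRY_COUNT = 30

@dataclass
class Flake:
    recent_passes_count: int = 0
    count: int = 0

def handle_pass(curr_flakes: dict[bytes, Flake], test_id: bytes) -> None:
    flake = curr_flakes[test_id]
    flake.recent_passes_count += 1
    flake.count += 1
    if flake.recent_passes_count == FLAKE_EXPIRY_COUNT:
        # the streak of passes is long enough: stop tracking as flaky
        del curr_flakes[test_id]
```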

Comment on lines +223 to +224
{"test_id": "test1", "outcome": "pass"},
{"test_id": "test1", "outcome": "failure"},
Contributor

Testing with 2x pass would trigger the error I mentioned above: you clear the test from the current flakes, then on the second iteration try to access it again.
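A guarded loop that survives the 2x-pass sequence described above could look like this (a sketch under assumed names, not the PR's actual code):

```python
FLAKE_EXPIRY_COUNT = 30

def handle_pass(curr_flakes: dict[str, dict], test_id: str) -> None:
    flake = curr_flakes[test_id]
    flake["recent_passes_count"] += 1
    if flake["recent_passes_count"] >= FLAKE_EXPIRY_COUNT:
        # the test has passed enough times in a row: no longer flaky
        del curr_flakes[test_id]

def process_outcomes(curr_flakes: dict[str, dict], outcomes: list[dict]) -> None:
    for outcome in outcomes:
        test_id = outcome["test_id"]
        if outcome["outcome"] != "pass":
            continue
        # guard: an earlier pass in the same batch may already have
        # expired this flake and removed it from curr_flakes
        if test_id in curr_flakes:
            handle_pass(curr_flakes, test_id)
```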

@@ -78,8 +80,13 @@ def run_impl(
extra=dict(repoid=repo_id, commit=commit_id),
)

    if impl_type == "new" or impl_type == "both":
        process_flakes_for_repo(repo_id)
Contributor

If you use both, you are locking twice in a row. Maybe if you move the locking logic out of the function, and the invocation into the lock here, that could be avoided.

Contributor Author

I think this is necessary because they're locking different locks and reading from different keys.

There's an edge case where a commit is left over if we lock only once:

  • Task 1 is called with commit A: old key = [A], new key = [A]
  • Task 1 takes the repo lock
  • Task 1 completes the new invocation: old key = [A], new key = []
  • Task 2 is called with commit B: old key = [A, B], new key = [B]
  • Task 2 fails to take the lock and just drops
  • Task 1 completes the old invocation: old key = [], new key = [B]

B is left over and has to wait for another invocation of process flakes to get processed
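The per-key locking can be sketched with in-memory stand-ins for the Redis queues and locks (the key names and data structure here are assumptions for illustration):

```python
import threading
from collections import defaultdict

# one queue of pending commits and one lock per implementation key
queues: dict[str, list[str]] = {"old": [], "new": []}
locks: defaultdict = defaultdict(threading.Lock)

def drain(key: str) -> list[str]:
    # Each implementation holds its own lock and drains its own key.
    # A task that fails to take one key's lock can still drain the
    # other key, so commits queued there are not left over.
    lock = locks[key]
    if not lock.acquire(blocking=False):
        return []  # another task is already draining this key
    try:
        processed, queues[key] = queues[key], []
        return processed
    finally:
        lock.release()
```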


codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 96.20853% with 8 lines in your changes missing coverage. Please review.

Project coverage is 97.71%. Comparing base (c9ed88b) to head (76a8a8e).
Report is 4 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines                      | Patch % | Lines
services/test_analytics/ta_process_flakes.py | 92.85%  | 5 Missing ⚠️
tasks/process_flakes.py                      | 57.14%  | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1140      +/-   ##
==========================================
- Coverage   97.72%   97.71%   -0.01%     
==========================================
  Files         449      451       +2     
  Lines       36866    37036     +170     
==========================================
+ Hits        36028    36191     +163     
- Misses        838      845       +7     
Flag        | Coverage        | Δ
integration | 42.81% <41.23%> | -0.08% ⬇️
unit        | 90.51% <96.20%> | +0.08% ⬆️

Flags with carried forward coverage won't be shown.



@joseph-sentry joseph-sentry requested a review from Swatinem March 25, 2025 14:52
@joseph-sentry joseph-sentry added this pull request to the merge queue Mar 25, 2025
Merged via the queue into main with commit ef132cd Mar 25, 2025
21 of 29 checks passed
@joseph-sentry joseph-sentry deleted the joseph/flake-proc branch March 25, 2025 15:23