Commit e9bc36e
[autorevert] implement autorevert and fix detection logic (#6983)
### Summary
- Implemented revert detection/recording
- Implemented failure-only rule matching in the autorevert detector to
prevent “success” jobs with a classification label from contaminating
pattern detection
- Added a unit test
### Bug Fixed
- Cause: The detector previously matched on `classification_rule`
regardless of job `conclusion`. Baseline commit `33ec6e3` had multiple
“success” shards labeled with `rule='pytest failure'`, which the
detector misread as “older commit already has the same failure,”
suppressing the pattern for `bbc0df1`/`4fd5fab`.
- Fix: Require `conclusion == 'failure'` wherever the detector compares
rules (both for newer commit confirmation and older baseline exclusion).
This prevents noise from success+rule rows and correctly flags
commit-caused failures like the ROCm case.
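
The failure-only matching described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `JobRow`, `has_failure_with_rule`, `is_autorevert_pattern`, and the short SHAs are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobRow:
    """Hypothetical, simplified view of one job (shard) result for a commit."""
    sha: str
    conclusion: str           # e.g. "success" or "failure"
    classification_rule: str  # e.g. "pytest failure"; may be empty


def has_failure_with_rule(jobs: List[JobRow], rule: str) -> bool:
    # A rule only counts when the job actually failed. Rows that carry a
    # rule label but concluded "success" are ignored.
    return any(
        j.conclusion == "failure" and j.classification_rule == rule
        for j in jobs
    )


def is_autorevert_pattern(
    newer_commits: List[List[JobRow]], older_baseline: List[JobRow], rule: str
) -> bool:
    # Newer commits must all show a real failure with this rule, and the
    # older baseline must not show one.
    confirmed = bool(newer_commits) and all(
        has_failure_with_rule(jobs, rule) for jobs in newer_commits
    )
    return confirmed and not has_failure_with_rule(older_baseline, rule)


# Made-up data: the baseline has "success" shards that still carry the label.
rule = "pytest failure"
baseline = [JobRow("base", "success", rule), JobRow("base", "success", rule)]
newer = [
    [JobRow("new1", "failure", rule)],
    [JobRow("new2", "failure", rule), JobRow("new2", "success", "")],
]
pattern_found = is_autorevert_pattern(newer, baseline, rule)
```

With the `conclusion` check in place, the labeled-but-successful baseline shards no longer hide the pattern, so `pattern_found` is true for this data.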
### Testing
<details>
<summary>python -m pytorch_auto_revert autorevert-checker rocm --hours
82 --do-restart --dry-run</summary>
```
python -m pytorch_auto_revert autorevert-checker rocm --hours 82 --do-restart --dry-run
Fetching workflow data for 1 workflows since 2025-08-04T08:56:25.851470...
Found 161 commits with job data for workflow 'rocm'
✓ 3 AUTOREVERT PATTERNS DETECTED
Pattern #1:
Failure rule: 'pytest failure'
Recent commits with failure: bdb07a2b 8085edc8
Older commit without failure: 41081276
✗ NOT REVERTED: 8085edc8f9c98f670f585586b4286a942927537a was not reverted
⟳ DRY RUN: Would restart rocm for 8085edc8
⟳ DRY RUN: Would restart rocm for 41081276
Pattern #2:
Failure rule: 'pytest failure'
Recent commits with failure: 908c5cc4 b6c53383
Older commit without failure: 33ec6e3e
✗ NOT REVERTED: b6c53383fe2f29e6ed35430e90867dbeb8980d42 was not reverted
⟳ DRY RUN: Would restart rocm for b6c53383
⟳ DRY RUN: Would restart rocm for 33ec6e3e
Pattern #3:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours
==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): rocm
Timeframe: 82 hours
Commits checked: 161
Auto revert patterns detected: 3
Actual reverts inside auto revert patterns detected (precision): 1 (33.3%)
Total revert commits in period: 9
Revert categories:
nosignal: 5 (55.6%)
ignoredsignal: 2 (22.2%)
ghfirst: 2 (22.2%)
Total reverts excluding ghfirst: 7
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (recall): 6 (85.7%)
Per workflow precision:
rocm: 1 reverts out of 3 patterns (33.3%) [excluding ghfirst: 1 (33.3%)]
Reverted patterns:
- pytest failure: bbc0df10 (nosignal)
Restarted workflows: 4
- rocm for 8085edc8
- rocm for 41081276
- rocm for b6c53383
- rocm for 33ec6e3e
```
</details>
The actual culprit was correctly identified:
```
Pattern #7:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours
```
Multiple patterns are detected because the failure jumped across **workflows**: rocm and rocm-mi300.
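
The percentages in the summary statistics of the dry-run output can be reproduced from the raw counts it prints. The sketch below only redoes that arithmetic; the variable names and the `pct` helper are hypothetical and are not taken from the tool's code.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the whole is zero."""
    return round(100.0 * part / whole, 1) if whole else 0.0


# Counts copied from the dry-run summary for the rocm workflow.
patterns_detected = 3
reverted_patterns = 1   # patterns whose commit was actually reverted
total_reverts = 9
nosignal_reverts = 5
ghfirst_reverts = 2
unmatched_reverts = 6   # non-ghfirst reverts matching no detected pattern

non_ghfirst_reverts = total_reverts - ghfirst_reverts            # 7
precision_pct = pct(reverted_patterns, patterns_detected)        # 33.3
nosignal_pct = pct(nosignal_reverts, total_reverts)              # 55.6
ghfirst_pct = pct(ghfirst_reverts, total_reverts)                # 22.2
unmatched_pct = pct(unmatched_reverts, non_ghfirst_reverts)      # 85.7

print(precision_pct, nosignal_pct, ghfirst_pct, unmatched_pct)
```

Note that the 85.7% figure on the line labeled "recall" is the share of non-ghfirst reverts that did not match any pattern (6 of 7). By these counts, the share that did match is the remaining 1 of 7.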
---------
Co-authored-by: Jean Schmidt <[email protected]>

1 parent bcc20e4 commit e9bc36e
File tree
6 files changed, +401 -83 lines changed
- aws/lambda/pytorch-auto-revert
- pytorch_auto_revert
- testers
- tests