Enhance crashed cases handling in nightly #3047
Open
+232
−125
Summary
In the nightly pipeline, several model tests (both passing and xfail) crash intermittently. When these jobs are retriggered manually, the crashed tests often pass on the second attempt. However, when multiple nightly jobs contain crashed cases, each job currently has to be retriggered individually, which is time-consuming and inefficient.
What This PR Introduces
After the crashed cases are collected, a separate job uses that information to rerun only the crashed test cases. This improves CI efficiency and avoids unnecessarily retriggering entire jobs.
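The collect-then-rerun flow can be illustrated with a minimal sketch. It assumes that each nightly job produces a report of `(pytest node ID, outcome)` pairs with a `"crashed"` outcome label. The report shape, the label, and the helper names `collect_crashed_tests` and `build_rerun_command` are all hypothetical and are not taken from this PR's implementation. The sketch merges crashed IDs from every job into one de-duplicated list and builds a single pytest invocation that reruns only those tests.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical report shape: job name -> (pytest node ID, outcome) pairs.
JobReports = Dict[str, Iterable[Tuple[str, str]]]


def collect_crashed_tests(reports: JobReports, crashed_label: str = "crashed") -> List[str]:
    """Gather crashed test IDs across all jobs, de-duplicated.

    Jobs are visited in sorted name order so the result is deterministic;
    the first occurrence of each test ID is kept.
    """
    seen = set()
    crashed = []
    for job_name in sorted(reports):
        for test_id, outcome in reports[job_name]:
            if outcome == crashed_label and test_id not in seen:
                seen.add(test_id)
                crashed.append(test_id)
    return crashed


def build_rerun_command(test_ids: List[str]) -> List[str]:
    """Build a pytest argv that reruns only the given node IDs.

    Returns an empty list when nothing crashed, so the rerun job can be skipped.
    """
    if not test_ids:
        return []
    return ["pytest", "-v", *test_ids]


if __name__ == "__main__":
    # Illustrative reports from two hypothetical nightly jobs.
    reports = {
        "nightly-job-a": [
            ("tests/models/test_a.py::test_resnet", "passed"),
            ("tests/models/test_a.py::test_bert", "crashed"),
        ],
        "nightly-job-b": [("tests/models/test_b.py::test_vit", "crashed")],
    }
    print(build_rerun_command(collect_crashed_tests(reports)))
```

Gathering crashed cases from all jobs into one rerun job replaces the per-job manual retriggers described above. Returning an empty command when nothing crashed lets the rerun job exit early.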
Note
Tested and verified the feature in both the nightly and the model analysis pipelines.