Part of #18551
We, as humans, were experiencing a lot of obvious pain when kubernetes/kubernetes#92937 was opened.
There are a number of theories as to why that pain was being experienced, and we are now acting on some of those theories.
What we are lacking is:
- metrics that show that pain is being experienced
- metrics that prove the theories as to why the pain was being experienced
- metrics/reports that show whether the actions we are taking are having a positive or negative impact on overall CI health
This issue is intended to cover brainstorming, exploring and implementing metrics / reports that help guide us in the right direction.
Some suggestions / questions I'm pulling up from the comments below:
- can we / does it make sense to implement an alert when nothing has merged into kubernetes/kubernetes for a while even though something should have (i.e. the tide pool is non-empty)?
- can we / does it make sense to implement an alert when the kubernetes/kubernetes non-release-branch tide pool is above a certain threshold? What should that threshold be?
- can we identify which job runs were due to new commits vs. `/retest` spam?
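To make the two alert questions above concrete, here is a minimal sketch of the alert conditions, assuming we can periodically snapshot the kubernetes/kubernetes non-release-branch tide pool size and the time of the most recent merge. The `TidePoolSnapshot` type, the `tide_alerts` function, the alert names, and both threshold values are hypothetical placeholders; picking real thresholds is exactly what this issue is asking about, and where the data would come from is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TidePoolSnapshot:
    """Hypothetical point-in-time view of the kubernetes/kubernetes tide pool."""
    pool_size: int                  # PRs in the non-release-branch pool
    last_merge: Optional[datetime]  # most recent merge into the repo, if known
    taken_at: datetime              # when this snapshot was taken


# Placeholder thresholds (assumptions, not agreed-upon values).
MAX_TIME_WITHOUT_MERGE = timedelta(hours=4)
MAX_POOL_SIZE = 50


def tide_alerts(snap: TidePoolSnapshot,
                max_idle: timedelta = MAX_TIME_WITHOUT_MERGE,
                max_pool: int = MAX_POOL_SIZE) -> List[str]:
    """Return the names of the hypothetical alerts that should fire."""
    alerts = []
    # "Stalled": PRs are waiting to merge, but nothing has merged recently
    # (or we have no record of any merge at all).
    if snap.pool_size > 0:
        if snap.last_merge is None or snap.taken_at - snap.last_merge > max_idle:
            alerts.append("stalled")
    # "Pool too large": the queue itself has grown past the threshold.
    if snap.pool_size > max_pool:
        alerts.append("pool_too_large")
    return alerts


now = datetime(2020, 7, 10, 12, 0, tzinfo=timezone.utc)
print(tide_alerts(TidePoolSnapshot(pool_size=12,
                                   last_merge=now - timedelta(hours=6),
                                   taken_at=now)))
# ['stalled']
```

An empty pool never triggers the "stalled" condition, which captures the "and should have" part of the first question: a quiet period with nothing waiting to merge is not a problem.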
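For the question of new commits vs. `/retest`, one possible heuristic is to group presubmit runs by (PR, job, head SHA). The earliest run in each group is attributed to the push of that commit, and every later run on the same SHA counts as a rerun. This is a sketch under assumptions: the `JobRun` record and `classify_runs` function are hypothetical, and "rerun" here lumps together `/retest`, `/test`, and any automatic retesting on an unchanged head SHA, so telling those apart would need the triggering comment or event as well.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one presubmit job run."""
    pr: int
    job: str
    head_sha: str
    started: int  # start time as a Unix timestamp


def classify_runs(runs: Iterable[JobRun]) -> Dict[str, int]:
    """Count runs attributed to new commits vs. reruns on an already-tested SHA."""
    seen = set()
    counts = {"new_commit": 0, "rerun": 0}
    # Process runs in start order so the first run per SHA is the "push" run.
    for run in sorted(runs, key=lambda r: r.started):
        key: Tuple[int, str, str] = (run.pr, run.job, run.head_sha)
        if key in seen:
            counts["rerun"] += 1
        else:
            seen.add(key)
            counts["new_commit"] += 1
    return counts


runs = [
    JobRun(1, "example-presubmit", "aaa", 100),
    JobRun(1, "example-presubmit", "aaa", 200),  # same SHA again: a rerun
    JobRun(1, "example-presubmit", "bbb", 300),  # new commit pushed
]
print(classify_runs(runs))
# {'new_commit': 2, 'rerun': 1}
```

A high rerun-to-new-commit ratio, tracked over time, could serve as one of the "pain is being experienced" metrics listed above.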
Metadata
Labels
- Issues or PRs related to deflaking kubernetes tests
- Issues or PRs related to prow
- Categorizes issue or PR as related to a new feature.
- Indicates that an issue or PR should not be auto-closed due to staleness.
- Important over the long term, but may not be staffed and/or may need multiple releases to complete.
- Categorizes an issue or PR as relevant to SIG K8s Infra.
- Categorizes an issue or PR as relevant to SIG Testing.