Kubernetes CI Policy: define metrics/reports that allow us to track whether the situation is getting better

Part of https://github.com/kubernetes/test-infra/issues/18551

We were experiencing a lot of obvious pain as humans when https://github.com/kubernetes/kubernetes/issues/92937 was opened.

There are a number of theories as to why that pain was being experienced, and we're now acting based on some of those theories.

What we are lacking is:
- metrics that show that pain is being experienced
- metrics that prove the theories as to why the pain was being experienced
- metrics/reports that show whether the actions we are taking are having a positive or negative impact on overall CI health

This issue is intended to cover brainstorming, exploring and implementing metrics / reports that help guide us in the right direction.

Some suggestions / questions I'm pulling up from below:
- [ ] can we / does it make sense to implement an alert when nothing has merged into kubernetes/kubernetes for a while even though something should have (i.e. the tide pool is non-empty)?
- [ ] can we / does it make sense to implement an alert when the kubernetes/kubernetes non-release-branch tide pool is above a certain threshold? what should that threshold be?
- [ ] can we identify which job runs were due to new commits vs. `/retest` spam?
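
To make the three questions above concrete, here is a rough Python sketch of what the checks could look like. Everything in it is an assumption for discussion: `PoolSnapshot`, the function names, the 6-hour idle window and the pool-size threshold of 50 are all hypothetical placeholders, not existing Prow/Tide APIs or agreed values. The retest heuristic simply treats any repeat run against the same PR head SHA as a retest, which would also count reruns triggered by a base-branch change, so it is only a first approximation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class PoolSnapshot:
    """Hypothetical point-in-time view of the tide pool (not a real Tide type)."""
    taken_at: datetime
    pool_size: int                      # PRs currently eligible to merge
    last_merge_at: Optional[datetime]   # None if no merge has been observed


def merge_stall_alert(snap: PoolSnapshot,
                      max_idle: timedelta = timedelta(hours=6)) -> bool:
    """Fire when the pool is non-empty but nothing has merged for max_idle.

    The 6h default is a placeholder, not a recommendation.
    """
    if snap.pool_size == 0:
        return False  # nothing was waiting, so no merge was expected
    if snap.last_merge_at is None:
        return True   # PRs are waiting and we have never seen a merge
    return snap.taken_at - snap.last_merge_at > max_idle


def pool_size_alert(snap: PoolSnapshot, threshold: int = 50) -> bool:
    """Fire when the pool is larger than threshold (placeholder value)."""
    return snap.pool_size > threshold


def classify_runs(runs: List[Tuple[int, str]]) -> List[str]:
    """Label each (pr_number, head_sha) run, in order, as new-commit or retest.

    Assumption: the first run for a given PR head SHA is caused by a new
    commit; any later run for the same pair is counted as a retest.
    """
    seen = set()
    labels = []
    for pr_number, head_sha in runs:
        key = (pr_number, head_sha)
        labels.append("retest" if key in seen else "new-commit")
        seen.add(key)
    return labels
```

If something like this is useful, the inputs would need to come from wherever we already record tide pool state and job run metadata; the sketch deliberately leaves that part out.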

Kubernetes CI Policy: define metrics/reports that allow us to track whether the situation is getting better #18785
