
Kubernetes CI Policy: define metrics/reports that allow us to track whether the situation is getting better #18785

Open
@spiffxp

Description

Part of #18551

We were experiencing a lot of obvious pain as humans when kubernetes/kubernetes#92937 was opened.

There are a number of theories as to why that pain was being experienced, and we're now acting based on some of those theories.

What we are lacking is:

  • metrics that show that pain is being experienced
  • metrics that prove the theories as to why the pain was being experienced
  • metrics/reports that prove the action we are taking is having a positive/negative impact on overall CI health

This issue is intended to cover brainstorming, exploring and implementing metrics / reports that help guide us in the right direction.

Some suggestions / questions I'm pulling up from below:

  • can we / does it make sense to implement an alert when nothing has merged into kubernetes/kubernetes for a while, and should have (non-empty tide pool)?
  • can we / does it make sense to implement an alert when the kubernetes/kubernetes non-release-branch tide pool is above a certain threshold? what should that threshold be?
  • can we identify which job runs were due to new commits vs. /retest spam?
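
To make the three questions above more concrete, here is a minimal sketch of what the checks could look like. Everything in it is an assumption for illustration: `PoolSample`, `JobRun`, `pool_alerts` and `classify_runs` are hypothetical names, not Prow or Tide APIs, and the thresholds (4 hours idle, 50 PRs in the pool) are arbitrary placeholders, since the right values are exactly what is being asked. The rerun heuristic treats the first run of a job at a given head SHA as commit-driven and any later run at the same SHA as a rerun; it cannot tell a manual /retest apart from other retriggers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PoolSample:
    """One observation of the merge pool (hypothetical shape, not a Prow API)."""
    at: datetime             # when the sample was taken
    pool_size: int           # PRs currently eligible to merge
    last_merge_at: datetime  # time of the most recent merge


@dataclass(frozen=True)
class JobRun:
    """One presubmit job run (hypothetical shape, not a Prow API)."""
    pr: int
    job: str
    head_sha: str
    started_at: datetime


def pool_alerts(
    sample: PoolSample,
    max_idle: timedelta = timedelta(hours=4),  # placeholder threshold
    max_pool: int = 50,                        # placeholder threshold
) -> List[str]:
    """Return the names of the alerts that would fire for this sample."""
    alerts = []
    idle = sample.at - sample.last_merge_at
    # Nothing has merged for a while even though PRs are waiting.
    if sample.pool_size > 0 and idle > max_idle:
        alerts.append("merge-stalled")
    # The pool itself has grown past the chosen threshold.
    if sample.pool_size > max_pool:
        alerts.append("pool-too-large")
    return alerts


def classify_runs(runs: Iterable[JobRun]) -> List[Tuple[JobRun, str]]:
    """Label each run "new-commit" or "rerun", in start-time order."""
    seen = set()
    labeled = []
    for run in sorted(runs, key=lambda r: r.started_at):
        key = (run.pr, run.job, run.head_sha)
        # A repeat of the same job on the same PR and SHA is a rerun.
        labeled.append((run, "rerun" if key in seen else "new-commit"))
        seen.add(key)
    return labeled
```

The ratio of "rerun" to "new-commit" labels over time would be one candidate metric for how much load comes from retesting rather than from new code, assuming per-run head SHAs are available in whatever data source backs the report.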

Metadata

    Labels

    • area/deflake: Issues or PRs related to deflaking kubernetes tests
    • area/jobs
    • area/metrics
    • area/prow: Issues or PRs related to prow
    • kind/feature: Categorizes issue or PR as related to a new feature.
    • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
    • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
    • sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.
    • sig/testing: Categorizes an issue or PR as relevant to SIG Testing.
