[kube-prometheus-stack] API server metrics broken after upgrade #5274

Open
@krankkkk

Describe the bug

We upgraded kube-prometheus-stack from 62.6.0 to 68.4.5 and have since observed completely implausible values for the API server availability metric (apiserver_request:availability30d).

What's your helm version?

Deployed via ArgoCD, which internally uses 3.15.4

What's your kubectl version?

Irrelevant

Which chart?

kube-prometheus-stack

What's the chart version?

68.4.5

What happened?

After the upgrade, apiserver_request:availability30d went from about 99.999 % to far beyond 100 %: apiserver_request:availability30d{verb="all"} is currently at 1.6425321904704488 on one cluster and 2.225346243637766 on another.
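
For scale, an availability ratio is only meaningful between 0 and 1, so the two reported values correspond to well over 100 %. A minimal sketch of that sanity check follows; the function name and the cluster labels are hypothetical, and only the two numeric values come from this report.

```python
from typing import Tuple


def check_availability(ratio: float) -> Tuple[float, bool]:
    """Return the ratio as a percentage and whether it lies in the valid 0..1 range."""
    percent = ratio * 100.0
    plausible = 0.0 <= ratio <= 1.0
    return percent, plausible


# Values reported for apiserver_request:availability30d{verb="all"}.
# The cluster names are placeholders, not taken from the report.
observed = {
    "cluster-a": 1.6425321904704488,
    "cluster-b": 2.225346243637766,
}

for cluster, ratio in observed.items():
    percent, plausible = check_availability(ratio)
    status = "ok" if plausible else "implausible"
    print(f"{cluster}: {percent:.2f}% ({status})")
```

Both values are flagged as implausible, since an availability above 1.0 cannot occur for a correctly computed ratio.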

In the Prometheus UI, the jump lines up exactly with the time we initiated the upgrade.

[Image: Prometheus UI graph of the metric around the time of the upgrade]

If we roll back the upgrade, the metric returns to normal.
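
To pin down the jump without the UI, the same series could be pulled from the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`). The sketch below is an illustration only: the base URL, the step, and the helper names are assumptions and not part of this report.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable here, for example through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = 'apiserver_request:availability30d{verb="all"}'


def build_query_range_url(base_url: str, query: str, start: float,
                          end: float, step: str = "1h") -> str:
    """Build a URL for the Prometheus range-query endpoint."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def fetch_values(url: str) -> List[list]:
    """Fetch the [timestamp, value] pairs of the first returned series."""
    with urlopen(url) as response:
        payload = json.load(response)
    result = payload["data"]["result"]
    return result[0]["values"] if result else []


def first_implausible_sample(values: List[list]) -> Optional[float]:
    """Return the timestamp of the first sample above 1.0, or None."""
    for timestamp, raw in values:
        # Prometheus returns sample values as strings.
        if float(raw) > 1.0:
            return float(timestamp)
    return None
```

Calling `first_implausible_sample(fetch_values(build_query_range_url(PROMETHEUS_URL, QUERY, start, end)))` with Unix timestamps spanning the upgrade window would return the timestamp of the first sample above 1.0, which can then be compared against the upgrade time.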

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

No command necessary

Anything else we need to know?

The cluster version is v1.29.2.

Labels: bug
