[kube-prometheus-stack] Dashboards: container resource requests and limits should be displayed by container #6378

@rdxmb

Description

Describe the bug: a clear and concise description of what the bug is.

The Grafana dashboard for Pod Resources does not differentiate between the resource limits/requests of different containers in the graph, only in the table:

Image

What's your helm version?

3.11.0

What's your kubectl version?

1.31.4

Which chart?

kube-prometheus-stack

What's the chart version?

77.3.0

What happened?

Why I think this is a bug: we sometimes had containers killed by the Kubernetes OOM killer, while the container limits shown in that dashboard gave no indication of a problem. After adding dedicated limit graphs per container, the problem became visible:

Image

(Unfortunately, I no longer have the corresponding graph from before the split for this example.)

What you expected to happen?

Requests and limits should be displayed separately per container, e.g. like this:

Image

How to reproduce it?

Deploy a pod like this:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "0.1"
        memory: "100Mi"
      limits:
        cpu: "0.2"
        memory: "200Mi"
  - name: fedora
    image: fedora
    resources:
      requests:
        cpu: "0.3"
        memory: "300Mi"
      limits:
        cpu: "0.4"
        memory: "400Mi"
    command:
    - sleep
    - inf

Apply it to your cluster and have a look at the dashboard 6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod from kube-prometheus-stack.

If you want to see the problem, let the nginx container use more than 200Mi of memory. The container will be OOM-killed, while the limit shown in the dashboard graph (600Mi, the sum of both containers' limits) is still far above the container usage shown in the graph (max 200Mi).
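The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not the dashboard's actual query logic: the limits are taken from the manifest above, while the usage numbers are made-up sample values.

```python
# Memory limits in MiB, taken from the example pod manifest above.
limits_mib = {"nginx": 200, "fedora": 400}

# Hypothetical usage sample (assumption): nginx has reached its own limit,
# while the sleeping fedora container uses almost nothing.
usage_mib = {"nginx": 200, "fedora": 5}

# What a single pod-level series shows: the sum over all containers.
pod_limit = sum(limits_mib.values())
pod_usage = sum(usage_mib.values())

# What a per-container view reveals: containers at or above their own limit.
at_limit = [name for name, used in usage_mib.items() if used >= limits_mib[name]]

print(f"pod-level: {pod_usage}Mi used of {pod_limit}Mi limit")
print(f"containers at their limit: {at_limit}")
```

Compared against the summed limit, the pod looks far from any threshold, even though nginx is already at the point where it gets OOM-killed.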

Enter the changed values of values.yaml?

NONE

Enter the command that you execute and failing/misfunctioning.

(this is not important for this bug report)

Anything else we need to know?

I have just added

  • a by (container) to the queries
  • {{container}} requests instead of requests to the legends
  • {{container}} limits instead of limits to the legends
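As a rough sketch of what these three changes amount to, the snippet below applies them to a Grafana-style query target. The helper name `split_by_container` is hypothetical, and the example expression is an assumption about what the panel query looks like, not a copy from the chart; only the `by (container)` suffix and the `{{container}}` legend prefix come from the description above.

```python
def split_by_container(target: dict) -> dict:
    """Return a copy of a query target with its series split per container."""
    return {
        **target,
        # Append the grouping clause to the existing aggregation.
        "expr": f'{target["expr"]} by (container)',
        # Prefix the legend with the container label template.
        "legendFormat": f'{{{{container}}}} {target["legendFormat"]}',
    }


limits_target = {
    # Assumed expression; $namespace and $pod stand in for dashboard variables.
    "expr": 'sum(kube_pod_container_resource_limits{namespace="$namespace", pod="$pod", resource="memory"})',
    "legendFormat": "limits",
}

patched = split_by_container(limits_target)
print(patched["expr"])
print(patched["legendFormat"])
```

The same transformation would apply to the requests query, with "requests" as the legend.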

This should be a small fix. However, it is not clear to me where to make this change within this repo, as there is too much magic 🪄 involved in generating the dashboards ...

Labels: bug