Missing CPU and Memory Metrics for Pods #9862

Closed
@vadim-kubasov

Description

What happened?

When using the Kubernetes Dashboard, CPU and Memory metrics for individual pods are not displayed, as shown below:

[Screenshot: pod list in the Dashboard with empty CPU and Memory columns]

However, node-level CPU and Memory metrics are available:

[Screenshot: node list in the Dashboard showing CPU and Memory metrics]

The Metrics Server appears to be functioning correctly, as metrics for pods are available when running kubectl top pods:

$ kubectl top pods -n mgmt
NAME                                                        CPU(cores)   MEMORY(bytes)
cloudflared-6bcdd996f9-s4wlw                                3m           18Mi
cloudflared-6bcdd996f9-tbf9l                                3m           21Mi
cluster-autoscaler-aws-cluster-autoscaler-9d7db4f54-vmw2n   3m           122Mi
configserver-7fbbff4654-tjjh9                               2m           287Mi
kubernetes-dashboard-api-8497678dd5-n8m6j                   1m           42Mi
kubernetes-dashboard-auth-74447bb8b8-7kg6g                  1m           13Mi
kubernetes-dashboard-kong-5df5d7986c-jxpxh                  2m           132Mi
kubernetes-dashboard-metrics-scraper-7cc8997c6c-vf9dq       1m           17Mi
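
As a cross-check, the metrics.k8s.io API that both kubectl top and the Dashboard's scraper ultimately consume can be queried directly; a PodMetricsList JSON document here confirms metrics-server is serving pod metrics (mgmt matches the namespace used above):

$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/mgmt/pods" | head -c 300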

Metrics-server configuration:

# metrics-server.tf
resource "helm_release" "metrics-server" {
  name        = "metrics-server"
  repository  = "https://kubernetes-sigs.github.io/metrics-server"
  chart       = "metrics-server"
  version     = "3.12.2"
  namespace   = "mgmt"
  max_history = "3"
  values = [
    file("values.yaml")
  ]

  set {
    name  = "replicas"
    value = "1"
  }
  set {
    name  = "containerPort"
    value = "8443"
  }
}
# values.yaml
resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 100m
    memory: 250Mi

args:
  - --kubelet-insecure-tls
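
As a further sanity check on the aggregation layer this chart registers, the APIService status can be inspected; kubectl top working already suggests it is healthy, but this rules it out explicitly:

$ kubectl get apiservice v1beta1.metrics.k8s.io

The AVAILABLE column should read True; anything else would point at metrics-server itself rather than at the Dashboard.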

Kubernetes-dashboard configuration:

# kubernetes-dashboard.tf
resource "helm_release" "kubernetes-dashboard" {
  name        = "kubernetes-dashboard"
  repository  = "https://kubernetes.github.io/dashboard"
  chart       = "kubernetes-dashboard"
  version     = "7.10.0"
  namespace   = "mgmt"
  max_history = "3"
  values = [
    templatefile("templates/values.yaml", {
      tier = var.tier
    })
  ]
  set {
    name  = "web.containers.args"
    value = "{--system-banner=EKS-${upper(var.tier)}}"
  }
}
# templates/values.yaml
app:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  ingress:
    enabled: "true"
    labels:
      app: "kubernetes-dashboard"
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
      nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    ingressClassName: "nginx"
    hosts:
      - eks-${tier}-board.int.example.com
    path: "/"
  security:
    podDisruptionBudget:
      enabled: "true"
      minAvailable: "1"

web:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

auth:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

api:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  containers:
    resources:
      limits:
        memory: 600Mi

metricsScraper:
  enabled: "true"
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

kong:
  resources:
    limits:
      cpu: 700m
      memory: 350Mi
    requests:
      cpu: 600m
      memory: 250Mi
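
Since the api container's errors (see the logs below) name the kubernetes-dashboard-metrics-scraper service rather than metrics-server, it is worth confirming that the scraper Service exists and has ready endpoints. A minimal check, using the service name taken from those logs:

$ kubectl -n mgmt get svc,endpoints kubernetes-dashboard-metrics-scraper

An empty ENDPOINTS column would mean the api container has nothing to reach, even if the scraper pod itself looks healthy.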

Metrics-server logs:

I0116 15:22:06.791552 1 serving.go:374] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0116 15:22:09.873654 1 handler.go:275] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I0116 15:22:10.269849 1 trace.go:236] Trace[59397922]: "DeltaFIFO Pop Process" ID:stagewhite/fox-799b7c74dd-zp4bn,Depth:88,Reason:slow event handlers blocking the queue (16-Jan-2025 15:22:10.088) (total time: 181ms):
Trace[59397922]: [181.325995ms] [181.325995ms] END
I0116 15:22:10.566775 1 secure_serving.go:213] Serving securely on [::]:8443
I0116 15:22:10.570463 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0116 15:22:10.685489 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0116 15:22:10.570496 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0116 15:22:10.570522 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0116 15:22:10.570538 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0116 15:22:10.570553 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0116 15:22:10.686482 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0116 15:22:10.686645 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0116 15:22:11.176094 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0116 15:22:11.268544 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0116 15:22:11.269013 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file

Kubernetes-dashboard logs:

# kubernetes-dashboard-api
W0116 15:12:04.740811 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:12:04.834908 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:12:04.839366 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:12:04.945440 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
E0116 15:14:11.569385 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
E0116 15:16:51.317408 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
E0116 15:19:31.059224 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
E0116 15:22:10.801922 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
E0116 15:24:50.545265 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
E0116 15:27:30.289899 1 manager.go:96] Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper). Retrying in 30 seconds.
W0116 15:29:37.734179 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:29:37.736613 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:29:37.749196 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
# kubernetes-dashboard-metrics-scraper
I0116 15:29:14.780490 1 main.go:145] Database updated: 7 nodes, 265 pods
172.18.152.189 - - [16/Jan/2025:15:29:22 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:29:32 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:29:42 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:29:52 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:30:02 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:30:12 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
I0116 15:30:14.781186 1 main.go:145] Database updated: 7 nodes, 265 pods
172.18.152.189 - - [16/Jan/2025:15:30:22 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:30:32 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:30:42 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:30:52 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:31:02 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
172.18.152.189 - - [16/Jan/2025:15:31:12 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
I0116 15:31:14.787647 1 main.go:145] Database updated: 7 nodes, 265 pods
172.18.152.189 - - [16/Jan/2025:15:31:22 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.31+"
# kubernetes-dashboard-auth
W0116 15:30:09.570625 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:30:39.586608 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0116 15:31:13.593571 1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
# kubernetes-dashboard-kong
172.18.154.194 - - [16/Jan/2025:15:29:13 +0000] "GET /api/v1/me HTTP/1.1" 200 22 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "ecdf976aac159e6c989fcf5618df987a"
172.18.156.124 - - [16/Jan/2025:15:29:37 +0000] "GET /api/v1/namespace HTTP/1.1" 200 1514 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "94cf9a29e560283053c43c06f82d1169"
172.18.154.194 - - [16/Jan/2025:15:29:38 +0000] "GET /api/v1/node?itemsPerPage=10&page=1&sortBy=d,creationTimestamp HTTP/1.1" 200 2444 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "67b0d5165368851a3e9eb82ff94e2a51"
172.18.154.194 - - [16/Jan/2025:15:29:39 +0000] "GET /api/v1/me HTTP/1.1" 200 22 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "5033df3c9fe6d21748a7c948d824660c"
172.18.149.190 - - [16/Jan/2025:15:29:40 +0000] "GET /api/v1/namespace HTTP/1.1" 200 1514 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "a5a1bc58f97201483f4bf03cf904392d"
172.18.156.124 - - [16/Jan/2025:15:29:41 +0000] "GET /api/v1/node?itemsPerPage=10&page=1&sortBy=d,creationTimestamp HTTP/1.1" 200 2444 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "d22102231219a70277fccf67fcc80324"
172.18.156.124 - - [16/Jan/2025:15:30:09 +0000] "GET /api/v1/me HTTP/1.1" 200 22 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "2f374f3909409d4cbd4885325b8ae473"
172.18.149.190 - - [16/Jan/2025:15:30:39 +0000] "GET /api/v1/me HTTP/1.1" 200 22 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "a14b9644dfd0b131d5e9b1688d2b2e2f"
172.18.149.190 - - [16/Jan/2025:15:31:13 +0000] "GET /api/v1/me HTTP/1.1" 200 22 "https://eks-stage-board.int.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" kong_request_id: "8be64f1e6d89813fa50ac72920b6ea47"
# kubernetes-dashboard-web
I0115 13:11:20.445029 1 main.go:37] "Starting Kubernetes Dashboard Web" version="1.6.0"
I0115 13:11:20.445092 1 init.go:48] Using in-cluster config
I0115 13:11:20.445320 1 main.go:57] "Listening and serving insecurely on" address="0.0.0.0:8000"
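
The repeated "Metric client health check failed: the server is currently unable to handle the request (get services kubernetes-dashboard-metrics-scraper)" lines in the api log are the most telling signal: that error text matches a request proxied through the Kubernetes API server to the scraper Service failing, while the scraper's own access log shows it answering its probes. Assuming the health check goes through the services proxy (an assumption, not confirmed from the Dashboard code), the same path can be exercised by hand:

# Proxy through the API server -- the path the error message suggests (assumption)
$ kubectl get --raw "/api/v1/namespaces/mgmt/services/kubernetes-dashboard-metrics-scraper/proxy/"

# Direct in-cluster request for comparison; 8000 is the chart's usual scraper
# port (assumption -- confirm against the svc output), curlimages/curl is a
# throwaway probe image
$ kubectl -n mgmt run probe --rm -it --restart=Never --image=curlimages/curl -- \
    curl -sS http://kubernetes-dashboard-metrics-scraper.mgmt.svc:8000/

If the proxied request fails while the direct one succeeds, the problem sits between the control plane and the scraper endpoints (on EKS, a frequent culprit is a security group blocking the control plane from reaching the pod's port) rather than in metrics-server.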

What did you expect to happen?

Pod-level CPU and Memory metrics should be displayed in the Kubernetes Dashboard.

[Screenshot: expected pod-level CPU and Memory metrics displayed in the Dashboard]

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy Metrics Server and Kubernetes Dashboard using the configurations provided above (a plain-Helm approximation is sketched after this list).
  2. Access the Kubernetes Dashboard and check pod-level metrics.
  3. Observe that CPU and Memory metrics for pods are missing, while node metrics are visible.
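
For anyone reproducing without the Terraform setup, a plain-Helm approximation of step 1 (same chart versions and repositories; a sketch rather than the exact deployment, since the full values files are abbreviated above):

$ helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
$ helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard
$ helm upgrade --install metrics-server metrics-server/metrics-server \
    --namespace mgmt --version 3.12.2 --set 'args={--kubelet-insecure-tls}'
$ helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard \
    --namespace mgmt --version 7.10.0 --set metricsScraper.enabled=true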

Anything else we need to know?

Metrics Server (Helm chart) version: 3.12.2
Cluster Environment: AWS EKS

What browsers are you seeing the problem on?

No response

Kubernetes Dashboard version

7.10.0

Kubernetes version

1.31

Dev environment

No response

Labels

kind/support (Categorizes issue or PR as a support question.)
