Description
What happened:
Logs from the metrics-server pod show the following error repeatedly:
E0410 22:04:01.247686 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:16.201141 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:31.201853 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:46.277913 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
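
As a rough illustration of what the x509 message in these log lines is saying: the certificate presented on port 10250 lists only 127.0.0.1 as an IP subject alternative name, while the scraper dialed 10.124.4.238. The sketch below is a simplification under that reading. It is not how metrics-server or its TLS library is implemented, real verification also checks the chain and validity period, and `ip_matches_sans` is a hypothetical helper name.

```python
import ipaddress


def ip_matches_sans(target: str, ip_sans: list[str]) -> bool:
    """Return True if `target` equals one of the certificate's IP SANs.

    Hypothetical helper: it only mimics the address comparison behind the
    'certificate is valid for X, not Y' message, nothing else.
    """
    wanted = ipaddress.ip_address(target)
    return any(ipaddress.ip_address(san) == wanted for san in ip_sans)


# Values taken from the log lines above.
cert_ip_sans = ["127.0.0.1"]
node_address = "10.124.4.238"

print(ip_matches_sans(node_address, cert_ip_sans))  # False
print(ip_matches_sans("127.0.0.1", cert_ip_sans))   # True
```

Under this reading, the check can only pass if the certificate served at the node address also lists that address as a SAN.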
What you expected to happen:
Metrics Server should be able to scrape the node it is running on without errors.
Anything else we need to know?:
- Initially I was using the metrics server that came with VPA, with the values below. Errors similar to the above appeared.
# metrics-server -- configuration options for the [metrics server Helm chart](https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server). See the projects [README.md](https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server#configuration) for all available options
metrics-server:
  # metrics-server.enabled -- Whether or not the metrics server Helm chart should be installed
  enabled: true
  # CHANGE ABOVE from original value false
  defaultArgs:
    - --cert-dir=/tmp
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
- Later I switched to the metrics server installed directly through eks_blueprints_kubernetes_addons. Errors similar to the above appeared.
enable_metrics_server = true
metrics_server = {
  name          = "metrics-server"
  chart_version = "3.12.1"
  repository    = "https://kubernetes-sigs.github.io/metrics-server/"
  namespace     = "kube-system"
  values        = [templatefile("${path.module}/metrics-svr.yaml", {})]
}
- Tried upgrading Metrics Server from version 0.6.x to 0.7.x. Errors similar to the above still appeared.
- Tried to bypass the certificate check by passing the --kubelet-insecure-tls flag:
defaultArgs:
  - --cert-dir=/tmp
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls
However, the following errors then appeared:
E0410 22:13:28.928630 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="fargate-ip-10-124-4-186.ap-southeast-1.compute.internal"
E0410 22:13:43.827793 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="fargate-ip-10-124-4-186.ap-southeast-1.compute.internal"
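
To make the two failure modes easier to compare, here is a minimal sketch that groups "Failed to scrape node" lines by node and by error kind. It assumes the log line shape shown in this report; the category labels, `classify`, and `summarize` are hypothetical names chosen for illustration, and the sample lines are copied from the logs above.

```python
import re
from collections import Counter

# Matches the trailing node="..." key in the scraper log lines.
NODE_RE = re.compile(r'node="([^"]+)"')

# Sample lines copied from the logs in this report.
X509_LINE = r'E0410 22:04:01.247686 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"'
FORBIDDEN_LINE = r'E0410 22:13:28.928630 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="fargate-ip-10-124-4-186.ap-southeast-1.compute.internal"'
SAMPLE = [X509_LINE, X509_LINE, FORBIDDEN_LINE]


def classify(line: str) -> str:
    """Label a scrape failure; the labels are illustrative, not official."""
    if "x509:" in line:
        return "tls-verification"
    if "403 Forbidden" in line:
        return "http-403"
    return "other"


def summarize(lines):
    """Count scrape failures per (node, error kind)."""
    counts = Counter()
    for line in lines:
        if "Failed to scrape node" not in line:
            continue  # ignore unrelated log lines
        match = NODE_RE.search(line)
        node = match.group(1) if match else "unknown"
        counts[(node, classify(line))] += 1
    return counts


for (node, kind), n in summarize(SAMPLE).items():
    print(f"{n:3d}  {kind:18s}  {node}")
```

The split shows that the flag changes the failure from a certificate verification error to an HTTP 403 response; the scrape still fails in both cases.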
Environment:
- Kubernetes distribution: EKS Fargate
- Server Version: v1.27.11-eks-b9c9ed7
- Metrics Server manifest
spoiler for Metrics Server manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    meta.helm.sh/release-name: metrics-server
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-10T21:48:44Z"
  labels:
    app.kubernetes.io/instance: metrics-server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: metrics-server
    app.kubernetes.io/version: 0.7.1
    helm.sh/chart: metrics-server-3.12.1
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1044967"
  uid: bbd89fdf-d933-4fd3-9bfa-2c8351bc9159
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: metrics-server
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-10T21:48:44Z"
  labels:
    app.kubernetes.io/instance: metrics-server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: metrics-server
    app.kubernetes.io/version: 0.7.1
    helm.sh/chart: metrics-server-3.12.1
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1044976"
  uid: fe68eb2b-9ecf-4c57-996e-6836955f614c
spec:
  clusterIP: 172.20.20.200
  clusterIPs:
  - 172.20.20.200
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/instance: metrics-server
    app.kubernetes.io/name: metrics-server
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: metrics-server
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-10T21:48:44Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: metrics-server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: metrics-server
    app.kubernetes.io/version: 0.7.1
    helm.sh/chart: metrics-server-3.12.1
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1048455"
  uid: 51c7e198-d10b-4ec4-b96d-69e151de778b
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: metrics-server
      app.kubernetes.io/name: metrics-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: metrics-server
        app.kubernetes.io/name: metrics-server
    spec:
      containers:
      - args:
        - --secure-port=10250
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metrics-server
      serviceAccountName: metrics-server
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: tmp
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-04-10T21:50:06Z"
    lastUpdateTime: "2024-04-10T21:50:06Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-10T21:48:44Z"
    lastUpdateTime: "2024-04-10T22:13:47Z"
    message: ReplicaSet "metrics-server-578bc9bf64" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    meta.helm.sh/release-name: metrics-server
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-10T21:48:44Z"
  labels:
    app.kubernetes.io/instance: metrics-server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: metrics-server
    app.kubernetes.io/version: 0.7.1
    helm.sh/chart: metrics-server-3.12.1
  name: v1beta1.metrics.k8s.io
  resourceVersion: "1048453"
  uid: 84cc08c7-27bc-4a4e-a7b8-efcd7b428ea2
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2024-04-10T21:48:44Z"
    message: 'failing or missing response from https://10.124.4.186:10250/apis/metrics.k8s.io/v1beta1:
      bad status from https://10.124.4.186:10250/apis/metrics.k8s.io/v1beta1: 404'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available
- lastTransitionTime: "2024-04-10T21:48:44Z"
- Kubelet config:
spoiler for Kubelet config:
- Metrics server logs:
spoiler for Metrics Server logs:
- Status of Metrics API:
spoiler for Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
/kind bug