This directory contains the configuration for deploying the Prometheus monitoring stack (kube-prometheus-stack) on Kubernetes.
Prerequisites:
- Access to the Kubernetes cluster
- Helm installed
- kubectl configured to access the cluster
- Traefik ingress controller configured (for IngressRoute)
Create the namespace:
kubectl create namespace monitoring
Add the Helm repository and refresh the local chart index:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Install the chart:
helm install my-kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 72.3.0 \
  --namespace monitoring \
  --values values.yaml
Wait for the operator and Prometheus pods to become ready:
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=prometheus-operator \
  -n monitoring \
  --timeout=300s
kubectl wait --for=condition=ready pod \
  -l app=prometheus,prometheus=my-kube-prometheus-stack-prometheus \
  -n monitoring \
  --timeout=300s
Apply the patch to configure the Prometheus Alertmanager integration:
kubectl patch prometheus prometheus-stack-kube-prom-prometheus -n monitoring --type='merge' \
  -p='{"spec":{"alerting":{"alertmanagers":[{"apiVersion":"v2","namespace":"monitoring","name":"prometheus-stack-kube-prom-alertmanager","port":"http-web","pathPrefix":"/alertmanager"}]}}}'
Note: Adjust the Prometheus resource name according to your installation; run kubectl get prometheus -n monitoring to list it. The prometheus-stack-... names shown here may not match a release installed as my-kube-prometheus-stack. If using prometheus-patch.yaml, verify that the resource name matches your actual Prometheus instance.
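Because the resource names depend on the release name, it can help to build the patch from variables instead of editing the JSON by hand. The following is a minimal sketch; build_alertmanager_patch is a hypothetical helper that is not part of this repository, and the port and path prefix are simply the values from the command above.

```shell
#!/bin/sh
# Sketch: print the merge patch for the Prometheus resource.
# build_alertmanager_patch is a hypothetical helper, not part of this repo.
#   $1 = namespace of the Alertmanager service
#   $2 = name of the Alertmanager service
build_alertmanager_patch() {
  printf '{"spec":{"alerting":{"alertmanagers":[{"apiVersion":"v2","namespace":"%s","name":"%s","port":"http-web","pathPrefix":"/alertmanager"}]}}}\n' "$1" "$2"
}

# Print the patch for the names used in this guide.
build_alertmanager_patch monitoring prometheus-stack-kube-prom-alertmanager
```

The output can then be passed to the patch command with -p "$(build_alertmanager_patch monitoring <alertmanager-service>)", after confirming the Prometheus resource name in your cluster.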
Ensure the traefik-dashboard-cert secret exists in the monitoring namespace (see SSL/TLS Certificate section).
Apply the IngressRoute to expose Prometheus, Grafana, and Alertmanager through Traefik:
kubectl apply -f ingressroute.yaml
Note: ingressroute.yaml references the following service names, which must match your installation:
- prometheus-stack-kube-prom-prometheus
- prometheus-stack-grafana
- prometheus-stack-kube-prom-alertmanager
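A quick way to catch a name mismatch before applying the IngressRoute is to compare the expected names with the services that actually exist. This is a minimal sketch: missing_services is a hypothetical helper, and the listing piped into it below is made-up sample data rather than output from a real cluster.

```shell
#!/bin/sh
# missing_services is a hypothetical helper: it reads service names from stdin
# (one per line; an optional "service/" prefix is ignored) and prints every
# expected name, given as an argument, that is not in that list.
missing_services() {
  actual=$(sed 's|^service/||')
  for expected in "$@"; do
    printf '%s\n' "$actual" | grep -Fqx -- "$expected" || printf '%s\n' "$expected"
  done
}

# Example with a made-up listing in which the Grafana service is absent.
printf 'service/prometheus-stack-kube-prom-prometheus\nservice/prometheus-stack-kube-prom-alertmanager\n' |
  missing_services \
    prometheus-stack-kube-prom-prometheus \
    prometheus-stack-grafana \
    prometheus-stack-kube-prom-alertmanager
```

Against a live cluster, the input would come from kubectl get svc -n monitoring -o name, which prints names in the service/<name> form the helper expects. Empty output means every expected service was found.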
Retrieve the Grafana admin password:
kubectl get secret prometheus-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
echo
Check that all pods are running:
kubectl get pods -n monitoring
Access the services:
- Prometheus: https://traefik.mykubernetes.com/prometheus
- Grafana: https://traefik.mykubernetes.com/grafana (user: admin, password: the value read from the prometheus-stack-grafana secret)
- Alertmanager: https://traefik.mykubernetes.com/alertmanager
The stack uses the local-path storage class (the default in k3s):
- Prometheus: 2Gi storage
- Alertmanager: 1Gi storage
Resource limits and requests are configured in values.yaml:
- Prometheus: 1Gi memory, 1 CPU
- Alertmanager: 512Mi memory, 500m CPU
- Grafana: 512Mi memory, 500m CPU
The IngressRoute uses the traefik-dashboard-cert secret for TLS encryption. If the certificate is missing or expired, follow these steps.
Check whether the secret exists:
kubectl get secret traefik-dashboard-cert -n monitoring
If the certificate is missing or expired, generate a new one:
cd ../certs
./certificate.sh
This creates the traefik-dashboard-cert secret in the traefik namespace.
The IngressRoute needs the certificate in the monitoring namespace:
kubectl get secret traefik-dashboard-cert -n traefik -o yaml | \
  sed 's/namespace: traefik/namespace: monitoring/' | \
  sed '/resourceVersion:/d' | \
  sed '/uid:/d' | \
  kubectl apply -f -
Check the certificate expiration date:
kubectl get secret traefik-dashboard-cert -n monitoring -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates -subject
If you get certificate errors in the browser, add the CA certificate to your macOS system keychain.
First-time installation:
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ../certs/ca.crt
Updating the certificate (if one already exists):
- Find existing certificates:
  security find-certificate -a -c "MyKubernetes CA" -Z /Library/Keychains/System.keychain | grep "SHA-1 hash" | awk '{print $3}'
- Remove old certificates by SHA-1 hash:
  sudo security delete-certificate -Z <SHA-1_HASH> /Library/Keychains/System.keychain
  Repeat for each old certificate hash found in the first step.
- Add the new certificate:
  sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ../certs/ca.crt
- Verify the certificate was added:
  security find-certificate -c "MyKubernetes CA" /Library/Keychains/System.keychain -Z | grep "SHA-1 hash"
  The SHA-1 hash should match your new certificate:
  openssl x509 -in ../certs/ca.crt -noout -fingerprint -sha1 | cut -d= -f2 | tr -d ':'
- Clear browser HSTS cache (Chrome):
  - Open chrome://net-internals/#hsts
  - In "Delete domain security policies", enter: traefik.mykubernetes.com
  - Click "Delete"
  - Or clear all browser cache and restart the browser
Note: After generating a new certificate, you may need to clear your browser cache or use incognito mode to avoid HSTS (HTTP Strict Transport Security) issues.
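The verification step above compares two fingerprints that are printed in different formats: the keychain lookup prints a "SHA-1 hash:" label followed by plain hex, while openssl prints a "Fingerprint=" label with colon-separated bytes. The sketch below normalizes both forms so they can be compared mechanically. normalize_fingerprint and same_fingerprint are hypothetical helpers, and the fingerprint values in the example are made up and shortened.

```shell
#!/bin/sh
# normalize_fingerprint (hypothetical helper) reads one fingerprint from stdin,
# drops a leading "...=" or "...hash: " label, removes colons and spaces,
# and upper-cases the hex digits.
normalize_fingerprint() {
  sed -e 's/^.*=//' -e 's/^.*hash: *//' | tr -d ': \n' | tr 'a-f' 'A-F'
  echo
}

# same_fingerprint (hypothetical helper) succeeds when both arguments
# normalize to the same value.
same_fingerprint() {
  [ "$(printf '%s\n' "$1" | normalize_fingerprint)" = "$(printf '%s\n' "$2" | normalize_fingerprint)" ]
}

# Made-up, shortened values in the two formats described above.
if same_fingerprint 'SHA1 Fingerprint=0A:1b:2C:3d' 'SHA-1 hash: 0A1B2C3D'; then
  echo "fingerprints match"
else
  echo "fingerprints differ"
fi
```

In practice the two arguments would be the output lines of the security find-certificate and openssl x509 -fingerprint commands shown in the verification step.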
To upgrade the release, refresh the repository, export the currently applied values, and run the upgrade:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm get values my-kube-prometheus-stack -n monitoring -o yaml > ~/home-lab/monitoring/values.yaml
helm upgrade my-kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring --version 72.3.0 -f ~/home-lab/monitoring/values.yaml

This guide describes how to clean up old/obsolete metrics from Prometheus running on Kubernetes.
When metrics are renamed or discontinued, they may continue to appear in the Prometheus UI even though they are no longer being collected. This happens because Prometheus maintains a history of these metrics in its storage.
To completely clean up old metrics, we need to recreate Prometheus with clean storage.
Prerequisites:
- Access to the Kubernetes cluster
- Helm installed
- Prometheus values file (~/home-lab/monitoring/values.yaml)
- Delete the Prometheus StatefulSet:
  kubectl delete statefulset prometheus-my-kube-prometheus-stack-prometheus -n monitoring
- Delete the PVC to clean the storage:
  kubectl delete pvc prometheus-my-kube-prometheus-stack-prometheus-db-prometheus-my-kube-prometheus-stack-prometheus-0 -n monitoring
- Upgrade the Helm chart:
  helm upgrade my-kube-prometheus-stack prometheus-community/kube-prometheus-stack \
    -n monitoring \
    --version 72.3.0 \
    -f ~/home-lab/monitoring/values.yaml
- Verify that Prometheus is running:
  kubectl get pods -n monitoring | grep prometheus-0
After these steps, access the Prometheus UI and verify that:
- The old metrics no longer appear
- Current metrics are being collected correctly
- Prometheus is functioning normally
Notes:
- This process will erase all metric history
- Prometheus will start collecting metrics from scratch
- Wait a few minutes after the process for Prometheus to fully initialize
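The StatefulSet and PVC names in the steps above appear to follow a fixed pattern built from the name of the Prometheus resource (here my-kube-prometheus-stack-prometheus). The sketch below reproduces that pattern so the names can be derived for a differently named installation. Both functions are hypothetical helpers, and the pattern is inferred from the names in this guide; it is an assumption that your operator version and configuration (for example, sharding) use the same scheme.

```shell
#!/bin/sh
# Hypothetical helpers that rebuild the names used in this guide from the
# Prometheus resource name. The pattern is inferred from the guide's own
# commands and should be confirmed against the cluster before deleting anything.

# $1 = Prometheus resource name
prometheus_statefulset_name() {
  printf 'prometheus-%s\n' "$1"
}

# $1 = Prometheus resource name, $2 = replica ordinal (defaults to 0)
prometheus_pvc_name() {
  printf 'prometheus-%s-db-prometheus-%s-%s\n' "$1" "$1" "${2:-0}"
}

cr_name=my-kube-prometheus-stack-prometheus
prometheus_statefulset_name "$cr_name"
prometheus_pvc_name "$cr_name"
```

Before deleting anything, compare the printed names with the output of kubectl get statefulset,pvc -n monitoring.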