Monitoring Stack Installation

This directory contains the configuration for deploying the Prometheus monitoring stack (kube-prometheus-stack) on Kubernetes.

Prerequisites

  • Access to the Kubernetes cluster
  • Helm installed
  • kubectl configured to access the cluster
  • Traefik ingress controller configured (for IngressRoute)

Installation Steps

1. Create the monitoring namespace

kubectl create namespace monitoring
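A plain `kubectl create namespace` exits with an error if the namespace already exists. A minimal sketch of a repeatable variant, using the standard `--dry-run=client` flag (kubectl 1.18 or newer); the function name `ensure_namespace` is hypothetical:

```shell
# Render the Namespace manifest locally, then apply it.
# "apply" succeeds whether or not the namespace already exists.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}

# Usage (needs cluster access):
#   ensure_namespace monitoring
```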

2. Add the Prometheus Community Helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

3. Install kube-prometheus-stack

helm install my-kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 72.3.0 \
  --namespace monitoring \
  --values values.yaml

4. Wait for the stack to be ready

kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=prometheus-operator \
  -n monitoring \
  --timeout=300s

kubectl wait --for=condition=ready pod \
  -l app=prometheus,prometheus=my-kube-prometheus-stack-prometheus \
  -n monitoring \
  --timeout=300s

5. Apply Prometheus patch

Apply the patch to configure the Alertmanager integration in Prometheus:

kubectl patch prometheus prometheus-stack-kube-prom-prometheus -n monitoring --type='merge' \
  -p='{"spec":{"alerting":{"alertmanagers":[{"apiVersion":"v2","namespace":"monitoring","name":"prometheus-stack-kube-prom-alertmanager","port":"http-web","pathPrefix":"/alertmanager"}]}}}'

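Note: The Prometheus resource name and the Alertmanager service name depend on the Helm release name. The command above uses names of the form prometheus-stack-kube-prom-*, which may not match a release installed as my-kube-prometheus-stack in step 3. Adjust both names to your installation. If using prometheus-patch.yaml, verify that the resource name matches your actual Prometheus instance.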
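To find the names to adjust, the following sketch lists the Prometheus resources and services in the namespace and prints the patch for a given Alertmanager service name. The function names are hypothetical, and the JSON body is the one from the patch command in this step:

```shell
# List Prometheus custom resources and services in the monitoring namespace.
list_monitoring_names() {
  kubectl get prometheus -n monitoring -o name
  kubectl get service -n monitoring -o name
}

# Print the merge patch from step 5 for the Alertmanager service name in $1.
alertmanager_patch() {
  printf '{"spec":{"alerting":{"alertmanagers":[{"apiVersion":"v2","namespace":"monitoring","name":"%s","port":"http-web","pathPrefix":"/alertmanager"}]}}}\n' "$1"
}

# Usage (needs cluster access; both names are placeholders to replace):
#   kubectl patch prometheus <prometheus-name> -n monitoring --type=merge \
#     -p "$(alertmanager_patch <alertmanager-service-name>)"
```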

6. Configure SSL/TLS Certificate

Ensure the traefik-dashboard-cert secret exists in the monitoring namespace (see SSL/TLS Certificate section).

7. Apply IngressRoute configuration

Apply the IngressRoute to expose Prometheus, Grafana, and Alertmanager through Traefik:

kubectl apply -f ingressroute.yaml

Note: ingressroute.yaml references the following service names. They must match the services in your installation:

  • prometheus-stack-kube-prom-prometheus
  • prometheus-stack-grafana
  • prometheus-stack-kube-prom-alertmanager
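Before applying the IngressRoute, it may help to confirm that these services exist. A small sketch, assuming the `monitoring` namespace used throughout; `check_services` is a hypothetical helper name:

```shell
# Report which of the given services exist in the monitoring namespace.
# Returns non-zero if at least one is missing.
check_services() {
  missing=0
  for svc in "$@"; do
    if kubectl get service "$svc" -n monitoring >/dev/null 2>&1; then
      echo "ok: $svc"
    else
      echo "missing: $svc"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (needs cluster access):
#   check_services prometheus-stack-kube-prom-prometheus \
#     prometheus-stack-grafana prometheus-stack-kube-prom-alertmanager
```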

8. Get Grafana admin password

kubectl get secret prometheus-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
echo

9. Verify installation

Check that all pods are running:

kubectl get pods -n monitoring
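On a busy namespace it can be easier to list only the pods that are not running. A sketch using the standard `--field-selector` flag; the function name is hypothetical. Note that pods of completed jobs (phase Succeeded) also show up, and that a Running phase does not by itself mean the pod is Ready:

```shell
# Show pods in the monitoring namespace whose phase is not Running.
pods_not_running() {
  kubectl get pods -n monitoring --field-selector=status.phase!=Running
}

# Usage (needs cluster access):
#   pods_not_running
```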

Access the services:

Configuration

Storage

The stack uses the local-path storage class (the default in k3s):

  • Prometheus: 2Gi storage
  • Alertmanager: 1Gi storage

Resources

Resource limits and requests are configured in values.yaml:

  • Prometheus: 1Gi memory, 1 CPU
  • Alertmanager: 512Mi memory, 500m CPU
  • Grafana: 512Mi memory, 500m CPU

SSL/TLS Certificate

The IngressRoute uses the traefik-dashboard-cert secret for TLS encryption. If the certificate is missing or expired, follow these steps:

1. Check if certificate exists in monitoring namespace

kubectl get secret traefik-dashboard-cert -n monitoring

2. Generate new certificate (if needed)

If the certificate is missing or expired, generate a new one:

cd ../certs
./certificate.sh

This will create the traefik-dashboard-cert secret in the traefik namespace.

3. Copy certificate to monitoring namespace

The IngressRoute needs the certificate in the monitoring namespace:

kubectl get secret traefik-dashboard-cert -n traefik -o yaml | \
  sed 's/namespace: traefik/namespace: monitoring/' | \
  sed '/resourceVersion:/d' | \
  sed '/uid:/d' | \
  kubectl apply -f -

4. Verify certificate validity

Check certificate expiration date:

kubectl get secret traefik-dashboard-cert -n monitoring -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates -subject
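Reading the dates by eye works, but `openssl x509 -checkend` can turn the same check into an exit status, which is handy in scripts. A minimal sketch; `cert_valid_for_days` is a hypothetical helper that reads a PEM certificate on stdin:

```shell
# Exit status 0 if the PEM certificate on stdin is still valid
# $1 days from now, non-zero otherwise.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $1 * 86400 ))"
}

# Usage (needs cluster access):
#   kubectl get secret traefik-dashboard-cert -n monitoring \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_valid_for_days 30 \
#     || echo "certificate expires within 30 days"
```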

5. Add CA to system keychain (macOS)

If you get certificate errors in the browser, add the CA certificate to your system keychain.

First time installation:

sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ../certs/ca.crt

Updating certificate (if already exists):

  1. Find existing certificates:
security find-certificate -a -c "MyKubernetes CA" -Z /Library/Keychains/System.keychain | grep "SHA-1 hash" | awk '{print $3}'
  2. Remove old certificates by SHA-1 hash:
sudo security delete-certificate -Z <SHA-1_HASH> /Library/Keychains/System.keychain

Repeat for each old certificate hash found in step 1.

  3. Add the new certificate:
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ../certs/ca.crt
  4. Verify the certificate was added:
security find-certificate -c "MyKubernetes CA" -Z /Library/Keychains/System.keychain | grep "SHA-1 hash"

The SHA-1 hash should match your new certificate:

openssl x509 -in ../certs/ca.crt -noout -fingerprint -sha1 | cut -d= -f2 | tr -d ':'
  5. Clear browser HSTS cache (Chrome):
  • Open chrome://net-internals/#hsts
  • In "Delete domain security policies", enter: traefik.mykubernetes.com
  • Click "Delete"
  • Or clear all browser cache and restart the browser

Note: After generating a new certificate, you may need to clear your browser cache or use incognito mode to avoid HSTS (HTTP Strict Transport Security) issues.
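The fingerprint comparison in the verification step can be scripted rather than done by eye. The sketch below assumes the two output shapes shown by the commands above (`... Fingerprint=AA:BB:...` from openssl and `SHA-1 hash: AABB...` from security); both function names are hypothetical:

```shell
# Reduce a fingerprint line to bare upper-case hex digits.
# Strips the "...Fingerprint=" or "SHA-1 hash:" prefix, then colons and spaces.
normalize_fingerprint() {
  sed -e 's/^.*Fingerprint=//' -e 's/^SHA-1 hash: *//' | tr -d ': ' | tr 'a-f' 'A-F'
}

# Succeed if two fingerprint lines describe the same hash.
same_fingerprint() {
  [ "$(printf '%s\n' "$1" | normalize_fingerprint)" = \
    "$(printf '%s\n' "$2" | normalize_fingerprint)" ]
}

# Usage (macOS):
#   same_fingerprint \
#     "$(openssl x509 -in ../certs/ca.crt -noout -fingerprint -sha1)" \
#     "$(security find-certificate -c "MyKubernetes CA" -Z /Library/Keychains/System.keychain | grep 'SHA-1 hash')" \
#     && echo "keychain holds the new CA"
```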

Update

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm get values my-kube-prometheus-stack -n monitoring -o yaml > ~/home-lab/monitoring/values.yaml
helm upgrade my-kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring --version 72.3.0 -f ~/home-lab/monitoring/values.yaml
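The `helm get values ... >` redirect overwrites the local values file with whatever is currently deployed, so any uncommitted local edits are lost. One possible safeguard is to keep a timestamped copy first; `backup_file` is a hypothetical helper, not part of Helm:

```shell
# Copy a file to <file>.<timestamp>.bak before it gets overwritten.
# Prints the backup path; does nothing if the file does not exist.
backup_file() {
  [ -f "$1" ] || return 0
  backup="$1.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$1" "$backup" && echo "$backup"
}

# Usage:
#   backup_file ~/home-lab/monitoring/values.yaml
```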

Kube-Prometheus

Cleaning up stale Prometheus metrics

This guide describes how to clean up old/obsolete metrics from Prometheus running on Kubernetes.

Problem

When metrics are renamed or discontinued, they may continue to appear in the Prometheus UI even though they are no longer being collected. This happens because Prometheus maintains a history of these metrics in its storage.

Solution

To remove old metrics completely, recreate Prometheus with clean storage.

Prerequisites

  • Access to the Kubernetes cluster
  • Helm installed
  • Prometheus values file (~/home-lab/monitoring/values.yaml)

Steps

  1. Delete the Prometheus StatefulSet:
kubectl delete statefulset prometheus-my-kube-prometheus-stack-prometheus -n monitoring
  2. Delete the PVC to clean the storage:
kubectl delete pvc prometheus-my-kube-prometheus-stack-prometheus-db-prometheus-my-kube-prometheus-stack-prometheus-0 -n monitoring
  3. Upgrade the Helm chart:
helm upgrade my-kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --version 72.3.0 \
  -f ~/home-lab/monitoring/values.yaml
  4. Verify that Prometheus is running:
kubectl get pods -n monitoring | grep prometheus-0
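Instead of polling the pod list, `kubectl rollout status` can block until the recreated StatefulSet is ready. A sketch using the StatefulSet name from step 1; the function name is hypothetical, and the command reports NotFound if it runs before the operator has recreated the StatefulSet:

```shell
# Block until the Prometheus StatefulSet has rolled out, or 300s pass.
wait_for_prometheus() {
  kubectl rollout status statefulset/prometheus-my-kube-prometheus-stack-prometheus \
    -n monitoring --timeout=300s
}

# Usage (needs cluster access):
#   wait_for_prometheus
```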

Verification

After these steps, access the Prometheus UI and verify that:

  • The old metrics no longer appear
  • Current metrics are being collected correctly
  • Prometheus is functioning normally
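The first check can also be done against the Prometheus HTTP API, whose `/api/v1/series` endpoint returns an empty `data` array when no series match. A minimal sketch: the helper name and the metric name `old_metric_name` are placeholders, the port-forward assumes the operator-managed `prometheus-operated` service on port 9090, and the URL needs a prefix if Prometheus is configured with a route prefix:

```shell
# Succeed if a /api/v1/series JSON response on stdin has an empty "data" array.
series_response_is_empty() {
  tr -d ' \n' | grep -q '"data":\[\]'
}

# Usage (needs cluster access and curl):
#   kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
#   curl -s -G 'http://localhost:9090/api/v1/series' \
#     --data-urlencode 'match[]=old_metric_name' \
#     | series_response_is_empty && echo "no series left for old_metric_name"
```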

Notes

  • This process will erase all metric history
  • Prometheus will start collecting metrics from scratch
  • Wait a few minutes after the process for Prometheus to fully initialize

References