terraform-k8s-monitoring

Terraform modules for deploying a full observability stack on Kubernetes — by Digitalis.io

Metrics (Mimir), logs (Loki), traces (Tempo), collection (OpenTelemetry Collector), dashboards and alerts (Grafana via kube-prometheus-stack). Works on any Kubernetes cluster: EKS, GKE, AKS, or bare metal.

Prerequisites

A running Kubernetes cluster with a valid kubeconfig
kubectl configured and pointing at the target cluster
Terraform >= 1.4, or any OpenTofu release (OpenTofu version numbers start at 1.6)
The Terraform Helm and Kubernetes providers configured (see Quick Start)
Buckets or containers pre-created if using cloud storage backends — this module does not create them

Quick Start

This example deploys the full stack with local disk storage. No cloud credentials required. Data lives on the pod filesystem — suitable for development, evaluation, and blog-post walkthroughs.

providers.tf

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.12"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.27"
    }
  }
}

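# The nested block below is Helm provider 2.x syntax. Provider 3.x expects an
# attribute instead: kubernetes = { config_path = "~/.kube/config" }
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}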

provider "kubernetes" {
  config_path = "~/.kube/config"
}

main.tf

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"
}

module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"
}

module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"
  loki = {
    create_namespace = false
  }
}

module "tempo" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/tempo"
  tempo = {
    create_namespace = false
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"
  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id
    loki_datasource_url    = module.loki.datasource_url
    tempo_datasource_url   = module.tempo.datasource_url
    grafana_ingress = {
      enabled    = true
      host       = "grafana.YOUR_DOMAIN"
      class_name = "nginx"
    }
  }
}

module "otel" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/otel-collector"
  otel = {
    create_namespace = false
    tempo_endpoint   = module.tempo.otlp_grpc_endpoint
    mimir_endpoint   = module.mimir.remote_write_endpoint
    loki_endpoint    = module.loki.datasource_url
  }
}

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"
  prometheus_rules = {
    prometheus_release_id = module.prometheus.helm_release_id
  }
}

module "grafana_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/grafana-rules"
  grafana_rules = {}
}

Deploy:

terraform init
terraform apply

Grafana will be available at https://grafana.YOUR_DOMAIN. The default credentials are admin / prom-operator.
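
The module sources in the Quick Start carry no version pin, so Terraform fetches the repository's default branch on each fresh `terraform init`. A minimal sketch of pinning one module to a Git ref with Terraform's `?ref=` source argument follows. The tag `v1.0.0` is a placeholder and may not exist in the repository.

```hcl
module "mimir" {
  # "v1.0.0" is a hypothetical tag; substitute a tag or commit SHA that exists upstream.
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir?ref=v1.0.0"
}
```

Using the same ref on every module block keeps the whole stack on one revision.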

Module Reference

cert-manager

Installs cert-manager and creates a self-signed ClusterIssuer. Other modules reference this issuer in their ingress TLS annotations.

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"

  cert_manager = {
    chart_version       = "v1.19.1"
    namespace           = "cert-manager"
    create_namespace    = true
    cluster_issuer_name = "selfsigned-cluster-issuer"
  }
}

Variable	Default	Description
`chart_version`	`"v1.19.1"`	cert-manager Helm chart version
`namespace`	`"cert-manager"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`cluster_issuer_name`	`"selfsigned-cluster-issuer"`	Name of the ClusterIssuer to create — must match the `cert-manager.io/cluster-issuer` annotation in other modules
`kubeconfig_path`	`""`	Path to the kubeconfig file used by the `kubectl` local-exec provisioner. When empty, `--kubeconfig` is omitted and kubectl uses its standard resolution order (`KUBECONFIG` env var → `~/.kube/config`). Set explicitly to pin to a specific file (see Troubleshooting)

No notable outputs.
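
A sketch of pinning the `kubectl` provisioner to a specific kubeconfig file. It assumes `kubeconfig_path` sits inside the `cert_manager` object like the other variables in the table, and the file name is hypothetical.

```hcl
module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"

  cert_manager = {
    # Hypothetical file name. pathexpand() resolves "~" in Terraform,
    # so kubectl receives an absolute path.
    kubeconfig_path = pathexpand("~/.kube/staging.yaml")
  }
}
```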

mimir

Installs Grafana Mimir as the metrics storage and query backend. Prometheus writes metrics here via remote_write. Grafana queries here via a Prometheus-compatible datasource.

module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"

  mimir = {
    namespace        = "monitoring"
    retention_period = "30d"
    tenant_id        = "anonymous"
  }
}

Variable	Default	Description
`chart_version`	`"5.6.0"`	Mimir distributed Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`retention_period`	`"30d"`	How long to keep metrics
`tenant_id`	`"anonymous"`	Value sent in `X-Scope-OrgID` header by Prometheus and Grafana
`replicas`	`1`	Number of replicas for each Mimir component
`ingress_enabled`	`false`	Expose Mimir via an Ingress
`ingress_host`	`""`	Hostname for the Mimir ingress (required when `ingress_enabled = true`)
`ingress_class_name`	`"nginx"`	Ingress class
`ingress_tls_secret`	`""`	TLS secret name
`storage.backend`	`"local"`	Storage backend: `local`, `s3`, `gcs`, or `azure`
`storage.s3_blocks_prefix`	`""`	Object key prefix for blocks — allows sharing one S3 bucket across all three Mimir storage types
`storage.s3_ruler_prefix`	`""`	Object key prefix for ruler data
`storage.s3_alertmanager_prefix`	`""`	Object key prefix for Alertmanager data
`storage.s3_credentials_secret`	`null`	Reference a pre-existing Kubernetes Secret for S3 credentials (see S3 credentials secret)
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits

Default resources: 100m CPU / 512Mi memory request, 2 CPU / 4Gi memory limit.

Outputs:

Output	Description
`remote_write_endpoint`	Prometheus `remote_write` URL — wire into the prometheus module
`query_frontend_endpoint`	Grafana datasource URL — wire into the prometheus module
`tenant_id`	The configured tenant ID — wire into the prometheus module
`namespace`	Namespace where Mimir is deployed
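
A sketch of exposing Mimir through an Ingress using the `ingress_*` variables from the table above. The hostname and Secret name are placeholders, and whether the TLS Secret must exist beforehand is not stated here, so treat that as an open assumption.

```hcl
module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"

  mimir = {
    namespace = "monitoring"

    ingress_enabled    = true
    ingress_host       = "mimir.YOUR_DOMAIN" # required when ingress_enabled = true
    ingress_class_name = "nginx"
    ingress_tls_secret = "mimir-tls"         # hypothetical Secret name
  }
}
```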

prometheus

Installs kube-prometheus-stack: Prometheus, Grafana, and Alertmanager. Configures remote_write to Mimir and adds Loki and Tempo as Grafana datasources automatically when their URLs are supplied.

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id
    loki_datasource_url    = module.loki.datasource_url
    tempo_datasource_url   = module.tempo.datasource_url
  }
}

Variable	Default	Description
`chart_version`	`"86.3.2"`	kube-prometheus-stack Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`namespace_labels`	`{}`	Additional labels to apply to the namespace
`namespace_annotations`	`{}`	Additional annotations to apply to the namespace
`grafana_enabled`	`true`	Deploy Grafana
`alertmanager_enabled`	`true`	Deploy Alertmanager
`mimir_remote_write_url`	`""`	Mimir remote_write URL — use `module.mimir.remote_write_endpoint`
`mimir_datasource_url`	`""`	Mimir query URL — use `module.mimir.query_frontend_endpoint`
`mimir_tenant_id`	`"anonymous"`	Tenant ID for `X-Scope-OrgID` header
`loki_datasource_url`	`""`	Loki URL — use `module.loki.datasource_url`
`tempo_datasource_url`	`""`	Tempo URL — use `module.tempo.datasource_url`
`pyroscope_datasource_url`	`""`	Pyroscope URL — use `module.pyroscope.datasource_url`
`clickhouse_datasource`	`null`	ClickHouse datasource config — see ClickHouse integration
`storage_size`	`"20Gi"`	PVC size for Prometheus TSDB
`storage_class`	`""`	StorageClass name (cluster default if empty)
`retention`	`"24h"`	Local TSDB retention (metrics are in Mimir long-term)
`grafana_dashboard_imports`	Node Exporter Full (1860)	Grafana.com dashboard IDs to import
`extra_dashboards`	`{}`	Additional dashboard JSON — `{ "name.json" = file("...") }`
`grafana_plugins`	see below	Grafana plugins to install
`grafana_ingress`	disabled	Grafana ingress config (see Enable ingress)
`prometheus_ingress`	disabled	Prometheus ingress config
`alertmanager_ingress`	disabled	Alertmanager ingress config
`resources`	see below	CPU/memory requests and limits

Default resources: 200m CPU / 512Mi memory request, 2 CPU / 2Gi memory limit.

Default Grafana plugins: digrich-bubblechart-panel, grafana-clock-panel, btplc-status-dot-panel, grafana-piechart-panel, grafana-llm-app, grafana-clickhouse-datasource.

Outputs:

Output	Description
`grafana_service`	In-cluster Grafana URL
`helm_release_id`	Helm release ID — required by prometheus-rules module
`namespace`	Namespace where kube-prometheus-stack is deployed
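
A sketch of tuning local Prometheus storage with the variables listed above. The StorageClass name `gp3` is hypothetical, and the sizes are illustrative rather than recommendations.

```hcl
module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    storage_size         = "50Gi"
    storage_class        = "gp3" # hypothetical StorageClass; empty string uses the cluster default
    retention            = "48h" # local TSDB only; long-term data stays in Mimir
    alertmanager_enabled = false # skip Alertmanager if alerts are routed elsewhere
  }
}
```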

loki

Installs Grafana Loki for log aggregation. Supports single-binary (default) and scalable (SimpleScalable) deployment modes.

module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"

  loki = {
    namespace        = "monitoring"
    create_namespace = false
    deployment_mode  = "single-binary"
    retention_period = "744h"
  }
}

Variable	Default	Description
`chart_version`	`"6.6.0"`	Loki Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`deployment_mode`	`"single-binary"`	`single-binary` or `scalable`
`replicas`	`1`	Replica count (single-binary mode)
`retention_period`	`"744h"`	Log retention period (31 days)
`storage.backend`	`"local"`	Storage backend: `local`, `s3`, `gcs`, or `azure`
`storage.s3_credentials_secret`	`null`	Reference a pre-existing Kubernetes Secret for S3 credentials (see S3 credentials secret)
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits

Default resources: 100m CPU / 256Mi memory request, 2 CPU / 2Gi memory limit.

Outputs:

Output	Description
`datasource_url`	Loki URL for Grafana datasource and OTel Collector — `http://loki.monitoring.svc.cluster.local:3100`
`namespace`	Namespace where Loki is deployed
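
A sketch of switching Loki to `scalable` mode. Upstream Loki's scalable deployment generally expects object storage rather than local disk, so this example pairs it with the S3 variables used elsewhere in this README. Bucket names and the IAM role ARN are placeholders, and whether this module enforces that pairing is an assumption.

```hcl
module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"

  loki = {
    namespace        = "monitoring"
    create_namespace = false
    deployment_mode  = "scalable"

    storage = {
      backend          = "s3"
      s3_chunks_bucket = "YOUR_BUCKET_NAME-loki-chunks"
      s3_ruler_bucket  = "YOUR_BUCKET_NAME-loki-ruler"
      s3_region        = "eu-west-1"
    }

    service_account_annotations = {
      # Placeholder role ARN for IRSA
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::123456789012:role/loki"
    }
  }
}
```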

tempo

Installs Grafana Tempo for distributed tracing. Supports monolithic (default) and distributed deployment modes.

module "tempo" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/tempo"

  tempo = {
    namespace        = "monitoring"
    create_namespace = false
    deployment_mode  = "monolithic"
    retention        = "720h"
  }
}

Variable	Default	Description
`chart_version`	`"1.40.0"`	Tempo Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`namespace_labels`	`{}`	Additional labels to apply to the namespace
`namespace_annotations`	`{}`	Additional annotations to apply to the namespace
`deployment_mode`	`"monolithic"`	`monolithic` or `distributed`
`replicas`	`1`	Replica count (monolithic mode)
`retention`	`"720h"`	Trace retention period (30 days)
`metrics_generator_remote_write_url`	`""`	Mimir (or Prometheus) remote_write URL to enable metrics-generator for TraceQL `rate()` and span metrics
`storage.backend`	`"local"`	Storage backend: `local`, `s3`, `gcs`, or `azure`
`storage.s3_credentials_secret`	`null`	Reference a pre-existing Kubernetes Secret for S3 credentials (see S3 credentials secret)
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits

Default resources: 100m CPU / 256Mi memory request, 2 CPU / 2Gi memory limit.

Outputs:

Output	Description
`datasource_url`	Tempo URL for Grafana datasource
`otlp_grpc_endpoint`	OTLP gRPC endpoint for app instrumentation (port 4317)
`otlp_http_endpoint`	OTLP HTTP endpoint for app instrumentation (port 4318)
`namespace`	Namespace where Tempo is deployed

otel-collector

Installs the OpenTelemetry Collector (contrib image). Receives OTLP traces, metrics, and logs from your applications and forwards them to Tempo, Mimir, and Loki respectively. Optionally enables the OpenTelemetry Operator for workload instrumentation. Runs as a DaemonSet by default.

module "otel" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/otel-collector"

  otel = {
    namespace        = "monitoring"
    create_namespace = false
    mode             = "daemonset"
    tempo_endpoint   = module.tempo.otlp_grpc_endpoint
    mimir_endpoint   = module.mimir.remote_write_endpoint
    mimir_tenant_id  = module.mimir.tenant_id
    loki_endpoint    = module.loki.datasource_url
  }
}

Variable	Default	Description
`chart_version`	`"0.158.2"`	OpenTelemetry Collector Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`namespace_labels`	`{}`	Additional labels to apply to the namespace
`namespace_annotations`	`{}`	Additional annotations to apply to the namespace
`mode`	`"daemonset"`	`daemonset` or `deployment`
`tempo_endpoint`	`""`	OTLP gRPC endpoint for Tempo — use `module.tempo.otlp_grpc_endpoint`
`mimir_endpoint`	`""`	Remote write URL for Mimir — use `module.mimir.remote_write_endpoint`
`mimir_tenant_id`	`"anonymous"`	Tenant ID for `X-Scope-OrgID` header sent to Mimir — use `module.mimir.tenant_id`
`loki_endpoint`	`""`	Loki push URL — use `module.loki.datasource_url`
`clickhouse_endpoint`	`""`	ClickHouse HTTP endpoint (`:8123`) for logs and traces
`clickhouse_username`	`""`	ClickHouse username
`clickhouse_password`	`""`	ClickHouse password
`clickhouse_database`	`"otel"`	ClickHouse database name for OTLP/ClickHouse exporter
`clickhouse_create_schema`	`true`	Auto-create database and tables on startup. Disable on memory-constrained ClickHouse instances and pre-create the schema manually
`image.repository`	`"otel/opentelemetry-collector-contrib"`	Collector image (contrib required for Loki and Mimir exporters)
`image.tag`	`""`	Image tag (empty = chart appVersion)
`image.pull_policy`	`"IfNotPresent"`	Image pull policy
`operator.enabled`	`false`	Deploy the OpenTelemetry Operator for auto-instrumentation
`operator.chart_version`	`"0.116.0"`	Operator Helm chart version
`operator.collector_image_repository`	`"otel/opentelemetry-collector-k8s"`	Operator's default collector image repository
`operator.cert_manager_enabled`	`false`	Use cert-manager for webhook certificates
`operator.auto_generate_cert_enabled`	`true`	Auto-generate webhook certificates (incompatible with cert-manager)
`operator.extra_args`	`[]`	Additional arguments to pass to the operator
`operator.go_instrumentation_enabled`	`false`	Enable Go auto-instrumentation via eBPF (requires Linux kernel >=4.19)
`operator.go_instrumentation_image`	`""`	Go instrumentation image (defaults to chart appVersion when empty)
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits

Default resources: 300m CPU / 256Mi memory request, 500m CPU / 512Mi memory limit.

Outputs:

Output	Description
`otlp_grpc_endpoint`	OTLP gRPC endpoint your apps send telemetry (traces, metrics, logs) to (port 4317)
`otlp_http_endpoint`	OTLP HTTP endpoint your apps send telemetry (traces, metrics, logs) to (port 4318)
`namespace`	Namespace where the collector is deployed
`helm_release_name`	Helm release name
`helm_release_version`	Deployed chart version
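
A sketch of enabling the OpenTelemetry Operator with cert-manager issuing the webhook certificates. Because the table marks `auto_generate_cert_enabled` as incompatible with cert-manager, the example turns it off explicitly. It assumes the `operator.*` variables nest as an `operator` object and that cert-manager is already installed in the cluster.

```hcl
module "otel" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/otel-collector"

  otel = {
    namespace        = "monitoring"
    create_namespace = false
    tempo_endpoint   = module.tempo.otlp_grpc_endpoint
    mimir_endpoint   = module.mimir.remote_write_endpoint
    loki_endpoint    = module.loki.datasource_url

    operator = {
      enabled                    = true
      cert_manager_enabled       = true
      auto_generate_cert_enabled = false # mutually exclusive with cert-manager
    }
  }
}
```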

alloy

Installs Grafana Alloy — the OpenTelemetry-native successor to Grafana Agent. Receives OTLP traces, metrics, logs, and profiles from instrumented applications using a River/Alloy pipeline config, and forwards each signal to the configured backend. Runs as a DaemonSet by default (one pod per node), but supports Deployment and StatefulSet controller types.

module "alloy" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/alloy"

  alloy = {
    namespace        = "monitoring"
    create_namespace = false
    controller_type  = "daemonset"
    tempo_endpoint   = module.tempo.otlp_grpc_endpoint
    mimir_endpoint   = module.mimir.remote_write_endpoint
    mimir_tenant_id  = module.mimir.tenant_id
    loki_endpoint    = module.loki.datasource_url
  }
}

Variable	Default	Description
`chart_version`	`"0.12.5"`	Alloy Helm chart version — check ArtifactHub for the latest
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`namespace_labels`	`{}`	Additional labels to apply to the namespace
`namespace_annotations`	`{}`	Additional annotations to apply to the namespace
`controller_type`	`"daemonset"`	Kubernetes workload kind: `daemonset`, `deployment`, or `statefulset`
`replicas`	`1`	Replica count (ignored when `controller_type = "daemonset"`)
`alloy_config`	`""`	Full River/Alloy pipeline config. When empty, a built-in default config is rendered using the non-empty sibling endpoints below
`loki_endpoint`	`""`	Loki push URL — use `module.loki.datasource_url`
`tempo_endpoint`	`""`	Tempo OTLP gRPC endpoint — use `module.tempo.otlp_grpc_endpoint`
`mimir_endpoint`	`""`	Mimir remote write URL — use `module.mimir.remote_write_endpoint`
`mimir_tenant_id`	`"anonymous"`	Value sent in `X-Scope-OrgID` header to Mimir — use `module.mimir.tenant_id`
`pyroscope_endpoint`	`""`	Pyroscope push URL — use `module.pyroscope.push_url`
`otel_grpc_endpoint`	`""`	Upstream OTel Collector endpoint for chaining — use `module.otel.otlp_grpc_endpoint`
`persistence.enabled`	`false`	Mount a PVC for WAL state (only meaningful with `controller_type = "statefulset"`)
`persistence.size`	`"10Gi"`	PVC size
`persistence.storage_class`	`""`	StorageClass name (cluster default if empty)
`ingress.enabled`	`false`	Expose Alloy via an Ingress
`ingress.host`	`""`	Ingress hostname (required when `ingress.enabled = true`)
`ingress.class_name`	`"nginx"`	Ingress class
`ingress.tls_secret`	`""`	TLS secret name
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits
`extra_values`	`""`	Extra Helm values merged last (highest precedence)

Default resources: 100m CPU / 128Mi memory request, 500m CPU / 512Mi memory limit.

Outputs:

Output	Description
`otlp_grpc_endpoint`	OTLP gRPC endpoint for app instrumentation — `http://alloy.<namespace>.svc.cluster.local:4317`
`otlp_http_endpoint`	OTLP HTTP endpoint for app instrumentation — `http://alloy.<namespace>.svc.cluster.local:4318`
`namespace`	Namespace where Alloy is deployed
`helm_release_name`	Helm release name
`helm_release_version`	Deployed chart version
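
A sketch of running Alloy as a StatefulSet with a persistent volume for WAL state, using the `controller_type`, `replicas`, and `persistence.*` variables above. It assumes `persistence` nests as an object, and the replica count and size are illustrative.

```hcl
module "alloy" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/alloy"

  alloy = {
    namespace        = "monitoring"
    create_namespace = false
    controller_type  = "statefulset"
    replicas         = 2 # honoured because the controller is not a DaemonSet

    mimir_endpoint  = module.mimir.remote_write_endpoint
    mimir_tenant_id = module.mimir.tenant_id
    loki_endpoint   = module.loki.datasource_url

    persistence = {
      enabled       = true
      size          = "10Gi"
      storage_class = "" # empty string uses the cluster default
    }
  }
}
```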

pyroscope

Installs Grafana Pyroscope for continuous profiling. Collects CPU, memory, goroutine, and heap profiles from Go, Java, Python, Ruby, and other supported runtimes. Profiles are stored in Pyroscope and queried through a dedicated Grafana datasource.

module "pyroscope" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/pyroscope"

  pyroscope = {
    namespace        = "monitoring"
    create_namespace = false
  }
}

Variable	Default	Description
`chart_version`	`"1.20.3"`	Pyroscope Helm chart version
`namespace`	`"monitoring"`	Namespace to deploy into
`create_namespace`	`true`	Create the namespace if it does not exist
`namespace_labels`	`{}`	Additional labels to apply to the namespace
`namespace_annotations`	`{}`	Additional annotations to apply to the namespace
`replicas`	`1`	Number of Pyroscope replicas
`storage.backend`	`"local"`	Storage backend: `local`, `s3`, `gcs`, or `azure`
`storage.s3_bucket`	`""`	S3 bucket name
`storage.s3_region`	`""`	S3 region
`storage.s3_endpoint`	`""`	S3-compatible endpoint hostname (scheme stripped automatically)
`storage.s3_insecure`	`false`	Use plain HTTP for the S3 endpoint
`storage.s3_access_key`	`""`	S3 access key (leave empty for IRSA)
`storage.s3_secret_key`	`""`	S3 secret key (leave empty for IRSA)
`storage.s3_credentials_secret`	`null`	Reference a pre-existing Kubernetes Secret for S3 credentials (see S3 credentials secret)
`storage.gcs_bucket`	`""`	GCS bucket name
`storage.gcs_service_account_key`	`""`	GCS service account JSON key (leave empty for Workload Identity)
`storage.azure_storage_account`	`""`	Azure storage account name
`storage.azure_container`	`""`	Azure blob container name
`storage.azure_storage_account_key`	`""`	Azure storage account key
`service_account_annotations`	`{}`	Annotations for IRSA / Workload Identity
`resources`	see below	CPU/memory requests and limits

Default resources: 100m CPU / 256Mi memory request, 1 CPU / 1Gi memory limit.

S3 path-style not supported. Pyroscope's S3 client does not support bucket_lookup_type (path-style access). When using an S3-compatible service such as Hetzner Object Storage, Exoscale, or Cloudflare R2, use a bucket-specific endpoint instead of a shared endpoint with s3_path_style = true:

storage = {
  backend       = "s3"
  s3_bucket     = "mybucket"
  s3_region     = "ch-gva-2"
  s3_endpoint   = "mybucket.sos-ch-gva-2.exo.io"  # bucket-specific endpoint
  s3_access_key = "YOUR_ACCESS_KEY"
  s3_secret_key = "YOUR_SECRET_KEY"
}

Outputs:

Output	Description
`datasource_url`	Pyroscope URL for Grafana datasource — wire into the prometheus module as `pyroscope_datasource_url`
`push_url`	Pyroscope push URL for profiling agents — `http://pyroscope.<namespace>.svc.cluster.local:4040`
`namespace`	Namespace where Pyroscope is deployed
`helm_release_name`	Helm release name
`helm_release_version`	Deployed chart version

prometheus-rules

Applies Prometheus alert rules and configures Alertmanager receivers. Must be applied after the prometheus module — pass module.prometheus.helm_release_id to enforce ordering.

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"

  prometheus_rules = {
    namespace             = "monitoring"
    prometheus_release_id = module.prometheus.helm_release_id
  }
}

Variable	Default	Description
`namespace`	`"monitoring"`	Must match the kube-prometheus-stack namespace
`prometheus_release_id`	required	Output from `module.prometheus.helm_release_id`
`kubeconfig_path`	`""`	Path to the kubeconfig file used by `kubectl` local-exec. When empty, `--kubeconfig` is omitted and kubectl uses its standard resolution order (`KUBECONFIG` env var → `~/.kube/config`). Set explicitly to pin to a specific file (see Troubleshooting)
`extra_rules`	`{}`	Additional rule YAML files — `{ "my-app.yaml" = file("...") }`
`slack.enabled`	`false`	Send alerts to Slack
`slack.webhook_url`	`""`	Slack incoming webhook URL (required when enabled)
`slack.channel`	`"#alerts"`	Slack channel
`slack.min_severity`	`"warning"`	Minimum severity to forward: `info`, `warning`, or `critical`
`pagerduty.enabled`	`false`	Send alerts to PagerDuty
`pagerduty.routing_key`	`""`	PagerDuty routing key (required when enabled)
`pagerduty.min_severity`	`"critical"`	Minimum severity to page

No notable outputs.
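
A sketch of routing critical Prometheus alerts to PagerDuty with the variables above. Note that this module names the field `routing_key`, while the grafana-rules module uses `integration_key`. The key value is a placeholder.

```hcl
module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"

  prometheus_rules = {
    namespace             = "monitoring"
    prometheus_release_id = module.prometheus.helm_release_id

    pagerduty = {
      enabled      = true
      routing_key  = "YOUR_PAGERDUTY_ROUTING_KEY"
      min_severity = "critical"
    }
  }
}
```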

grafana-rules

Applies Grafana-managed alert rules and configures Grafana contact points (Slack, PagerDuty, webhook, email).

module "grafana_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/grafana-rules"

  grafana_rules = {
    namespace = "monitoring"
  }
}

Variable	Default	Description
`namespace`	`"monitoring"`	Must match the kube-prometheus-stack namespace
`extra_rules`	`{}`	Additional rule YAML files — `{ "my-app.yaml" = file("...") }`
`slack.enabled`	`false`	Send alerts to Slack
`slack.webhook_url`	`""`	Slack incoming webhook URL (required when enabled)
`slack.channel`	`"#alerts"`	Slack channel
`slack.min_severity`	`"warning"`	Minimum severity: `info`, `warning`, or `critical`
`pagerduty.enabled`	`false`	Send alerts to PagerDuty
`pagerduty.integration_key`	`""`	PagerDuty integration key (required when enabled)
`pagerduty.min_severity`	`"critical"`	Minimum severity to page
`webhook.enabled`	`false`	Send alerts to a generic webhook
`webhook.url`	`""`	Webhook URL (required when enabled)
`webhook.http_method`	`"POST"`	HTTP method
`webhook.min_severity`	`"warning"`	Minimum severity
`email.enabled`	`false`	Send alerts by email
`email.addresses`	`[]`	List of recipient email addresses (required when enabled)
`email.min_severity`	`"critical"`	Minimum severity

No notable outputs.
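
A sketch combining the webhook and email contact points from the table above. The URL and address are placeholders. Grafana email contact points normally also need SMTP configured on the Grafana side, which this module reference does not cover, so treat delivery as unverified.

```hcl
module "grafana_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/grafana-rules"

  grafana_rules = {
    namespace = "monitoring"

    webhook = {
      enabled      = true
      url          = "https://alerts.example.com/hook" # placeholder URL
      http_method  = "POST"
      min_severity = "warning"
    }

    email = {
      enabled      = true
      addresses    = ["oncall@example.com"] # placeholder recipient
      min_severity = "critical"
    }
  }
}
```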

Common Recipes

Complete, copy-paste examples are available in the examples/ directory:

Example	Description
`examples/minimal/`	Full stack with local disk storage — no cloud credentials needed
`examples/alloy-basic/`	Alloy DaemonSet collector wired to Loki, Tempo, and Mimir
`examples/aws/`	S3 backend with IRSA authentication on EKS
`examples/gcp/`	GCS backend with Workload Identity on GKE

Use S3 for Mimir storage (with IRSA)

Pre-create three S3 buckets before running terraform apply. IRSA handles authentication — no access keys needed.

module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"

  mimir = {
    namespace        = "monitoring"
    retention_period = "90d"

    storage = {
      backend                = "s3"
      s3_blocks_bucket       = "YOUR_BUCKET_NAME-mimir-blocks"
      s3_ruler_bucket        = "YOUR_BUCKET_NAME-mimir-ruler"
      s3_alertmanager_bucket = "YOUR_BUCKET_NAME-mimir-alertmanager"
      s3_region              = "eu-west-1"
      # s3_access_key and s3_secret_key left empty — IRSA is used instead
    }

    service_account_annotations = {
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::123456789012:role/mimir"
    }
  }
}
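
The `storage.s3_*_prefix` variables allow a single bucket to hold all three Mimir storage types. A sketch of that layout follows. It assumes the same bucket name is repeated in each `*_bucket` field, which the variable table does not spell out, and the prefix values are illustrative.

```hcl
module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"

  mimir = {
    namespace = "monitoring"

    storage = {
      backend = "s3"
      # Assumption: one bucket name repeated for all three storage types
      s3_blocks_bucket       = "YOUR_BUCKET_NAME-mimir"
      s3_ruler_bucket        = "YOUR_BUCKET_NAME-mimir"
      s3_alertmanager_bucket = "YOUR_BUCKET_NAME-mimir"
      s3_region              = "eu-west-1"

      # Distinct key prefixes keep the three data sets apart
      s3_blocks_prefix       = "blocks"
      s3_ruler_prefix        = "ruler"
      s3_alertmanager_prefix = "alertmanager"
    }

    service_account_annotations = {
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::123456789012:role/mimir"
    }
  }
}
```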

Use S3-compatible storage (Hetzner, MinIO, Ceph)

Any S3-compatible service works. Set s3_endpoint to the service hostname or URL, s3_path_style = true (required by Hetzner and most non-AWS services), and provide access credentials.

The module strips https:// and http:// from s3_endpoint automatically, so both "https://fsn1.your-objectstorage.com" and "fsn1.your-objectstorage.com" are accepted.

module "mimir" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/mimir"

  mimir = {
    namespace        = "monitoring"
    retention_period = "30d"

    storage = {
      backend                = "s3"
      s3_blocks_bucket       = "mimir-blocks"
      s3_ruler_bucket        = "mimir-ruler"
      s3_alertmanager_bucket = "mimir-alertmanager"
      s3_region              = "eu-central"          # Hetzner region, or "us-east-1" for MinIO
      s3_endpoint            = "fsn1.your-objectstorage.com"  # Hetzner example — scheme optional
      s3_path_style          = true                  # required for Hetzner, MinIO, Ceph
      s3_insecure            = false                 # set true only for plain HTTP endpoints
      s3_access_key          = "YOUR_ACCESS_KEY"
      s3_secret_key          = "YOUR_SECRET_KEY"
    }
  }
}

The same s3_endpoint, s3_path_style, and s3_insecure variables are available on modules/loki and modules/tempo:

module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"

  loki = {
    storage = {
      backend          = "s3"
      s3_chunks_bucket = "loki-chunks"
      s3_ruler_bucket  = "loki-ruler"
      s3_region        = "eu-central"
      s3_endpoint      = "fsn1.your-objectstorage.com"  # scheme optional
      s3_path_style    = true
      s3_access_key    = "YOUR_ACCESS_KEY"
      s3_secret_key    = "YOUR_SECRET_KEY"
    }
  }
}
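
The examples above inline the access keys for brevity. A sketch of supplying them through sensitive input variables instead, so they stay out of version control and plan output. The variable names are hypothetical, and the values still end up in Terraform state.

```hcl
# Hypothetical root-module variables; set them via TF_VAR_s3_access_key
# and TF_VAR_s3_secret_key or a git-ignored *.tfvars file.
variable "s3_access_key" {
  type      = string
  sensitive = true
}

variable "s3_secret_key" {
  type      = string
  sensitive = true
}

module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"

  loki = {
    storage = {
      backend          = "s3"
      s3_chunks_bucket = "loki-chunks"
      s3_ruler_bucket  = "loki-ruler"
      s3_region        = "eu-central"
      s3_endpoint      = "fsn1.your-objectstorage.com"
      s3_path_style    = true
      s3_access_key    = var.s3_access_key
      s3_secret_key    = var.s3_secret_key
    }
  }
}
```

The `storage.s3_credentials_secret` variable listed in the module reference is the alternative when the credentials already live in a Kubernetes Secret.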

Use GCS for Loki storage (with Workload Identity)

Pre-create two GCS buckets before running terraform apply.

module "loki" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/loki"

  loki = {
    namespace        = "monitoring"
    create_namespace = false
    retention_period = "744h"

    storage = {
      backend           = "gcs"
      gcs_chunks_bucket = "YOUR_PROJECT-loki-chunks"
      gcs_ruler_bucket  = "YOUR_PROJECT-loki-ruler"
      # gcs_service_account_key left empty — Workload Identity is used instead
    }

    service_account_annotations = {
      "iam.gke.io/gcp-service-account" = "loki@YOUR_GCP_PROJECT.iam.gserviceaccount.com"
    }
  }
}

Add a custom Grafana dashboard from a JSON file

Place your dashboard JSON anywhere in the repo, then pass it via extra_dashboards. The key is the filename that appears in Grafana.

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    extra_dashboards = {
      "my-app.json"      = file("${path.module}/dashboards/my-app.json")
      "another-app.json" = file("${path.module}/dashboards/another-app.json")
    }
  }
}
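
When the dashboards directory grows, listing each file by hand becomes tedious. A sketch that builds the same map with Terraform's built-in `fileset()` and a `for` expression follows. The `dashboards/` directory and the `dashboard_dir` local are assumptions for illustration.

```hcl
locals {
  # Hypothetical directory holding dashboard JSON files
  dashboard_dir = "${path.module}/dashboards"
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    # Produces { "my-app.json" = "<file contents>", ... } for every *.json file
    extra_dashboards = {
      for name in fileset(local.dashboard_dir, "*.json") :
      name => file("${local.dashboard_dir}/${name}")
    }
  }
}
```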

Add a custom Grafana dashboard by grafana.com ID

Find the dashboard on grafana.com/grafana/dashboards and note its ID and revision number.

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    grafana_dashboard_imports = [
      # Node Exporter Full — already included by default, shown here as example
      { gnet_id = 1860, revision = 37, datasource = "Mimir" },
      # Kubernetes / Compute Resources / Cluster
      { gnet_id = 15520, revision = 9, datasource = "Mimir" },
      # Loki dashboard
      { gnet_id = 13639, revision = 2, datasource = "Loki" },
    ]
  }
}

Add custom Prometheus alert rules from a YAML file

Write a standard PrometheusRule-compatible YAML file and pass it via extra_rules.

rules/my-app.yaml:

groups:
  - name: my-app
    rules:
      - alert: MyAppHighErrorRate
        expr: sum(rate(http_requests_total{job="my-app",status=~"5.."}[5m])) / sum(rate(http_requests_total{job="my-app"}[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on my-app"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes."

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"

  prometheus_rules = {
    namespace             = "monitoring"
    prometheus_release_id = module.prometheus.helm_release_id

    extra_rules = {
      "my-app.yaml" = file("${path.module}/rules/my-app.yaml")
    }
  }
}

Enable Slack alerts (prometheus-rules)

With min_severity = "warning", both warning and critical alerts are forwarded to Slack, and info alerts are not. Raise min_severity to critical to forward only critical alerts.

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"

  prometheus_rules = {
    namespace             = "monitoring"
    prometheus_release_id = module.prometheus.helm_release_id

    slack = {
      enabled      = true
      webhook_url  = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
      channel      = "#platform-alerts"
      min_severity = "warning"
    }
  }
}

Enable PagerDuty alerts (grafana-rules)

Only critical alerts page by default. Lower min_severity to warning to page on warning alerts as well.

module "grafana_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/grafana-rules"

  grafana_rules = {
    namespace = "monitoring"

    pagerduty = {
      enabled         = true
      integration_key = "YOUR_PAGERDUTY_INTEGRATION_KEY"
      min_severity    = "critical"
    }
  }
}

Enable continuous profiling (Pyroscope)

Deploy Pyroscope and wire it into Grafana as a datasource. The pyroscope_datasource_url variable adds a grafana-pyroscope-datasource datasource with uid pyroscope to Grafana automatically.

module "pyroscope" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/pyroscope"

  pyroscope = {
    namespace        = "monitoring"
    create_namespace = false
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace         = false
    mimir_remote_write_url   = module.mimir.remote_write_endpoint
    mimir_datasource_url     = module.mimir.query_frontend_endpoint
    mimir_tenant_id          = module.mimir.tenant_id
    loki_datasource_url      = module.loki.datasource_url
    tempo_datasource_url     = module.tempo.datasource_url
    pyroscope_datasource_url = module.pyroscope.datasource_url
  }
}

Once deployed, push profiles from your applications to http://pyroscope.monitoring.svc.cluster.local:4040. Pyroscope uses port 4040 for both push ingestion and query.

Enable Tempo metrics generator with Mimir backend

Tempo's metrics-generator derives RED metrics (Rate, Errors, Duration) and custom span metrics from traces, then writes them to Mimir for long-term storage. This enables TraceQL rate() queries and correlation between traces and metrics.

module "tempo" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/tempo"

  tempo = {
    namespace        = "monitoring"
    create_namespace = false
    # Enable metrics generation — write to the same Mimir endpoint as Prometheus
    metrics_generator_remote_write_url = module.mimir.remote_write_endpoint
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace         = false
    mimir_remote_write_url   = module.mimir.remote_write_endpoint
    mimir_datasource_url     = module.mimir.query_frontend_endpoint
    mimir_tenant_id          = module.mimir.tenant_id
    tempo_datasource_url     = module.tempo.datasource_url
  }
}

ClickHouse integration for logs and traces

Use ClickHouse as an alternative backend for OTLP logs and traces. The OTel Collector exports directly to ClickHouse, and Grafana queries via the ClickHouse datasource plugin.

Deploy ClickHouse first (or use a managed instance), then wire the OTel Collector and Grafana datasource:

module "otel" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/otel-collector"

  otel = {
    namespace           = "monitoring"
    create_namespace    = false
    tempo_endpoint      = module.tempo.otlp_grpc_endpoint
    mimir_endpoint      = module.mimir.remote_write_endpoint
    loki_endpoint       = module.loki.datasource_url
    # ClickHouse exporter configuration
    clickhouse_endpoint      = "clickhouse.observability.svc.cluster.local:8123"
    clickhouse_username      = "default"
    clickhouse_password      = "your-password"
    clickhouse_database      = "otel"
    clickhouse_create_schema = true # auto-create tables on startup
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    # ClickHouse datasource for querying OTel logs/traces
    clickhouse_datasource = {
      host     = "clickhouse.observability.svc.cluster.local"
      port     = 9000
      database = "otel"
      username = "default"
      password = "your-password"
      secure   = false
      # OTel schema — matches tables created by the otel-collector ClickHouse exporter
      logs_otel_enabled    = true
      logs_default_table   = "otel_logs"
      traces_otel_enabled  = true
      traces_default_table = "otel_traces"
    }
  }
}

Enable OpenTelemetry Operator for auto-instrumentation

The OpenTelemetry Operator enables zero-code instrumentation of workloads via annotations. Annotated workloads are automatically patched with the OpenTelemetry Java agent, Go eBPF instrumentation, or Python auto-instrumentation.

module "otel" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/otel-collector"

  otel = {
    namespace        = "monitoring"
    create_namespace = false

    operator = {
      enabled              = true
      chart_version        = "0.116.0"
      cert_manager_enabled = false # auto-generate webhook certs by default
      # Enable Go eBPF instrumentation (requires Linux kernel >=4.19)
      go_instrumentation_enabled = true
    }
  }
}

After deployment, annotate your workload to enable instrumentation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"  # or inject-python, inject-go
    spec:
      containers:
      - name: app
        image: my-app:latest
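
Go auto-instrumentation needs one more annotation than Java or Python, because the operator has to be told which executable inside the container to attach to. The sketch below uses the Kubernetes provider's kubernetes_annotations resource to patch an existing Deployment from Terraform. The Deployment name, namespace, and binary path are placeholders, and the sketch assumes an Instrumentation resource is already present for the workload's namespace.

```hcl
resource "kubernetes_annotations" "my_app_go_instrumentation" {
  api_version = "apps/v1"
  kind        = "Deployment"

  metadata {
    name      = "my-app"
    namespace = "default" # placeholder namespace
  }

  # template_annotations targets the pod template (spec.template.metadata.annotations),
  # which is where the operator's admission webhook looks.
  template_annotations = {
    "instrumentation.opentelemetry.io/inject-go"               = "true"
    "instrumentation.opentelemetry.io/otel-go-auto-target-exe" = "/app/my-app" # placeholder path to the Go binary
  }
}
```

Changing pod template annotations triggers a rollout of the Deployment, which is what causes the new pods to be instrumented.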

Enable ingress for Grafana with TLS via cert-manager

This requires the cert-manager module to be deployed first. The cluster_issuer_name in cert-manager must match the cert-manager.io/cluster-issuer annotation below.

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"

  cert_manager = {
    cluster_issuer_name = "selfsigned-cluster-issuer"
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    grafana_ingress = {
      enabled    = true
      host       = "grafana.YOUR_DOMAIN"
      class_name = "nginx"
      tls_secret = "grafana-tls"
      annotations = {
        "cert-manager.io/cluster-issuer" = "selfsigned-cluster-issuer"
      }
    }

    prometheus_ingress = {
      enabled    = true
      host       = "prometheus.YOUR_DOMAIN"
      class_name = "nginx"
      tls_secret = "prometheus-tls"
      annotations = {
        "cert-manager.io/cluster-issuer" = "selfsigned-cluster-issuer"
      }
    }

    alertmanager_ingress = {
      enabled    = true
      host       = "alertmanager.YOUR_DOMAIN"
      class_name = "nginx"
      tls_secret = "alertmanager-tls"
      annotations = {
        "cert-manager.io/cluster-issuer" = "selfsigned-cluster-issuer"
      }
    }
  }
}
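
Because the issuer name has to match in several places, one way to avoid drift is to define it once and reference it everywhere. This is a sketch under that assumption. The local name cluster_issuer_name is illustrative, and only the Grafana ingress is shown; the other ingress blocks would reference the same local.

```hcl
locals {
  # Single source of truth for the ClusterIssuer name
  cluster_issuer_name = "selfsigned-cluster-issuer"
}

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"

  cert_manager = {
    cluster_issuer_name = local.cluster_issuer_name
  }
}

module "prometheus" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus"

  prometheus = {
    create_namespace       = false
    mimir_remote_write_url = module.mimir.remote_write_endpoint
    mimir_datasource_url   = module.mimir.query_frontend_endpoint
    mimir_tenant_id        = module.mimir.tenant_id

    grafana_ingress = {
      enabled    = true
      host       = "grafana.YOUR_DOMAIN"
      class_name = "nginx"
      tls_secret = "grafana-tls"
      annotations = {
        # Always matches the issuer created by the cert-manager module
        "cert-manager.io/cluster-issuer" = local.cluster_issuer_name
      }
    }
  }
}
```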

Storage Backends

All modules default to local disk storage. For production, use an object storage backend. Buckets and containers must be created before running terraform apply — these modules do not create them.

Module	local	S3	GCS	Azure
mimir	yes	yes	yes	yes
loki	yes	yes	yes	yes
tempo	yes	yes	yes	yes
pyroscope	yes	yes	yes	yes
otel-collector	n/a	n/a	n/a	n/a
prometheus	n/a	n/a	n/a	n/a
cert-manager	n/a	n/a	n/a	n/a

S3 bucket requirements per module:

Module	Required buckets
mimir	`s3_blocks_bucket`, `s3_ruler_bucket`, `s3_alertmanager_bucket`
loki	`s3_chunks_bucket`, `s3_ruler_bucket`
tempo	`s3_bucket`
pyroscope	`s3_bucket`

GCS bucket requirements per module:

Module	Required buckets
mimir	`gcs_blocks_bucket`, `gcs_ruler_bucket`, `gcs_alertmanager_bucket`
loki	`gcs_chunks_bucket`, `gcs_ruler_bucket`
tempo	`gcs_bucket`
pyroscope	`gcs_bucket`

Azure container requirements per module:

Module	Required storage account and containers
mimir	`azure_storage_account`, `azure_blocks_container`, `azure_ruler_container`, `azure_alertmanager_container`
loki	`azure_storage_account`, `azure_chunks_container`, `azure_ruler_container`
tempo	`azure_storage_account`, `azure_container`
pyroscope	`azure_storage_account`, `azure_container`

For IRSA (AWS) or Workload Identity (GCP/Azure), leave the key fields empty and provide the IAM annotation via service_account_annotations. The module does not create IAM roles — pre-create the role and supply the annotation.

# IRSA (EKS)
service_account_annotations = {
  "eks.amazonaws.com/role-arn" = "arn:aws:iam::123456789012:role/mimir"
}

# GKE Workload Identity
service_account_annotations = {
  "iam.gke.io/gcp-service-account" = "mimir@YOUR_GCP_PROJECT.iam.gserviceaccount.com"
}
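
The text above also mentions Workload Identity on Azure. A comparable fragment for AKS might look like the following. The client ID is a placeholder. Azure Workload Identity additionally expects the label azure.workload.identity/use = "true" on the pods, and whether these modules set that label is not confirmed here.

```hcl
# Azure Workload Identity (AKS)
service_account_annotations = {
  # Client ID of the pre-created managed identity or app registration (placeholder value)
  "azure.workload.identity/client-id" = "00000000-0000-0000-0000-000000000000"
}
```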

S3 credentials secret

Instead of passing s3_access_key and s3_secret_key as plain text, you can reference a pre-existing Kubernetes Secret. The module injects the credentials as environment variables rather than embedding them in Helm values.

storage = {
  backend                = "s3"
  s3_blocks_bucket       = "mimir-blocks"
  s3_ruler_bucket        = "mimir-ruler"
  s3_alertmanager_bucket = "mimir-alertmanager"
  s3_region              = "eu-west-1"

  s3_credentials_secret = {
    name             = "my-s3-secret"       # name of the pre-existing Secret
    access_key_field = "access-key"         # key inside the Secret (default: "access-key")
    secret_key_field = "secret-key"         # key inside the Secret (default: "secret-key")
  }
}

The same s3_credentials_secret variable is available on modules/loki and modules/tempo. To share one Secret across all three modules, pass the same name to each.
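
The Secret can be created any way you like. The sketch below creates it with the Kubernetes provider and shares one definition through a local. It assumes the monitoring namespace already exists, and the variable names s3_access_key and s3_secret_key are hypothetical inputs of your root module, not of these modules.

```hcl
variable "s3_access_key" {
  type      = string
  sensitive = true
}

variable "s3_secret_key" {
  type      = string
  sensitive = true
}

resource "kubernetes_secret" "s3" {
  metadata {
    name      = "my-s3-secret"
    namespace = "monitoring" # assumed to exist already
  }

  # Key names match the documented defaults for access_key_field and secret_key_field
  data = {
    "access-key" = var.s3_access_key
    "secret-key" = var.s3_secret_key
  }
}

locals {
  # Reuse this object in the storage block of mimir, loki, and tempo:
  #   s3_credentials_secret = local.s3_credentials_secret
  s3_credentials_secret = {
    name             = kubernetes_secret.s3.metadata[0].name
    access_key_field = "access-key"
    secret_key_field = "secret-key"
  }
}
```

Referencing kubernetes_secret.s3 rather than repeating the literal name also gives Terraform an implicit dependency, so the Secret is created before the modules that consume it.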

Three credential modes are supported — use whichever fits your environment:

Mode	How to configure
IRSA / Workload Identity	Leave `s3_access_key`, `s3_secret_key`, and `s3_credentials_secret` all unset; provide `service_account_annotations`
Plain-text keys	Set `s3_access_key` and `s3_secret_key` directly; the module creates a Secret automatically
Pre-existing Secret	Set `s3_credentials_secret`; leave `s3_access_key` and `s3_secret_key` unset

Sharing one S3 bucket across Mimir storage types (Mimir only)

By default Mimir requires three separate S3 buckets (blocks, ruler, alertmanager). If you prefer a single bucket, use the s3_blocks_prefix, s3_ruler_prefix, and s3_alertmanager_prefix variables to isolate each storage type under a distinct key prefix.

storage = {
  backend                = "s3"
  s3_blocks_bucket       = "mimir-shared"
  s3_ruler_bucket        = "mimir-shared"
  s3_alertmanager_bucket = "mimir-shared"
  s3_region              = "eu-west-1"
  s3_blocks_prefix       = "blocks"
  s3_ruler_prefix        = "ruler"
  s3_alertmanager_prefix = "alertmanager"
}

S3 endpoint format

All three modules (mimir, loki, tempo) strip https:// and http:// from s3_endpoint automatically before passing the value to the underlying Helm chart. Either format is accepted:

s3_endpoint = "fsn1.your-objectstorage.com"        # hostname only — preferred
s3_endpoint = "https://fsn1.your-objectstorage.com" # scheme stripped automatically
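
To illustrate what the stripping amounts to, here is a small standalone sketch using Terraform's built-in trimprefix function. It is not necessarily how the modules implement it, and the local names are made up for the example.

```hcl
locals {
  s3_endpoint_input = "https://fsn1.your-objectstorage.com"

  # trimprefix leaves the string unchanged when the prefix is absent,
  # so chaining both calls handles https://, http://, and bare hostnames.
  s3_endpoint_host = trimprefix(trimprefix(local.s3_endpoint_input, "https://"), "http://")
}

output "s3_endpoint_host" {
  value = local.s3_endpoint_host
}
```

With the input shown, the output evaluates to fsn1.your-objectstorage.com. A bare hostname passes through unchanged.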

Architecture

                          ┌─────────────────────────────────────────┐
                          │            Kubernetes Cluster            │
                          │                                          │
  ┌──────────┐  OTLP      │  ┌─────────────────────────────────┐   │
  │   Your   │──gRPC/HTTP─┼─▶│      OpenTelemetry Collector     │   │
  │   Apps   │            │  └──────┬──────────┬───────────┬───┘   │
  └──────────┘            │         │          │           │        │
                          │   traces│    metrics│      logs│        │
                          │         ▼          ▼           ▼        │
                          │  ┌──────────┐ ┌───────┐ ┌──────────┐  │
                          │  │  Tempo   │ │ Mimir │ │   Loki   │  │
                          │  │ (traces) │ │(metrics)│ │  (logs)  │  │
                          │  └────┬─────┘ └───┬───┘ └────┬─────┘  │
                          │       │            │          │         │
                          │       └────────────┼──────────┘         │
                          │                    │ query               │
                          │                    ▼                     │
                          │           ┌─────────────────┐           │
                          │           │     Grafana      │           │
                          │           │  (dashboards +   │           │
                          │           │     alerts)      │           │
                          │           └────────┬─────────┘           │
                          └────────────────────┼─────────────────────┘
                                               │ HTTPS
                                               ▼
                                        Browser / User

  ┌──────────────────────────────────────────────────────┐
  │                    Alert routing                      │
  │                                                       │
  │  Prometheus ──▶ Alertmanager ──▶ Slack / PagerDuty  │
  │  Grafana rules ────────────────▶ Slack / PagerDuty  │
  └──────────────────────────────────────────────────────┘

  ┌──────────────────────────────────────────────────────┐
  │                   Prometheus scraping                 │
  │                                                       │
  │  Kubernetes nodes, pods, services                     │
  │         │                                             │
  │         ▼                                             │
  │    Prometheus ──remote_write──▶ Mimir                 │
  └──────────────────────────────────────────────────────┘

Component roles at a glance:

Component	Role
Mimir	Long-term metrics storage and query backend
Loki	Log aggregation and query
Tempo	Distributed trace storage and query
Prometheus	Cluster scraping and remote write to Mimir
Grafana	Unified dashboards and alert management
OTel Collector	OTLP receiver — forwards traces to Tempo, metrics to Mimir, logs to Loki
Alloy	OTel-native collector (successor to Grafana Agent) — River/Alloy pipeline config
Pyroscope	Continuous profiling storage and query — CPU, memory, goroutines, heap
cert-manager	TLS certificate issuance for ingress
prometheus-rules	Prometheus alert rules and Alertmanager receivers
grafana-rules	Grafana-managed alert rules and contact points

Troubleshooting

"Endpoint url cannot have fully qualified paths"

This error is produced by the MinIO SDK when s3_endpoint is passed with a scheme (https:// or http://). All three modules strip the scheme automatically, so this error should not appear. If it does, verify that s3_endpoint contains only the hostname and optional port — no scheme prefix.

# Correct
s3_endpoint = "fsn1.your-objectstorage.com"

# Also accepted — scheme is stripped automatically
s3_endpoint = "https://fsn1.your-objectstorage.com"

Mimir bundled MinIO conflict

The mimir-distributed Helm chart ships with MinIO enabled by default upstream. This module disables it (minio.enabled: false) because the bundled MinIO injects its own S3 configuration that conflicts with external storage backends, producing the "fully qualified paths" error above. No action is required from callers — the module handles this automatically.

`usage_stats` disabled in Mimir

Mimir's anonymous telemetry (usage_stats) is disabled by this module. When usage_stats is enabled and an S3-compatible endpoint is configured, Mimir attempts to send telemetry to a fully-qualified S3 path that triggers the MinIO SDK path validation error. Disabling it has no effect on Mimir's functionality.

Wrong cluster targeted by `kubectl`

The cert-manager and prometheus-rules modules run kubectl through a local-exec provisioner. kubectl does not read the Terraform provider's config_path; if the KUBECONFIG environment variable is set in your shell, kubectl uses it and may target a different cluster than the one Terraform is managing.

Set kubeconfig_path explicitly on both modules to pin them to the correct config file:

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"
  cert_manager = {
    kubeconfig_path = "/path/to/your/kubeconfig"
  }
}

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"
  prometheus_rules = {
    prometheus_release_id = module.prometheus.helm_release_id
    kubeconfig_path       = "/path/to/your/kubeconfig"
  }
}
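
To keep the providers and the kubectl-based modules pointed at the same file, the path can be declared once and reused. This is a sketch. The variable name kubeconfig_path at the root level is an assumption, and the provider blocks mirror the ones in the Quick Start.

```hcl
variable "kubeconfig_path" {
  type    = string
  default = "~/.kube/config"
}

locals {
  # pathexpand resolves a leading ~ so the providers and kubectl receive the same absolute path
  kubeconfig_path = pathexpand(var.kubeconfig_path)
}

provider "helm" {
  kubernetes {
    config_path = local.kubeconfig_path
  }
}

provider "kubernetes" {
  config_path = local.kubeconfig_path
}

module "cert_manager" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/cert-manager"
  cert_manager = {
    kubeconfig_path = local.kubeconfig_path
  }
}

module "prometheus_rules" {
  source = "github.com/digitalis-io/terraform-k8s-monitoring//modules/prometheus-rules"
  prometheus_rules = {
    prometheus_release_id = module.prometheus.helm_release_id
    kubeconfig_path       = local.kubeconfig_path
  }
}
```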

Support

This project is maintained by Digitalis.io. For support, visit digitalis.io/contact.
