8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,14 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

+## [0.12.0] - 2026-01-28
+
+[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.11.0...0.12.0)
+
+### Changed
+
+- refs platform/board#4071: remove the dependency on the [`terraform-sparkfabrik-gcp-http-monitoring`](https://github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring) Terraform module.
+
## [0.11.0] - 2026-01-14

[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.10.0...0.11.0)
Expand Down
14 changes: 7 additions & 7 deletions README.md
@@ -40,7 +40,7 @@ Supported services:

| Name | Version |
|------|---------|
-| <a name="provider_google"></a> [google](#provider\_google) | 7.15.0 |
+| <a name="provider_google"></a> [google](#provider\_google) | >= 5.10 |

## Requirements

@@ -53,10 +53,10 @@ Supported services:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
-| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | n/a | yes |
-| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | n/a | yes |
-| <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | n/a | yes |
-| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | n/a | yes |
+| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | `{}` | no |
+| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> enabled = optional(bool, true)<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | `{}` | no |
+| <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | `{}` | no |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | `{}` | no |
| <a name="input_litellm"></a> [litellm](#input\_litellm) | Configuration for LiteLLM monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). | <pre>object({<br/> enabled = optional(bool, false)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> cluster_name = optional(string, null)<br/><br/> apps = optional(map(object({<br/> uptime_check = optional(object({<br/> enabled = optional(bool, true)<br/> host = string<br/> path = optional(string, "/health/readiness")<br/> }), null)<br/><br/> container_check = optional(object({<br/> enabled = optional(bool, true)<br/> namespace = string<br/> pod_restart = optional(object({<br/> threshold = optional(number, 0)<br/> alignment_period = optional(number, 60)<br/> duration = optional(number, 180)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_prompts = optional(list(string), null)<br/> }), {})<br/> }), null)<br/> })), {})<br/> })</pre> | `{}` | no |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |
@@ -90,7 +90,7 @@ Supported services:

| Name | Source | Version |
|------|--------|---------|
-| <a name="module_litellm_uptime_checks"></a> [litellm\_uptime\_checks](#module\_litellm\_uptime\_checks) | github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring | 1.0.0 |
-| <a name="module_typesense_uptime_checks"></a> [typesense\_uptime\_checks](#module\_typesense\_uptime\_checks) | github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring | 1.0.0 |
+| <a name="module_litellm_uptime_checks"></a> [litellm\_uptime\_checks](#module\_litellm\_uptime\_checks) | ./modules/http_monitoring | n/a |
+| <a name="module_typesense_uptime_checks"></a> [typesense\_uptime\_checks](#module\_typesense\_uptime\_checks) | ./modules/http_monitoring | n/a |

<!-- END_TF_DOCS -->
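With the README change above, `cert_manager`, `cloud_sql`, `konnectivity_agent`, and `kyverno` all default to `{}`, so a caller can plausibly drop the blocks it does not need. The sketch below illustrates that under stated assumptions: the module label, project ID, and cluster name are placeholders, the `ref` is taken from the CHANGELOG compare link rather than from any documented usage, and inputs not visible in this excerpt of the table may still be required.

```hcl
# Hypothetical root-module usage; "services_monitoring", "my-gcp-project"
# and "my-gke-cluster" are placeholders, not values from this repository.
module "services_monitoring" {
  # Assumption: the 0.12.0 tag referenced in the CHANGELOG is used as the ref.
  source = "github.com/sparkfabrik/terraform-google-services-monitoring?ref=0.12.0"

  # Still marked as required in the inputs table.
  project_id = "my-gcp-project"

  # cert_manager, cloud_sql and konnectivity_agent are omitted here and
  # fall back to their `{}` defaults.

  # Only the blocks that are actually needed have to be set.
  kyverno = {
    cluster_name = "my-gke-cluster"
  }
}
```

Because `cluster_name` now defaults to `null`, leaving it out of a block such as `kyverno` should simply result in no alert policy being created for that service, which matches the `count` guards changed later in this diff.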
9 changes: 5 additions & 4 deletions cert_manager.tf
@@ -8,13 +8,14 @@ locals {
     EOT
   )
   cert_manager_notification_channels = var.cert_manager.notification_enabled ? (length(var.cert_manager.notification_channels) > 0 ? var.cert_manager.notification_channels : var.notification_channels) : []
+  cert_manager_cluster_name = var.cert_manager.cluster_name != null ? trimspace(var.cert_manager.cluster_name) : ""

-  cert_manager_log_filter = <<-EOT
+  cert_manager_log_filter = local.cert_manager_cluster_name != "" ? (<<-EOT
     (
       (
         resource.type="k8s_container"
         AND resource.labels.project_id="${local.cert_manager_project_id}"
-        AND resource.labels.cluster_name="${var.cert_manager.cluster_name}"
+        AND resource.labels.cluster_name="${local.cert_manager_cluster_name}"
         AND resource.labels.namespace_name="${var.cert_manager.namespace}"
       )
       OR (
@@ -34,13 +35,13 @@ locals {
     )
     ${trimspace(var.cert_manager.filter_extra)}
   EOT
+  ) : ""
 }

 resource "google_monitoring_alert_policy" "cert_manager_logmatch_alert" {
   count = (
     var.cert_manager.enabled
-    && trimspace(var.cert_manager.cluster_name) != ""
-    && var.cert_manager.cluster_name != null
+    && local.cert_manager_cluster_name != ""
   ) ? 1 : 0

   display_name = "cert-manager missing Issuer/ClusterIssuer (cluster=${var.cert_manager.cluster_name}, namespace=${var.cert_manager.namespace})"
Expand Down
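The cert_manager.tf change normalizes `cluster_name` into a local once and then reuses that local for both the log filter and the `count` guard. This matters because `cluster_name` may now be `null`, and string functions such as `trimspace` do not accept a null argument. A minimal stand-alone sketch of the same pattern follows; the variable, local, and output names are hypothetical and the filter is reduced to a single clause for illustration.

```hcl
# Hypothetical stand-alone configuration illustrating the null-safe pattern;
# none of these names exist in the module itself.
variable "cluster_name" {
  type    = string
  default = null
}

locals {
  # Normalize once: null becomes "", and surrounding whitespace is dropped.
  cluster_name = var.cluster_name != null ? trimspace(var.cluster_name) : ""

  # Build the filter only when a usable cluster name exists.
  log_filter = local.cluster_name != "" ? "resource.labels.cluster_name=\"${local.cluster_name}\"" : ""
}

output "alert_count" {
  # Mirrors the `count` guard: 1 when the alert should exist, 0 otherwise.
  value = local.cluster_name != "" ? 1 : 0
}

output "log_filter" {
  value = local.log_filter
}
```

Doing the null check in one place keeps every later comparison a plain string test, so a null, empty, or whitespace-only `cluster_name` is handled the same way.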
10 changes: 6 additions & 4 deletions kyverno.tf
@@ -3,10 +3,12 @@ locals {
   alert_documentation = var.kyverno.alert_documentation != null ? var.kyverno.alert_documentation : "Kyverno controllers produced ERROR logs in namespace ${var.kyverno.namespace}."
   kyverno_notification_channels = var.kyverno.notification_enabled ? (length(var.kyverno.notification_channels) > 0 ? var.kyverno.notification_channels : var.notification_channels) : []

-  kyverno_log_filter = <<-EOT
+  kyverno_cluster_name = var.kyverno.cluster_name != null ? trimspace(var.kyverno.cluster_name) : ""
+
+  kyverno_log_filter = local.kyverno_cluster_name != "" ? (<<-EOT
     resource.type="k8s_container"
     AND resource.labels.project_id="${local.kyverno_project_id}"
-    AND resource.labels.cluster_name="${var.kyverno.cluster_name}"
+    AND resource.labels.cluster_name="${local.kyverno_cluster_name}"
     AND resource.labels.namespace_name="${var.kyverno.namespace}"
     AND (
       labels."k8s-pod/app_kubernetes_io/component"=~"(admission-controller|background-controller|cleanup-controller|reports-controller)"
@@ -48,13 +50,13 @@ locals {
     )
     ${trimspace(var.kyverno.filter_extra)}
   EOT
+  ) : ""
 }

 resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
   count = (
     var.kyverno.enabled
-    && trimspace(var.kyverno.cluster_name) != ""
-    && var.kyverno.cluster_name != null
+    && local.kyverno_cluster_name != ""
   ) ? 1 : 0

   display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"
Expand Down
4 changes: 2 additions & 2 deletions lite_llm.tf
@@ -19,8 +19,8 @@ locals {
 module "litellm_uptime_checks" {
   for_each = local.litellm_uptime_checks

-  source = "github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring?ref=1.0.0"
-  gcp_project = local.litellm_project
+  source = "./modules/http_monitoring"
+  gcp_project_id = local.litellm_project
   uptime_monitoring_host = each.value.host
   uptime_monitoring_path = each.value.path
   alert_notification_channels = local.litellm_notification_channels
Expand Down