10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,16 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.4.0] - 2025-10-13

[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.3.0...0.4.0)

### Changed

- Rename tf file from `cloud-sql.tf` to `cloud_sql.tf`.
- Rename tf file from `kyverno_log_alert.tf` to `kyverno.tf`.
- Add a log-based alert for a missing cert-manager Issuer or ClusterIssuer.

## [0.3.0] - 2025-10-07

[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.2.0...0.3.0)
6 changes: 5 additions & 1 deletion README.md
@@ -13,7 +13,9 @@ Supported services:
- Kyverno

- Error logs for admission-controller, background-controller, cleanup-controller, reports-controller
- Metric threshold (optional)

- cert-manager
- Error logs for cert-manager controller when an Issuer or ClusterIssuer is missing

<!-- BEGIN_TF_DOCS -->
## Providers
@@ -33,6 +35,7 @@ Supported services:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | n/a | yes |
| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | n/a | yes |
| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | n/a | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
@@ -50,6 +53,7 @@ Supported services:

| Name | Type |
|------|------|
| [google_monitoring_alert_policy.cert_manager_logmatch_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_cpu_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_disk_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_memory_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
70 changes: 70 additions & 0 deletions cert_manager.tf
@@ -0,0 +1,70 @@
locals {
  cert_manager_project_id = var.cert_manager.project_id != null ? var.cert_manager.project_id : var.project_id
  cert_manager_alert_documentation = (
    var.cert_manager.alert_documentation != null
    ? var.cert_manager.alert_documentation
    : <<-EOT
    cert-manager is reporting that an Issuer or ClusterIssuer resource referenced by a Certificate cannot be found. This may indicate that the Issuer/ClusterIssuer has been deleted or is otherwise unavailable.
    EOT
  )
  cert_manager_notification_channels = var.cert_manager.notification_enabled ? (length(var.cert_manager.notification_channels) > 0 ? var.cert_manager.notification_channels : var.notification_channels) : []

  cert_manager_log_filter = <<-EOT
    (
      (
        resource.type="k8s_container"
        AND resource.labels.project_id="${local.cert_manager_project_id}"
        AND resource.labels.cluster_name="${var.cert_manager.cluster_name}"
        AND resource.labels.namespace_name="${var.cert_manager.namespace}"
      )
      OR (
        log_id("events")
        AND resource.labels.project_id="${local.cert_manager_project_id}"
        AND resource.labels.cluster_name="${var.cert_manager.cluster_name}"
        AND (
          jsonPayload.involvedObject.namespace="${var.cert_manager.namespace}"
          OR jsonPayload.metadata.namespace="${var.cert_manager.namespace}"
        )
      )
    )
    AND (
      textPayload=~"Referenced \"(Issuer|ClusterIssuer)\" not found"
      OR jsonPayload.message=~"Referenced \"(Issuer|ClusterIssuer)\" not found"
      OR jsonPayload.note=~"Referenced \"(Issuer|ClusterIssuer)\" not found"
Comment on lines +31 to +33
Copilot AI Oct 13, 2025

The regex pattern is duplicated across three lines. Consider extracting it to a local variable to improve maintainability and reduce the chance of inconsistencies.
    )
    ${trimspace(var.cert_manager.filter_extra)}
  EOT
}
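
Following the review comment above, one way to avoid repeating the pattern is to build the three comparisons from a single local. This is a minimal sketch, not what the PR merged: the local names `cert_manager_missing_issuer_pattern` and `cert_manager_missing_issuer_match` are hypothetical, and it assumes the same three payload fields that the filter above checks.

```hcl
locals {
  # Hypothetical local (not in the PR). In a quoted HCL string, \\\" yields the
  # two characters \" that the heredoc filter above contains literally.
  cert_manager_missing_issuer_pattern = "Referenced \\\"(Issuer|ClusterIssuer)\\\" not found"

  # Hypothetical local: one regex comparison per payload field, joined with OR.
  cert_manager_missing_issuer_match = join(" OR ", [
    for field in ["textPayload", "jsonPayload.message", "jsonPayload.note"] :
    "${field}=~\"${local.cert_manager_missing_issuer_pattern}\""
  ])
}
```

With these locals in place, the final clause of the filter could read `AND (${local.cert_manager_missing_issuer_match})`, which should render to the same three comparisons as the merged version.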

resource "google_monitoring_alert_policy" "cert_manager_logmatch_alert" {
  count = (
    var.cert_manager.enabled
    && trimspace(var.cert_manager.cluster_name) != ""
    && var.cert_manager.cluster_name != null
  ) ? 1 : 0

  display_name = "cert-manager missing Issuer/ClusterIssuer (cluster=${var.cert_manager.cluster_name}, namespace=${var.cert_manager.namespace})"
  combiner     = "OR"
  enabled      = var.cert_manager.enabled

  conditions {
    display_name = "Log match: cert-manager Issuer/ClusterIssuer not found"
    condition_matched_log {
      filter = local.cert_manager_log_filter
    }
  }

  documentation {
    content   = local.cert_manager_alert_documentation
    mime_type = "text/markdown"
  }

  notification_channels = local.cert_manager_notification_channels

  alert_strategy {
    auto_close = "${var.cert_manager.auto_close_seconds}s"
    notification_rate_limit {
      period = var.cert_manager.logmatch_notification_rate_limit
    }
  }
}
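
The `count` guard above calls `trimspace` on `cluster_name` in the same expression that checks it for null. As far as I know, Terraform's `&&` does not short-circuit, so an explicit `cluster_name = null` would likely fail inside `trimspace` before the null check can help. A possible null-safe variant is sketched below; the locals `cert_manager_cluster_name` and `cert_manager_alert_enabled` are hypothetical and not part of this PR.

```hcl
locals {
  # Hypothetical local: normalise a null cluster name to an empty string.
  # Only the selected branch of a conditional reports errors, so trimspace
  # is never applied to a null value here.
  cert_manager_cluster_name = (
    var.cert_manager.cluster_name != null
    ? trimspace(var.cert_manager.cluster_name)
    : ""
  )

  # Hypothetical local: create the policy only when it is enabled and the
  # cluster name is non-empty.
  cert_manager_alert_enabled = var.cert_manager.enabled && local.cert_manager_cluster_name != ""
}
```

The resource would then use `count = local.cert_manager_alert_enabled ? 1 : 0`. The same pattern would apply to the Kyverno guard, which uses the same check.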
File renamed without changes.
14 changes: 9 additions & 5 deletions examples/main.tf
@@ -49,13 +49,17 @@ module "example" {
   project_id = var.project_id
   cloud_sql  = local.cloud_sql
   kyverno = {
-    cluster_name           = "test-cluster"
-    enabled                = true
-    use_metric_threshold   = true
-    metric_threshold_count = 5
-    notification_channels  = []
+    cluster_name          = "test-cluster"
+    enabled               = true
+    notification_channels = []
     # Optional filter for log entries, exclude known non-actionable messages
     # e.g., "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
     filter_extra = "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
   }
+  cert_manager = {
+    cluster_name          = "test-cluster"
+    namespace             = "cert-manager"
+    enabled               = true
+    notification_channels = []
+  }
 }
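
The `cert_manager` object type also exposes per-service overrides that the example above leaves at their defaults. Below is a sketch of a caller setting some of them. The module name, `source` path, channel ID, and excluded issuer name are placeholders chosen for illustration, not values from the PR, and any other required inputs of the module are omitted. It also assumes that Cloud Logging treats the appended `filter_extra` expression as an implicit AND, as the Kyverno example does.

```hcl
module "example_overrides" {
  # Assumption: the example lives one directory below the module root.
  source = "../"

  project_id = var.project_id
  cloud_sql  = {}
  kyverno = {
    cluster_name = "test-cluster"
  }

  cert_manager = {
    cluster_name = "test-cluster"
    # Placeholder channel ID. A non-empty list here takes precedence over the
    # module-level notification_channels input.
    notification_channels = ["projects/my-project/notificationChannels/1234567890"]
    # Placeholder exclusion, appended to the log filter as-is.
    filter_extra = "-jsonPayload.message:\"staging-issuer\""
    # Close incidents after two hours and notify at most every ten minutes.
    auto_close_seconds               = 7200
    logmatch_notification_rate_limit = "600s"
  }
}
```

Setting `notification_enabled = false` instead would leave the policy in place but with an empty channel list, per the `cert_manager_notification_channels` local in `cert_manager.tf`.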
38 changes: 27 additions & 11 deletions examples/variables.tf
@@ -13,16 +13,32 @@ variable "notification_channels" {
 variable "kyverno" {
   description = "Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace."
   type = object({
-    enabled                 = optional(bool, true)
-    project_id              = optional(string, null)
-    cluster_name            = string
-    namespace               = optional(string, "kyverno")
-    notification_enabled    = optional(bool, true)
-    notification_channels   = optional(list(string), [])
-    alert_documentation     = optional(string, null)
-    metric_threshold_count  = optional(number, 2)
-    metric_lookback_minutes = optional(number, 1)
-    auto_close_seconds      = optional(number, 3600)
-    filter_extra            = optional(string, "")
+    enabled               = optional(bool, true)
+    cluster_name          = string
+    project_id            = optional(string, null)
+    notification_enabled  = optional(bool, true)
+    notification_channels = optional(list(string), [])
+    # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts
+    logmatch_notification_rate_limit = optional(string, "300s")
+    alert_documentation              = optional(string, null)
+    auto_close_seconds               = optional(number, 3600)
+    filter_extra                     = optional(string, "")
+    namespace                        = optional(string, "kyverno")
   })
 }
+
+variable "cert_manager" {
+  description = "Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting."
+  type = object({
+    enabled                          = optional(bool, true)
+    cluster_name                     = string
+    project_id                       = optional(string, null)
+    namespace                        = optional(string, "cert-manager")
+    notification_enabled             = optional(bool, true)
+    notification_channels            = optional(list(string), [])
+    logmatch_notification_rate_limit = optional(string, "300s")
+    alert_documentation              = optional(string, null)
+    auto_close_seconds               = optional(number, 3600)
+    filter_extra                     = optional(string, "")
+  })
+}
1 change: 1 addition & 0 deletions kyverno_log_alert.tf → kyverno.tf
@@ -21,6 +21,7 @@ resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
   count = (
     var.kyverno.enabled
     && trimspace(var.kyverno.cluster_name) != ""
+    && var.kyverno.cluster_name != null
   ) ? 1 : 0

   display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"
16 changes: 16 additions & 0 deletions variables.tf
@@ -82,3 +82,19 @@ variable "kyverno" {
     namespace                        = optional(string, "kyverno")
   })
 }
+
+variable "cert_manager" {
+  description = "Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting."
+  type = object({
+    enabled                          = optional(bool, true)
+    cluster_name                     = string
+    project_id                       = optional(string, null)
+    namespace                        = optional(string, "cert-manager")
+    notification_enabled             = optional(bool, true)
+    notification_channels            = optional(list(string), [])
+    logmatch_notification_rate_limit = optional(string, "300s")
+    alert_documentation              = optional(string, null)
+    auto_close_seconds               = optional(number, 3600)
+    filter_extra                     = optional(string, "")
+  })
+}