Commit 9198752

update
1 parent f9f50f0 commit 9198752

6 files changed

+37
-12
lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ### Changed

-- refs platform/board#4071: remove dependecies from [`terraform-sparkfabrik-gcp-http-monitoring`](https://github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring) terraform module. **⚠️ WARN** Disabled monitoring alerts by default for `kyverno`, `cert-manager`, and `konnectivity_agent`, from now on, you must add the explicit value `enabled = true` to activate these alerts.
+- refs platform/board#4071: remove dependecies from [`terraform-sparkfabrik-gcp-http-monitoring`](https://github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring) terraform module.

 ## [0.11.0] - 2026-01-14

README.md

Lines changed: 3 additions & 3 deletions
@@ -53,10 +53,10 @@ Supported services:

 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, false)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | `{}` | no |
+| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | `{}` | no |
 | <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> enabled = optional(bool, true)<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | `{}` | no |
-| <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, false)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | `{}` | no |
-| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, false)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | `{}` | no |
+| <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | `{}` | no |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | `{}` | no |
 | <a name="input_litellm"></a> [litellm](#input\_litellm) | Configuration for LiteLLM monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). | <pre>object({<br/> enabled = optional(bool, false)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> cluster_name = optional(string, null)<br/><br/> apps = optional(map(object({<br/> uptime_check = optional(object({<br/> enabled = optional(bool, true)<br/> host = string<br/> path = optional(string, "/health/readiness")<br/> }), null)<br/><br/> container_check = optional(object({<br/> enabled = optional(bool, true)<br/> namespace = string<br/> pod_restart = optional(object({<br/> threshold = optional(number, 0)<br/> alignment_period = optional(number, 60)<br/> duration = optional(number, 180)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_prompts = optional(list(string), null)<br/> }), {})<br/> }), null)<br/> })), {})<br/> })</pre> | `{}` | no |
 | <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
 | <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |

cert_manager.tf

Lines changed: 2 additions & 2 deletions
@@ -8,6 +8,7 @@ locals {
     EOT
   )
   cert_manager_notification_channels = var.cert_manager.notification_enabled ? (length(var.cert_manager.notification_channels) > 0 ? var.cert_manager.notification_channels : var.notification_channels) : []
+  cert_manager_cluster_name = var.cert_manager.cluster_name != null ? trimspace(var.cert_manager.cluster_name) : ""

   cert_manager_log_filter = var.cert_manager.cluster_name != null ? (<<-EOT
     (
@@ -40,8 +41,7 @@ locals {
 resource "google_monitoring_alert_policy" "cert_manager_logmatch_alert" {
   count = (
     var.cert_manager.enabled
-    && try(var.cert_manager.cluster_name, "") != ""
-    && var.cert_manager.cluster_name != null
+    && local.cert_manager_cluster_name != ""
   ) ? 1 : 0

   display_name = "cert-manager missing Issuer/ClusterIssuer (cluster=${var.cert_manager.cluster_name}, namespace=${var.cert_manager.namespace})"

kyverno.tf

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,8 @@ locals {
   alert_documentation = var.kyverno.alert_documentation != null ? var.kyverno.alert_documentation : "Kyverno controllers produced ERROR logs in namespace ${var.kyverno.namespace}."
   kyverno_notification_channels = var.kyverno.notification_enabled ? (length(var.kyverno.notification_channels) > 0 ? var.kyverno.notification_channels : var.notification_channels) : []

+  kyverno_cluster_name = var.kyverno.cluster_name != null ? trimspace(var.kyverno.cluster_name) : ""
+
   kyverno_log_filter = var.kyverno.cluster_name != null ? (<<-EOT
     resource.type="k8s_container"
     AND resource.labels.project_id="${local.kyverno_project_id}"
@@ -54,8 +56,7 @@ locals {
 resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
   count = (
     var.kyverno.enabled
-    && try(var.kyverno.cluster_name, "") != ""
-    && var.kyverno.cluster_name != null
+    && local.kyverno_cluster_name != ""
   ) ? 1 : 0

   display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"

modules/http_monitoring/variables.tf

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ variable "uptime_monitoring_path" {

 variable "uptime_check_period" {
   type = string
-  description = "How often, in seconds, the uptime check is performed. Currently, the only supported values are 60s (1 minute), 300s (5 minutes), 600s (10 minutes), and 900s (15 minutes). Defaults to 300s."
+  description = "How often, in seconds, the uptime check is performed. Currently, the only supported values are 60s (1 minute), 300s (5 minutes), 600s (10 minutes), and 900s (15 minutes)"
   default = "60s"
 }

variables.tf

Lines changed: 27 additions & 3 deletions
@@ -72,7 +72,7 @@ variable "kyverno" {
   description = "Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace."
   default = {}
   type = object({
-    enabled = optional(bool, false)
+    enabled = optional(bool, true)
     cluster_name = optional(string, null)
     project_id = optional(string, null)
     notification_enabled = optional(bool, true)
@@ -84,13 +84,21 @@ variable "kyverno" {
     filter_extra = optional(string, "")
     namespace = optional(string, "kyverno")
   })
+
+  validation {
+    condition = (
+      !var.kyverno.enabled ||
+      (var.kyverno.cluster_name != null && var.kyverno.cluster_name != "")
+    )
+    error_message = "When 'enabled' is true, 'cluster_name' must be provided and cannot be empty."
+  }
 }

 variable "cert_manager" {
   description = "Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting."
   default = {}
   type = object({
-    enabled = optional(bool, false)
+    enabled = optional(bool, true)
     cluster_name = optional(string, null)
     project_id = optional(string, null)
     namespace = optional(string, "cert-manager")
@@ -101,13 +109,21 @@ variable "cert_manager" {
     auto_close_seconds = optional(number, 3600)
     filter_extra = optional(string, "")
   })
+
+  validation {
+    condition = (
+      !var.cert_manager.enabled ||
+      (var.cert_manager.cluster_name != null && var.cert_manager.cluster_name != "")
+    )
+    error_message = "When 'enabled' is true, 'cluster_name' must be provided and cannot be empty."
+  }
 }

 variable "konnectivity_agent" {
   description = "Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas."
   default = {}
   type = object({
-    enabled = optional(bool, false)
+    enabled = optional(bool, true)
     cluster_name = optional(string, null)
     project_id = optional(string, null)
     namespace = optional(string, "kube-system")
@@ -118,6 +134,14 @@ variable "konnectivity_agent" {
     notification_channels = optional(list(string), [])
     notification_prompts = optional(list(string), null)
   })
+
+  validation {
+    condition = (
+      !var.konnectivity_agent.enabled ||
+      (var.konnectivity_agent.cluster_name != null && var.konnectivity_agent.cluster_name != "")
+    )
+    error_message = "When 'enabled' is true, 'cluster_name' must be provided and cannot be empty."
+  }
 }

 variable "typesense" {
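
For readers of this diff: with `enabled` now defaulting to `true` and the new `validation` blocks in place, a caller apparently has to either pass a non-empty `cluster_name` for `kyverno`, `cert_manager`, and `konnectivity_agent`, or opt out with `enabled = false`. A minimal sketch of such a caller follows. The module name, the `source` path, and every value are hypothetical placeholders and do not come from this commit, and the module may require inputs that this diff does not show.

```hcl
# Hypothetical caller configuration, for illustration only.
module "gcp_monitoring" {
  source = "./path/to/this/module" # assumed path, not taken from the commit

  # `project_id` is listed as required in the README inputs table.
  project_id = "example-project"

  # Left enabled (the new default), so `cluster_name` must be non-empty.
  kyverno = {
    cluster_name = "example-gke-cluster"
  }

  # Explicit opt-out: the validation passes without a `cluster_name`.
  cert_manager = {
    enabled = false
  }

  konnectivity_agent = {
    cluster_name = "example-gke-cluster"
  }
}
```

Note that the `validation` blocks compare the raw value against `""`, while `cert_manager.tf` and `kyverno.tf` apply `trimspace` before deciding `count`, so a whitespace-only `cluster_name` would likely pass validation yet create no alert policy.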
