Commit 71d36bf

feat: add Kyverno monitoring alerts and update documentation
1 parent c7df9b6 commit 71d36bf

9 files changed: +228 -46 lines

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,13 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+## [0.3.0] - 2025-10-07
+
+[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.2.0...0.3.0)
+
+### Changed
+
+- Add kyverno alert log.
 - Update module documentation.
 
 ## [0.2.0] - 2024-10-17

Makefile

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
+TERRAFORM_DOCS_VERSION ?= 0.20.0
+
 .PHONY: lint tfscan generate-docs
 
 lint:
@@ -10,4 +12,4 @@ generate-docs: lint
 docker run --rm -u $$(id -u) \
 --volume "$(PWD):/terraform-docs" \
 -w /terraform-docs \
-quay.io/terraform-docs/terraform-docs:0.16.0 markdown table --config .terraform-docs.yml --output-file README.md --output-mode inject .
+quay.io/terraform-docs/terraform-docs:$(TERRAFORM_DOCS_VERSION) markdown table --config .terraform-docs.yml --output-file README.md --output-mode inject .

README.md

Lines changed: 13 additions & 5 deletions
@@ -5,10 +5,16 @@ This module creates a set of monitoring alerts for Google Cloud Platform service
 Supported services:
 
 - Cloud SQL
+
 - CPU usage
 - Storage usage
 - Memory usage
 
+- Kyverno
+
+- Error logs for admission-controller, background-controller, cleanup-controller, reports-controller
+- Metric threshold (optional)
+
 <!-- BEGIN_TF_DOCS -->
 ## Providers
 
@@ -27,10 +33,10 @@ Supported services:
 
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| <a name="input_auto_close"></a> [auto\_close](#input\_auto\_close) | n/a | `string` | `"86400s"` | no |
-| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | n/a | <pre>object({<br> project = optional(string, null)<br> auto_close = optional(string, null)<br> notification_channels = optional(list(string), [])<br> instances = optional(map(object({<br> cpu_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "120s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> threshold = 0.85,<br> duration = "1200s",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 1,<br> duration = "300s",<br> alignment_period = "60s",<br> }<br> ])<br> memory_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> severity = "WARNING",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.95,<br> }<br> ])<br> disk_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.85)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "600s")<br> })), [<br> {<br> severity = "WARNING",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.95, <br> }<br> ])<br> })), {})<br> })</pre> | n/a | yes |
-| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | n/a | `list(string)` | `[]` | no |
-| <a name="input_project"></a> [project](#input\_project) | n/a | `string` | `null` | no |
+| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | n/a | yes |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> notification_channels = optional(list(string), [])<br/> notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> use_metric_threshold = optional(bool, true)<br/> metric_threshold_count = optional(number, 2)<br/> metric_lookback_minutes = optional(number, 1)<br/> auto_close_seconds = optional(number, 3600)<br/> enabled = optional(bool, true)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | n/a | yes |
+| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
+| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where the monitoring resources will be created | `string` | n/a | yes |
 
 ## Outputs
 
@@ -44,13 +50,15 @@ Supported services:
 
 | Name | Type |
 |------|------|
+| [google_logging_metric.kyverno_error_metric](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_metric) | resource |
 | [google_monitoring_alert_policy.cloud_sql_cpu_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
 | [google_monitoring_alert_policy.cloud_sql_disk_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
 | [google_monitoring_alert_policy.cloud_sql_memory_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
+| [google_monitoring_alert_policy.kyverno_logmatch_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
+| [google_monitoring_alert_policy.kyverno_metric_threshold_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
 
 ## Modules
 
 No modules.
 
-
 <!-- END_TF_DOCS -->
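
The updated README lists `kyverno` as a required input next to `cloud_sql` and `project_id`. Below is a minimal sketch of a module call that turns on only the Kyverno metric-threshold alert. The `source` path, project ID, notification channel ID and cluster name are hypothetical placeholders, not values taken from this commit:

```hcl
module "services_monitoring" {
  # Hypothetical local path; point this at wherever the module lives.
  source = "../"

  # Default project for every alert created by the module.
  project_id            = "my-project"
  notification_channels = ["projects/my-project/notificationChannels/1234567890"]

  # cloud_sql is a required input, but all of its attributes are optional;
  # with no instances listed, no Cloud SQL alert policies are generated.
  cloud_sql = {}

  kyverno = {
    cluster_name = "my-gke-cluster"

    # Metric-threshold mode (the default): alert when 5 or more ERROR
    # log entries are counted within a 5-minute alignment window.
    use_metric_threshold    = true
    metric_threshold_count  = 5
    metric_lookback_minutes = 5
  }
}
```

Passing `cloud_sql = {}` should be enough to skip Cloud SQL alerts, because `instances` defaults to an empty map.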

cloud-sql.tf

Lines changed: 10 additions & 13 deletions
@@ -3,14 +3,11 @@
 # ----------------------
 locals {
 # Use the cloud_sql project if specified, otherwise use the project.
-cloud_sql_project = var.cloud_sql.project != null ? var.cloud_sql.project : var.project
+cloud_sql_project = var.cloud_sql.project_id != null ? var.cloud_sql.project_id : var.project_id
 
 # Use the cloud_sql notification channels if specified, otherwise use the module-level notification channels.
 cloud_sql_notification_channels = length(var.cloud_sql.notification_channels) > 0 ? var.cloud_sql.notification_channels : var.notification_channels
 
-# Use the cloud_sql auto_close if specified, otherwise use the auto_close.
-cloud_sql_auto_close = var.cloud_sql.auto_close != null ? var.cloud_sql.auto_close : var.auto_close
-
 cloud_sql_cpu_utilization = {
 for item in flatten(
 [
@@ -22,7 +19,7 @@ locals {
 },
 cpu_utilization
 )
-]
+]
 ]
 ) : "${item.instance}--${item.severity}--${item.threshold}" => item
 }
@@ -38,10 +35,10 @@ locals {
 },
 memory_utilization
 )
-]
+]
 ]
 ) : "${item.instance}--${item.severity}--${item.threshold}" => item
-}
+}
 
 cloud_sql_disk_utilization = {
 for item in flatten(
@@ -54,10 +51,10 @@ locals {
 },
 disk_utilization
 )
-]
+]
 ]
 ) : "${item.instance}--${item.severity}--${item.threshold}" => item
-}
+}
 }
 
 # ----------------------
@@ -67,7 +64,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_cpu_utilization" {
 for_each = local.cloud_sql_cpu_utilization
 
 display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
-combiner = "OR"
+combiner = "OR"
 severity = each.value.severity
 
 conditions {
@@ -87,7 +84,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_cpu_utilization" {
 display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
 }
 alert_strategy {
-auto_close = local.cloud_sql_auto_close
+auto_close = var.cloud_sql.auto_close
 }
 notification_channels = local.cloud_sql_notification_channels
 }
@@ -117,7 +114,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_memory_utilization" {
 }
 
 alert_strategy {
-auto_close = local.cloud_sql_auto_close
+auto_close = var.cloud_sql.auto_close
 }
 
 notification_channels = local.cloud_sql_notification_channels
@@ -149,7 +146,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_disk_utilization" {
 }
 
 alert_strategy {
-auto_close = local.cloud_sql_auto_close
+auto_close = var.cloud_sql.auto_close
 }
 notification_channels = local.cloud_sql_notification_channels
 }
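
After this change the Cloud SQL project override is read from `cloud_sql.project_id` (previously `cloud_sql.project`). The module-level `auto_close` variable is gone: each policy now uses `cloud_sql.auto_close` directly, which defaults to `"86400s"`. The following sketch shows an override based on those changes. The `source` path, project and instance names are hypothetical:

```hcl
module "services_monitoring" {
  source = "../" # hypothetical path

  project_id = "my-project"

  cloud_sql = {
    # Renamed from `project`; when null, var.project_id is used instead.
    project_id = "my-sql-project"
    # Replaces the removed top-level `auto_close` variable.
    auto_close = "43200s"

    instances = {
      # Hypothetical instance name; an empty object keeps every default alert.
      "my-sql-instance" = {
        # Overriding the list replaces both default CPU policies
        # with a single CRITICAL policy at 95%.
        cpu_utilization = [
          { severity = "CRITICAL", threshold = 0.95 }
        ]
      }
    }
  }

  # kyverno is required; disabling it creates no Kyverno resources.
  kyverno = {
    cluster_name = "my-gke-cluster"
    enabled      = false
  }
}
```

The `for_each` maps in `locals` key each policy as `"<instance>--<severity>--<threshold>"`. As a result, memory and disk keep their default policies for this instance, while CPU gets only the one listed.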

examples/main.tf

Lines changed: 18 additions & 8 deletions
@@ -4,12 +4,12 @@
 
 locals {
 # Enable all Cloud SQL monitoring on selected instances, e.g.
-cloud_sql = {
-instances = {
-(google_sql_database_instance.master.name) = {}
+cloud_sql = {
+instances = {
+(google_sql_database_instance.master.name) = {}
 (google_sql_database_instance.stage.name) = {}
-}
-}
+}
+}
 
 # Use custom Cloud SQL cpu monitoring on google_sql_database_instance.master.name
 # Use all default Cloud SQL monitoring on google_sql_database_instance.stage.name
@@ -35,7 +35,7 @@ locals {
 # cloud_sql = {
 # instances = {
 # (google_sql_database_instance.master.stage) = { cpu_utilization = [] }
-# (google_sql_database_instance.master.prod) = {}
+# (google_sql_database_instance.master.prod) = {}
 # }
 # }
 
@@ -46,6 +46,16 @@ module "example" {
 version = ">= 0.1.0"
 
 notification_channels = var.notification_channels
-project = var.project
-cloud_sql = local.cloud_sql
+project_id = var.project
+cloud_sql = local.cloud_sql
+kyverno = {
+cluster_name = "test-cluster"
+enabled = true
+use_metric_threshold = true
+metric_threshold_count = 5
+notification_channels = []
+# Optional extra log filter that excludes known non-actionable messages,
+# e.g., "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
+filter_extra = "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
+}
 }
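
The example above uses metric-threshold mode. If you set `use_metric_threshold = false`, the module instead creates `kyverno_logmatch_alert`, a log-match policy that fires on matching entries and is throttled by `notification_rate_limit`. A sketch of the `kyverno` argument for that mode follows. The documentation text is illustrative only:

```hcl
kyverno = {
  cluster_name = "test-cluster"

  # Log-match mode: creates kyverno_logmatch_alert instead of the
  # log-based metric and the metric-threshold policy.
  use_metric_threshold = false

  # Send at most one notification every 10 minutes for this policy.
  notification_rate_limit = "600s"

  # Markdown shown on the incident; replaces the generated default text.
  alert_documentation = "Check the Kyverno controller logs in the `kyverno` namespace."

  filter_extra = "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
}
```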

examples/variables.tf

Lines changed: 18 additions & 1 deletion
@@ -1,10 +1,27 @@
 
 variable "project" {
 type = string
-default = ""
+default = "test-project"
 }
 
 variable "notification_channels" {
 type = list(string)
 default = []
 }
+
+variable "kyverno" {
+description = "Full Kyverno monitoring configuration"
+type = object({
+cluster_name = string
+project_id = optional(string, null)
+notification_channels = optional(list(string), [])
+alert_documentation = optional(string, "Kyverno controllers produced ERROR logs in namespace kyverno.")
+use_metric_threshold = optional(bool, true)
+metric_threshold_count = optional(number, 2)
+metric_lookback_minutes = optional(number, 1)
+auto_close_seconds = optional(number, 3600)
+enabled = optional(bool, true)
+filter_extra = optional(string, "")
+namespace = optional(string, "kyverno")
+})
+}
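
`examples/variables.tf` declares a `kyverno` variable, but `examples/main.tf` passes an inline object to the module instead. The sketch below wires the variable through, replacing the inline object. It assumes a hypothetical `terraform.tfvars`. Because the example variable does not declare `notification_rate_limit`, the module's own `"300s"` default should apply when the object is converted to the module's type:

```hcl
# examples/main.tf (sketch): same module block, fed from the variable.
module "example" {
  source = "../" # hypothetical path

  project_id            = var.project
  notification_channels = var.notification_channels
  cloud_sql             = local.cloud_sql
  kyverno               = var.kyverno
}

# examples/terraform.tfvars (hypothetical values):
# kyverno = {
#   cluster_name           = "my-gke-cluster"
#   metric_threshold_count = 5
# }
```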

kyverno_log_alert.tf

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
+locals {
+kyverno_project_id = var.kyverno.project_id != null ? var.kyverno.project_id : var.project_id
+alert_documentation = var.kyverno.alert_documentation != null ? var.kyverno.alert_documentation : "Kyverno controllers produced ERROR logs in namespace ${var.kyverno.namespace}."
+kyverno_notification_channels = length(var.kyverno.notification_channels) > 0 ? var.kyverno.notification_channels : var.notification_channels
+
+kyverno_log_filter = <<-EOT
+resource.type="k8s_container"
+resource.labels.project_id="${local.kyverno_project_id}"
+resource.labels.cluster_name="${var.kyverno.cluster_name}"
+resource.labels.namespace_name="${var.kyverno.namespace}"
+severity>=ERROR
+(
+labels."k8s-pod/app_kubernetes_io/component"=~"(admission-controller|background-controller|cleanup-controller|reports-controller)"
+OR resource.labels.pod_name=~"kyverno-(admission|background|cleanup|reports)-controller-.*"
+)
+${trimspace(var.kyverno.filter_extra)}
+EOT
+
+kyverno_metric_name = lower(replace(
+"kyverno_error_logs_count_${var.kyverno.cluster_name}_${var.kyverno.namespace}",
+"/[^a-zA-Z0-9_]/", "_"
+))
+}
+
+resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
+count = (
+var.kyverno.enabled
+&& !var.kyverno.use_metric_threshold
+&& trimspace(var.kyverno.cluster_name) != ""
+) ? 1 : 0
+
+display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"
+combiner = "OR"
+enabled = var.kyverno.enabled
+
+conditions {
+display_name = "Kyverno ERROR in logs"
+condition_matched_log {
+filter = local.kyverno_log_filter
+}
+}
+
+documentation {
+content = local.alert_documentation
+mime_type = "text/markdown"
+}
+
+notification_channels = local.kyverno_notification_channels
+
+alert_strategy {
+auto_close = "${var.kyverno.auto_close_seconds}s"
+notification_rate_limit {
+period = var.kyverno.notification_rate_limit
+}
+}
+}
+
+resource "google_logging_metric" "kyverno_error_metric" {
+count = (
+var.kyverno.enabled
+&& var.kyverno.use_metric_threshold
+&& trimspace(var.kyverno.cluster_name) != ""
+) ? 1 : 0
+
+name = local.kyverno_metric_name
+description = "Count of ERROR+ logs from Kyverno controllers in namespace ${var.kyverno.namespace}"
+filter = local.kyverno_log_filter
+
+metric_descriptor {
+metric_kind = "DELTA"
+value_type = "INT64"
+unit = "1"
+}
+}
+
+resource "google_monitoring_alert_policy" "kyverno_metric_threshold_alert" {
+count = (
+var.kyverno.enabled
+&& var.kyverno.use_metric_threshold
+&& trimspace(var.kyverno.cluster_name) != ""
+) ? 1 : 0
+
+display_name = "Kyverno ERROR rate alert (namespace=${var.kyverno.namespace})"
+combiner = "OR"
+enabled = var.kyverno.enabled
+
+conditions {
+display_name = "Kyverno ERROR rate alert >= ${var.kyverno.metric_threshold_count} logs in ${var.kyverno.metric_lookback_minutes} min (namespace ${var.kyverno.namespace})"
+condition_threshold {
+filter = format(
+"metric.type=\"logging.googleapis.com/user/%s\" resource.type=\"k8s_container\"",
+google_logging_metric.kyverno_error_metric[0].name
+)
+
+comparison = "COMPARISON_GE"
+threshold_value = var.kyverno.metric_threshold_count
+duration = "0s"
+
+aggregations {
+alignment_period = "${var.kyverno.metric_lookback_minutes * 60}s"
+per_series_aligner = "ALIGN_DELTA"
+cross_series_reducer = "REDUCE_SUM"
+group_by_fields = []
+}
+
+trigger {
+count = 1
+}
+}
+}
+
+documentation {
+content = local.alert_documentation
+mime_type = "text/markdown"
+}
+
+notification_channels = local.kyverno_notification_channels
+
+alert_strategy {
+auto_close = "${var.kyverno.auto_close_seconds}s"
+notification_rate_limit {
+period = var.kyverno.notification_rate_limit
+}
+}
+}
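
Every Kyverno resource is created with `count` set to 0 or 1, and `use_metric_threshold` makes the two alert policies mutually exclusive. The sketch below adds outputs that expose the generated names. The outputs are hypothetical and not part of this commit. They use `one()`, which returns `null` for an empty list, so they also work when a resource is disabled:

```hcl
# Hypothetical outputs, not part of this commit.
output "kyverno_error_metric_name" {
  description = "Name of the Kyverno log-based metric, or null when metric-threshold mode is off."
  value       = one(google_logging_metric.kyverno_error_metric[*].name)
}

output "kyverno_alert_policy_name" {
  description = "Resource name of whichever Kyverno alert policy was created, or null."
  # At most one of the two policies exists, so the concatenated list
  # has zero or one element.
  value = one(concat(
    google_monitoring_alert_policy.kyverno_metric_threshold_alert[*].name,
    google_monitoring_alert_policy.kyverno_logmatch_alert[*].name,
  ))
}
```

To illustrate `local.kyverno_metric_name`: with `cluster_name = "prod-cluster"` and the default `kyverno` namespace, the hyphen is replaced by an underscore, giving `kyverno_error_logs_count_prod_cluster_kyverno`.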

main.tf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+
