Commit 00f0c52

feat: add Kyverno monitoring alerts and update documentation
1 parent c7df9b6 commit 00f0c52

9 files changed, +219 -46 lines changed


CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,13 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

+## [0.3.0] - 2025-10-07
+
+[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.2.0...0.3.0)
+
+### Changed
+
+- Add Kyverno log alerts.
 - Update module documentation.

 ## [0.2.0] - 2024-10-17

Makefile

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
+TERRAFORM_DOCS_VERSION ?= 0.20.0
+
 .PHONY: lint tfscan generate-docs

 lint:
@@ -10,4 +12,4 @@ generate-docs: lint
 	docker run --rm -u $$(id -u) \
 		--volume "$(PWD):/terraform-docs" \
 		-w /terraform-docs \
-		quay.io/terraform-docs/terraform-docs:0.16.0 markdown table --config .terraform-docs.yml --output-file README.md --output-mode inject .
+		quay.io/terraform-docs/terraform-docs:$(TERRAFORM_DOCS_VERSION) markdown table --config .terraform-docs.yml --output-file README.md --output-mode inject .

README.md

Lines changed: 13 additions & 5 deletions
@@ -5,10 +5,16 @@ This module creates a set of monitoring alerts for Google Cloud Platform service
 Supported services:

 - Cloud SQL
+
   - CPU usage
   - Storage usage
   - Memory usage

+- Kyverno
+
+  - Error logs for admission-controller, background-controller, cleanup-controller, reports-controller
+  - Metric threshold (optional)
+
 <!-- BEGIN_TF_DOCS -->
 ## Providers

@@ -27,10 +33,10 @@ Supported services:

 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| <a name="input_auto_close"></a> [auto\_close](#input\_auto\_close) | n/a | `string` | `"86400s"` | no |
-| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | n/a | <pre>object({<br> project = optional(string, null)<br> auto_close = optional(string, null)<br> notification_channels = optional(list(string), [])<br> instances = optional(map(object({<br> cpu_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "120s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> threshold = 0.85,<br> duration = "1200s",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 1,<br> duration = "300s",<br> alignment_period = "60s",<br> }<br> ])<br> memory_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> severity = "WARNING",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.95,<br> }<br> ])<br> disk_utilization = optional(list(object({<br> severity = optional(string, "WARNING"),<br> threshold = optional(number, 0.85)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "600s")<br> })), [<br> {<br> severity = "WARNING",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.95, <br> }<br> ])<br> })), {})<br> })</pre> | n/a | yes |
-| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | n/a | `list(string)` | `[]` | no |
-| <a name="input_project"></a> [project](#input\_project) | n/a | `string` | `null` | no |
+| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | n/a | yes |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> notification_channels = optional(list(string), [])<br/> alert_documentation = optional(string, null)<br/> use_metric_threshold = optional(bool, true)<br/> metric_threshold_count = optional(number, 2)<br/> metric_lookback_minutes = optional(number, 1)<br/> auto_close_seconds = optional(number, 3600)<br/> enabled = optional(bool, true)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | n/a | yes |
+| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
+| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |

 ## Outputs

@@ -44,13 +50,15 @@ Supported services:

 | Name | Type |
 |------|------|
+| [google_logging_metric.kyverno_error_metric](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/logging_metric) | resource |
 | [google_monitoring_alert_policy.cloud_sql_cpu_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
 | [google_monitoring_alert_policy.cloud_sql_disk_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
 | [google_monitoring_alert_policy.cloud_sql_memory_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
+| [google_monitoring_alert_policy.kyverno_logmatch_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
+| [google_monitoring_alert_policy.kyverno_metric_threshold_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |

 ## Modules

 No modules.

-
 <!-- END_TF_DOCS -->

cloud-sql.tf

Lines changed: 10 additions & 13 deletions
@@ -3,14 +3,11 @@
 # ----------------------
 locals {
   # Use the cloud_sql project if specified, otherwise use the project.
-  cloud_sql_project = var.cloud_sql.project != null ? var.cloud_sql.project : var.project
+  cloud_sql_project = var.cloud_sql.project_id != null ? var.cloud_sql.project_id : var.project_id

   # Use the cloud_sql notification channels if specified, otherwise use the module-level notification channels.
   cloud_sql_notification_channels = length(var.cloud_sql.notification_channels) > 0 ? var.cloud_sql.notification_channels : var.notification_channels

-  # Use the cloud_sql auto_close if specified, otherwise use the auto_close.
-  cloud_sql_auto_close = var.cloud_sql.auto_close != null ? var.cloud_sql.auto_close : var.auto_close
-
   cloud_sql_cpu_utilization = {
     for item in flatten(
       [
@@ -22,7 +19,7 @@ locals {
             },
             cpu_utilization
           )
-        ]
+        ]
       ]
     ) : "${item.instance}--${item.severity}--${item.threshold}" => item
   }
@@ -38,10 +35,10 @@ locals {
             },
             memory_utilization
           )
-        ]
+        ]
       ]
     ) : "${item.instance}--${item.severity}--${item.threshold}" => item
-  }
+  }

   cloud_sql_disk_utilization = {
     for item in flatten(
@@ -54,10 +51,10 @@ locals {
             },
             disk_utilization
           )
-        ]
+        ]
       ]
     ) : "${item.instance}--${item.severity}--${item.threshold}" => item
-  }
+  }
 }

 # ----------------------
@@ -67,7 +64,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_cpu_utilization" {
   for_each = local.cloud_sql_cpu_utilization

   display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
-  combiner = "OR"
+  combiner = "OR"
   severity = each.value.severity

   conditions {
@@ -87,7 +84,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_cpu_utilization" {
     display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
   }
   alert_strategy {
-    auto_close = local.cloud_sql_auto_close
+    auto_close = var.cloud_sql.auto_close
   }
   notification_channels = local.cloud_sql_notification_channels
 }
@@ -117,7 +114,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_memory_utilization" {
   }

   alert_strategy {
-    auto_close = local.cloud_sql_auto_close
+    auto_close = var.cloud_sql.auto_close
   }

   notification_channels = local.cloud_sql_notification_channels
@@ -149,7 +146,7 @@ resource "google_monitoring_alert_policy" "cloud_sql_disk_utilization" {
   }

   alert_strategy {
-    auto_close = local.cloud_sql_auto_close
+    auto_close = var.cloud_sql.auto_close
   }
   notification_channels = local.cloud_sql_notification_channels
 }
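
After this change the auto-close interval is read directly from `var.cloud_sql.auto_close`, whose documented default is `"86400s"`, rather than from a separate module-level `auto_close` variable. A minimal sketch of a `cloud_sql` value that overrides it and tunes one instance might look like the following. The instance name `my-instance` and the chosen values are hypothetical; the attribute names are taken from the object type shown in the README inputs table.

```hcl
locals {
  cloud_sql = {
    # Close open incidents after 12 hours instead of the 24-hour default.
    auto_close = "43200s"

    instances = {
      # Hypothetical instance name, used only for illustration.
      "my-instance" = {
        # Replace the default CPU alerts with a single CRITICAL alert.
        cpu_utilization = [
          {
            severity  = "CRITICAL"
            threshold = 0.95
            duration  = "600s"
          }
        ]
        # memory_utilization and disk_utilization are omitted here,
        # so their documented defaults apply.
      }
    }
  }
}
```

Callers that previously set the top-level `auto_close` input would presumably need to move that value into `cloud_sql.auto_close`, since the top-level input no longer appears in the README.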

examples/main.tf

Lines changed: 18 additions & 8 deletions
@@ -4,12 +4,12 @@

 locals {
   # Enable all Cloud SQL monitoring on selected instances, e.g.
-  cloud_sql = {
-    instances = {
-      (google_sql_database_instance.master.name) = {}
+  cloud_sql = {
+    instances = {
+      (google_sql_database_instance.master.name) = {}
       (google_sql_database_instance.stage.name) = {}
-    }
-  }
+    }
+  }

   # Use custom Cloud SQL cpu monitoring on google_sql_database_instance.master.name
   # Use all default Cloud SQL monitoring on google_sql_database_instance.stage.name
@@ -35,7 +35,7 @@ locals {
   # cloud_sql = {
   # instances = {
   # (google_sql_database_instance.master.stage) = { cpu_utilization = [] }
-  # (google_sql_database_instance.master.prod) = {}
+  # (google_sql_database_instance.master.prod) = {}
   # }
   # }

@@ -46,6 +46,16 @@ module "example" {
   version = ">= 0.1.0"

   notification_channels = var.notification_channels
-  project = var.project
-  cloud_sql = local.cloud_sql
+  project_id = var.project
+  cloud_sql = local.cloud_sql
+  kyverno = {
+    cluster_name = "test-cluster"
+    enabled = true
+    use_metric_threshold = true
+    metric_threshold_count = 5
+    notification_channels = []
+    # Optional filter for log entries, exclude known non-actionable messages
+    # e.g., "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
+    filter_extra = "-textPayload:\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\""
+  }
 }
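
For comparison, here is a minimal sketch of a module call that uses the input names listed in the README inputs table (`project_id`, `notification_channels`, `cloud_sql`, `kyverno`). The `source` address and the `ref=0.3.0` pin are assumptions derived from the repository URL in the changelog, and the project and cluster names are placeholders.

```hcl
module "services_monitoring" {
  # Assumed source address; pin to whichever release you actually use.
  source = "github.com/sparkfabrik/terraform-google-services-monitoring?ref=0.3.0"

  project_id            = "my-project" # placeholder
  notification_channels = []

  # No Cloud SQL instances are monitored in this sketch.
  cloud_sql = {}

  kyverno = {
    cluster_name = "my-cluster" # placeholder

    # Metric-threshold mode: alert when 5 or more ERROR entries
    # are counted within a 5-minute window.
    use_metric_threshold    = true
    metric_threshold_count  = 5
    metric_lookback_minutes = 5
  }
}
```

Setting `use_metric_threshold = false` would instead select the log-match alert policy, according to the `count` expressions in `kyverno_log_alert.tf`.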

examples/variables.tf

Lines changed: 18 additions & 1 deletion
@@ -1,10 +1,27 @@

 variable "project" {
   type = string
-  default = ""
+  default = "test-project"
 }

 variable "notification_channels" {
   type = list(string)
   default = []
 }
+
+variable "kyverno" {
+  description = "Complete configuration for Kyverno monitoring"
+  type = object({
+    cluster_name = string
+    project_id = optional(string, null)
+    notification_channels = optional(list(string), [])
+    alert_documentation = optional(string, "Kyverno controllers produced ERROR logs in namespace kyverno.")
+    use_metric_threshold = optional(bool, true)
+    metric_threshold_count = optional(number, 2)
+    metric_lookback_minutes = optional(number, 1)
+    auto_close_seconds = optional(number, 3600)
+    enabled = optional(bool, true)
+    filter_extra = optional(string, "")
+    namespace = optional(string, "kyverno")
+  })
+}

kyverno_log_alert.tf

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+locals {
+  kyverno_project_id = var.kyverno.project_id != null ? var.kyverno.project_id : var.project_id
+  alert_documentation = var.kyverno.alert_documentation != null ? var.kyverno.alert_documentation : "Kyverno controllers produced ERROR logs in namespace ${var.kyverno.namespace}."
+  kyverno_notification_channels = length(var.kyverno.notification_channels) > 0 ? var.kyverno.notification_channels : var.notification_channels
+  kyverno_log_filter = <<-EOT
+    resource.type="k8s_container"
+    resource.labels.project_id="${local.kyverno_project_id}"
+    resource.labels.cluster_name="${var.kyverno.cluster_name}"
+    resource.labels.namespace_name="${var.kyverno.namespace}"
+    severity>=ERROR
+    (
+      labels."k8s-pod/app_kubernetes_io/component"=~"(admission-controller|background-controller|cleanup-controller|reports-controller)"
+      OR resource.labels.pod_name=~"kyverno-(admission|background|cleanup|reports)-controller-.*"
+    )
+    ${trimspace(var.kyverno.filter_extra)}
+  EOT
+  kyverno_metric_name = lower(replace(
+    "kyverno_error_logs_count_${var.kyverno.cluster_name}_${var.kyverno.namespace}",
+    "/[^a-zA-Z0-9_]/", "_"
+  ))
+}
+
+resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
+  count = (
+    var.kyverno.enabled
+    && !var.kyverno.use_metric_threshold
+    && trimspace(var.kyverno.cluster_name) != ""
+  ) ? 1 : 0
+
+  display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"
+  combiner = "OR"
+  enabled = var.kyverno.enabled
+
+  conditions {
+    display_name = "Kyverno ERROR in logs"
+    condition_matched_log {
+      filter = local.kyverno_log_filter
+    }
+  }
+
+  documentation {
+    content = local.alert_documentation
+    mime_type = "text/markdown"
+  }
+
+  # Fall back to the module-level channels, as the threshold alert does.
+  notification_channels = local.kyverno_notification_channels
+
+  alert_strategy {
+    auto_close = "${var.kyverno.auto_close_seconds}s"
+  }
+}
+
+resource "google_logging_metric" "kyverno_error_metric" {
+  count = (
+    var.kyverno.enabled
+    && var.kyverno.use_metric_threshold
+    && trimspace(var.kyverno.cluster_name) != ""
+  ) ? 1 : 0
+
+  name = local.kyverno_metric_name
+  description = "Count of ERROR+ logs from Kyverno controllers in namespace ${var.kyverno.namespace}"
+  filter = local.kyverno_log_filter
+
+  metric_descriptor {
+    metric_kind = "DELTA"
+    value_type = "INT64"
+    unit = "1"
+  }
+}
+
+resource "google_monitoring_alert_policy" "kyverno_metric_threshold_alert" {
+  count = (
+    var.kyverno.enabled
+    && var.kyverno.use_metric_threshold
+    && trimspace(var.kyverno.cluster_name) != ""
+  ) ? 1 : 0
+
+  display_name = "Kyverno controllers ERROR logs rate (namespace=${var.kyverno.namespace})"
+  combiner = "OR"
+  enabled = var.kyverno.enabled
+
+  conditions {
+    display_name = "Kyverno ERROR logs >= ${var.kyverno.metric_threshold_count} in ${var.kyverno.metric_lookback_minutes}m"
+    condition_threshold {
+      filter = format(
+        "metric.type=\"logging.googleapis.com/user/%s\" resource.type=\"global\"",
+        google_logging_metric.kyverno_error_metric[0].name
+      )
+
+      comparison = "COMPARISON_GT"
+      threshold_value = var.kyverno.metric_threshold_count - 0.000001
+      duration = "0s"
+
+      aggregations {
+        alignment_period = "${var.kyverno.metric_lookback_minutes * 60}s"
+        per_series_aligner = "ALIGN_DELTA"
+        cross_series_reducer = "REDUCE_SUM"
+        group_by_fields = []
+      }
+
+      trigger {
+        count = 1
+      }
+    }
+  }
+
+  documentation {
+    content = local.alert_documentation
+    mime_type = "text/markdown"
+  }
+
+  notification_channels = local.kyverno_notification_channels
+
+  alert_strategy {
+    auto_close = "${var.kyverno.auto_close_seconds}s"
+  }
+}
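
The `kyverno_metric_name` local replaces every character outside `[a-zA-Z0-9_]` with an underscore and lowercases the result. The standalone sketch below reproduces that expression with hypothetical cluster and namespace values so it can be checked in isolation, for example by running `terraform apply` in an otherwise empty directory. The `example_*` names are illustrative and not part of the module.

```hcl
locals {
  # Hypothetical inputs, standing in for var.kyverno.cluster_name
  # and var.kyverno.namespace.
  example_cluster_name = "Prod-Cluster.eu"
  example_namespace    = "kyverno"

  # Same expression shape as kyverno_metric_name: a pattern wrapped in
  # forward slashes is treated by replace() as a regular expression.
  example_metric_name = lower(replace(
    "kyverno_error_logs_count_${local.example_cluster_name}_${local.example_namespace}",
    "/[^a-zA-Z0-9_]/", "_"
  ))
}

output "example_metric_name" {
  # Evaluates to "kyverno_error_logs_count_prod_cluster_eu_kyverno".
  value = local.example_metric_name
}
```

On the alert side, the threshold policy compares with `COMPARISON_GT` against `metric_threshold_count - 0.000001`, which behaves like "greater than or equal to" for the integer counts this metric produces.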

main.tf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+
