Commit 393d173

feat(kyverno): enhance error pattern handling with inclusion/exclusion options
1 parent 53850b9 commit 393d173

7 files changed, +61 -16 lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -26,8 +26,9 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ### Breaking change

 - The `filter_extra` variable has been removed and replaced with `error_patterns_include` and `error_patterns_exclude`. To migrate:
-  - If you were using `filter_extra` to add custom error patterns, use `error_patterns_include` instead.
+  - If you were using `filter_extra` to add custom error patterns for `jsonPayload.error` matching, use `error_patterns_include` instead.
   - If you need to exclude specific default error patterns, use `error_patterns_exclude`.
+  - **Note:** The new options only support error pattern matching against `jsonPayload.error`. If you were using `filter_extra` for arbitrary log filter conditions (e.g., negative filters like `-textPayload:"..."`), this functionality is no longer available.
   - See [examples/main.tf](examples/main.tf) for usage examples.

 ## [0.12.0] - 2026-01-28

README.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ Supported services:
 | <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | `{}` | no |
 | <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> enabled = optional(bool, true)<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | `{}` | no |
 | <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | `{}` | no |
-| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, error pattern inclusions/exclusions, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> namespace = optional(string, "kyverno")<br/> # List of error patterns to exclude from the default set.<br/> # Default patterns available for exclusion:<br/> # "internal error", "failed calling webhook", "timeout", "client-side throttling",<br/> # "failed to run warmup", "schema not found", "failed to list resources",<br/> # "failed to watch resource", "context deadline exceeded", "is forbidden",<br/> # "cannot list resource", "cannot watch resource", "RBAC.*denied",<br/> # "failed to start watcher", "leader election lost", "unable to update .*WebhookConfiguration",<br/> # "failed to sync", "dropping request", "failed to load certificate",<br/> # "failed to update lock", "the object has been modified", "no matches for kind",<br/> # "the server could not find the requested resource", "Too Many Requests", "x509",<br/> # "is invalid:", "connection refused", "no agent available", "fatal error", "panic"<br/> error_patterns_exclude = optional(list(string), [])<br/> # List of additional error patterns to include (added to default set)<br/> # e.g. ["my custom error", "another pattern"]<br/> error_patterns_include = optional(list(string), [])<br/> })</pre> | `{}` | no |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, error pattern inclusions/exclusions for jsonPayload.error matching, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> namespace = optional(string, "kyverno")<br/> # List of error patterns to exclude from the default set.<br/> # Default patterns available for exclusion:<br/> # "internal error", "failed calling webhook", "timeout", "client-side throttling",<br/> # "failed to run warmup", "schema not found", "failed to list resources",<br/> # "failed to watch resource", "context deadline exceeded", "is forbidden",<br/> # "cannot list resource", "cannot watch resource", "RBAC.*denied",<br/> # "failed to start watcher", "leader election lost", "unable to update .*WebhookConfiguration",<br/> # "failed to sync", "dropping request", "failed to load certificate",<br/> # "failed to update lock", "the object has been modified", "no matches for kind",<br/> # "the server could not find the requested resource", "Too Many Requests", "x509",<br/> # "is invalid:", "connection refused", "no agent available", "fatal error", "panic"<br/> error_patterns_exclude = optional(list(string), [])<br/> # List of additional regex error patterns to include (added to default set)<br/> # e.g. ["my custom.*error", "failed to connect.*database"]<br/> error_patterns_include = optional(list(string), [])<br/> })</pre> | `{}` | no |
 | <a name="input_litellm"></a> [litellm](#input\_litellm) | Configuration for LiteLLM monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). | <pre>object({<br/> enabled = optional(bool, false)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> cluster_name = optional(string, null)<br/><br/> apps = optional(map(object({<br/> uptime_check = optional(object({<br/> enabled = optional(bool, true)<br/> host = string<br/> path = optional(string, "/health/readiness")<br/> }), null)<br/><br/> container_check = optional(object({<br/> enabled = optional(bool, true)<br/> namespace = string<br/> pod_restart = optional(object({<br/> threshold = optional(number, 0)<br/> alignment_period = optional(number, 60)<br/> duration = optional(number, 180)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_prompts = optional(list(string), null)<br/> }), {})<br/> }), null)<br/> })), {})<br/> })</pre> | `{}` | no |
 | <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
 | <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |

cert_manager.tf

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ locals {
     EOT
   )
   cert_manager_notification_channels = var.cert_manager.notification_enabled ? (length(var.cert_manager.notification_channels) > 0 ? var.cert_manager.notification_channels : var.notification_channels) : []
-  cert_manager_cluster_name = var.cert_manager.cluster_name != null ? trimspace(var.cert_manager.cluster_name) : ""
+  cert_manager_cluster_name = var.cert_manager.cluster_name != null ? trimspace(var.cert_manager.cluster_name) : ""

   cert_manager_log_filter = local.cert_manager_cluster_name != "" ? (<<-EOT
     (

examples/main.tf

Lines changed: 6 additions & 4 deletions
@@ -50,15 +50,17 @@ module "example" {
   kyverno = {
     cluster_name = "test-cluster"
     notification_channels = []
-    # Exclude specific error patterns from the default set
+    # Exclude specific error patterns from the default set (only affects jsonPayload.error matching)
     error_patterns_exclude = [
       "failed to start watcher",
       "failed to list resources",
     ]
-    # Add custom error patterns to the default set
+    # Add custom regex error patterns to the default set (matched against jsonPayload.error)
+    # Note: These options only support error pattern matching. Arbitrary log filter conditions
+    # (e.g., negative filters like -textPayload:"...") are not supported.
     # error_patterns_include = [
-    #   "my custom error",
-    #   "another pattern to match",
+    #   "my custom.*error",
+    #   "failed to connect.*database",
     # ]
   }
   cert_manager = {

kyverno.tf

Lines changed: 2 additions & 2 deletions
@@ -40,10 +40,10 @@ locals {
   ]

   # Combine default patterns with included patterns, then filter out excluded ones
-  kyverno_all_error_patterns = concat(
+  kyverno_all_error_patterns = distinct(concat(
     local.kyverno_default_error_patterns,
     var.kyverno.error_patterns_include
-  )
+  ))

   kyverno_active_error_patterns = [
     for pattern in local.kyverno_all_error_patterns :
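
The change above wraps `concat` in `distinct`, so a pattern listed both in the defaults and in `error_patterns_include` appears only once. The following is a minimal, standalone sketch of that behavior, not the module's code: every `demo_*` name is hypothetical, the default list is shortened, and the `if` clause is an assumed shape for the exclusion step, which the hunk does not show in full.

```hcl
locals {
  # Shortened stand-in for the module's default pattern list (hypothetical).
  demo_default_patterns = ["timeout", "panic", "x509"]

  # Hypothetical user input: "panic" duplicates a default pattern.
  demo_include = ["panic", "my custom.*error"]
  demo_exclude = ["timeout"]

  # concat keeps duplicates; distinct retains the first occurrence of each
  # value and preserves order.
  demo_all_patterns = distinct(concat(
    local.demo_default_patterns,
    local.demo_include
  ))
  # ["timeout", "panic", "x509", "my custom.*error"]

  # Assumed exclusion step: keep every pattern not listed in demo_exclude.
  demo_active_patterns = [
    for pattern in local.demo_all_patterns :
    pattern if !contains(local.demo_exclude, pattern)
  ]
  # ["panic", "x509", "my custom.*error"]
}

output "demo_active_patterns" {
  value = local.demo_active_patterns
}
```

Without `distinct`, the duplicated `"panic"` entry would presumably end up twice in whatever filter is built from the list. That is likely harmless for matching but adds noise to the generated filter.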

modules/http_monitoring/main.tf

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 locals {
-  suffix = var.uptime_monitoring_path != "/" ? var.uptime_monitoring_path : ""
-  uptime_monitoring_display_name = var.uptime_monitoring_display_name != "" ? "${var.uptime_monitoring_display_name} - ${var.uptime_monitoring_host}${local.suffix}" : "${var.uptime_monitoring_host}${local.suffix}"
-  alert_display_name = var.alert_display_name != "" ? var.alert_display_name : "Failure of uptime check for: ${local.uptime_monitoring_display_name}"
+  suffix = var.uptime_monitoring_path != "/" ? var.uptime_monitoring_path : ""
+  uptime_monitoring_display_name = var.uptime_monitoring_display_name != "" ? "${var.uptime_monitoring_display_name} - ${var.uptime_monitoring_host}${local.suffix}" : "${var.uptime_monitoring_host}${local.suffix}"
+  alert_display_name = var.alert_display_name != "" ? var.alert_display_name : "Failure of uptime check for: ${local.uptime_monitoring_display_name}"
 }

 resource "google_monitoring_uptime_check_config" "https_uptime" {

variables.tf

Lines changed: 46 additions & 4 deletions
@@ -69,7 +69,7 @@ variable "cloud_sql" {
 }

 variable "kyverno" {
-  description = "Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, error pattern inclusions/exclusions, and namespace."
+  description = "Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, error pattern inclusions/exclusions for jsonPayload.error matching, and namespace."
   default = {}
   type = object({
     enabled = optional(bool, true)
@@ -81,7 +81,7 @@ variable "kyverno" {
     logmatch_notification_rate_limit = optional(string, "300s")
     alert_documentation = optional(string, null)
     auto_close_seconds = optional(number, 3600)
-    namespace = optional(string, "kyverno")
+    namespace = optional(string, "kyverno")
     # List of error patterns to exclude from the default set.
     # Default patterns available for exclusion:
     # "internal error", "failed calling webhook", "timeout", "client-side throttling",
@@ -94,8 +94,8 @@ variable "kyverno" {
     # "the server could not find the requested resource", "Too Many Requests", "x509",
     # "is invalid:", "connection refused", "no agent available", "fatal error", "panic"
     error_patterns_exclude = optional(list(string), [])
-    # List of additional error patterns to include (added to default set)
-    # e.g. ["my custom error", "another pattern"]
+    # List of additional regex error patterns to include (added to default set)
+    # e.g. ["my custom.*error", "failed to connect.*database"]
     error_patterns_include = optional(list(string), [])
   })

@@ -144,6 +144,48 @@ variable "kyverno" {
     ])
     error_message = "error_patterns_exclude contains invalid pattern(s). Only default patterns can be excluded. Check the variable description for the list of valid patterns."
   }
+
+  validation {
+    condition = (
+      !var.kyverno.enabled ||
+      length(setsubtract(
+        toset(concat([
+          "internal error",
+          "failed calling webhook",
+          "timeout",
+          "client-side throttling",
+          "failed to run warmup",
+          "schema not found",
+          "failed to list resources",
+          "failed to watch resource",
+          "context deadline exceeded",
+          "is forbidden",
+          "cannot list resource",
+          "cannot watch resource",
+          "RBAC.*denied",
+          "failed to start watcher",
+          "leader election lost",
+          "unable to update .*WebhookConfiguration",
+          "failed to sync",
+          "dropping request",
+          "failed to load certificate",
+          "failed to update lock",
+          "the object has been modified",
+          "no matches for kind",
+          "the server could not find the requested resource",
+          "Too Many Requests",
+          "x509",
+          "is invalid:",
+          "connection refused",
+          "no agent available",
+          "fatal error",
+          "panic",
+        ], var.kyverno.error_patterns_include)),
+        toset(var.kyverno.error_patterns_exclude)
+      )) > 0
+    )
+    error_message = "The combination of error_patterns_exclude and error_patterns_include results in no active error patterns. At least one pattern must remain active, otherwise the alert will not be created."
+  }
 }

 variable "cert_manager" {
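
The new validation block relies on `setsubtract` to check that at least one pattern survives the exclusion. The sketch below illustrates that check in isolation. It assumes a two-element default list for brevity, and all `demo_*` and `case_*` names are hypothetical, not part of the module.

```hcl
locals {
  # Shortened stand-in for the 30 default patterns (hypothetical).
  demo_defaults = ["timeout", "panic"]

  # Case 1: every default is excluded and nothing is included.
  # The set difference is empty, so the condition is false and the
  # validation would reject the input.
  case_all_excluded = length(setsubtract(
    toset(concat(local.demo_defaults, [])),
    toset(["timeout", "panic"])
  )) > 0 # false

  # Case 2: the same exclusions plus one custom include.
  # One pattern remains, so the condition is true.
  case_with_include = length(setsubtract(
    toset(concat(local.demo_defaults, ["my custom.*error"])),
    toset(["timeout", "panic"])
  )) > 0 # true
}
```

In the actual block, the leading `!var.kyverno.enabled ||` term means the check is satisfied whenever Kyverno alerts are disabled, regardless of the pattern lists.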
