Commit 196acf6

feat(kyverno): remove generic error pattern to reduce false positives
1 parent bca5499 commit 196acf6

4 files changed: +73 -107 lines changed
CHANGELOG.md

Lines changed: 17 additions & 3 deletions
@@ -8,13 +8,27 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

-## [0.13.1] - 2026-02-05
+## [0.14.0] - 2026-02-05

-[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.13.0...0.13.1)
+[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.13.0...0.14.0)
+
+### Breaking change
+
+- **Kyverno log matching now uses `jsonPayload.message` instead of `jsonPayload.error`**. This provides more precise control over which log messages trigger alerts and enables proper exclusion of specific messages.
+- Error-detail patterns like `"is forbidden"`, `"context deadline exceeded"`, `"timeout"` have been removed as they appear in the `error` field, not the `message` field.
+- Patterns are now specific (e.g., `"failed to update lock"`) instead of generic (e.g., `"failed to update"`) to avoid overlap when excluding.
+- To migrate: review your `error_patterns_exclude` configuration and update pattern names if needed.

 ### Changed

-- Extend `error_patterns_exclude` behavior: excluded patterns now also generate `NOT jsonPayload.message=~"pattern"` conditions, allowing exclusion of logs where the pattern appears in the message field (not just the error field).
+- Add `severity=ERROR` filter condition to ensure only error-level logs trigger alerts.
+- Update Kyverno default patterns to message-based matching:
+  - `"failed to list resources"`, `"failed to watch resource"`, `"failed to start watcher"`
+  - `"failed to sync"`, `"failed to run warmup"`, `"failed to load certificate"`
+  - `"failed to update lock"`, `"failed to update lease"`, `"failed to process request"`
+  - `"failed to check permissions"`, `"failed to scan resource"`, `"failed to fetch data"`
+  - `"failed to substitute variables"`, `"failed calling webhook"`
+  - `"leader election lost"`, `"dropping request"`, `"panic"`

 ## [0.13.0] - 2026-02-04

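As a hedged illustration of the migration note in the changelog above, a module call after upgrading might look like the sketch below. The module label `services_monitoring`, the Git `source` address with `ref=0.14.0`, and the project and cluster values are assumptions made for this example and do not come from the commit; only the `kyverno` attribute names and the pattern strings are taken from the diff. Other module inputs are omitted.

```hcl
# Hypothetical module call: label, source address, and IDs are placeholders.
module "services_monitoring" {
  source = "github.com/sparkfabrik/terraform-google-services-monitoring?ref=0.14.0"

  project_id = "my-gcp-project" # placeholder

  kyverno = {
    cluster_name = "my-gke-cluster" # placeholder

    # Entries such as "timeout" or "is forbidden" referred to default
    # patterns that were removed in 0.14.0, so they can be dropped.
    # An entry must equal a current default pattern exactly to exclude it.
    error_patterns_exclude = [
      "failed to update lock",
    ]

    # Additional regex patterns, now matched against jsonPayload.message.
    error_patterns_include = [
      "failed to connect.*",
    ]
  }
}
```

Because matching moved from `jsonPayload.error` to `jsonPayload.message`, custom include patterns written against error details may need to be rewritten against the log message text instead.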
README.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ Supported services:
| <a name="input_cert_manager"></a> [cert\_manager](#input\_cert\_manager) | Configuration for cert-manager missing issuer log alert. Allows customization of project, cluster, namespace, notification channels, alert documentation, enablement, extra filters, auto-close timing, and notification rate limiting. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "cert-manager")<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> })</pre> | `{}` | no |
| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | Configuration for Cloud SQL monitoring alerts. Supports customization of project, auto-close timing, notification channels, and per-instance alert thresholds for CPU, memory, and disk utilization. | <pre>object({<br/> enabled = optional(bool, true)<br/> project_id = optional(string, null)<br/> auto_close = optional(string, "86400s") # default 24h<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> instances = optional(map(object({<br/> cpu_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "120s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> threshold = 0.85,<br/> duration = "1200s",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 1,<br/> duration = "300s",<br/> alignment_period = "60s",<br/> }<br/> ])<br/> memory_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.90)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "300s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> disk_utilization = optional(list(object({<br/> severity = optional(string, "WARNING"),<br/> threshold = optional(number, 0.85)<br/> alignment_period = optional(string, "300s")<br/> duration = optional(string, "600s")<br/> })), [<br/> {<br/> severity = "WARNING",<br/> },<br/> {<br/> severity = "CRITICAL",<br/> threshold = 0.95,<br/> }<br/> ])<br/> })), {})<br/> })</pre> | `{}` | no |
| <a name="input_konnectivity_agent"></a> [konnectivity\_agent](#input\_konnectivity\_agent) | Configuration for Konnectivity agent deployment replica alert in GKE. Triggers when there are no available replicas. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> namespace = optional(string, "kube-system")<br/> deployment_name = optional(string, "konnectivity-agent")<br/> duration_seconds = optional(number, 60)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> notification_prompts = optional(list(string), null)<br/> })</pre> | `{}` | no |
-| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, error pattern inclusions/exclusions for jsonPayload.error matching, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> namespace = optional(string, "kyverno")<br/> # List of error patterns to exclude from the default set.<br/> # Default patterns available for exclusion:<br/> # "internal error", "failed calling webhook", "timeout", "client-side throttling",<br/> # "failed to run warmup", "schema not found", "failed to list resources",<br/> # "failed to watch resource", "context deadline exceeded", "is forbidden",<br/> # "cannot list resource", "cannot watch resource", "RBAC.*denied",<br/> # "failed to start watcher", "leader election lost", "unable to update .*WebhookConfiguration",<br/> # "failed to sync", "dropping request", "failed to load certificate",<br/> # "failed to update lock", "the object has been modified", "no matches for kind",<br/> # "the server could not find the requested resource", "Too Many Requests", "x509",<br/> # "is invalid:", "connection refused", "no agent available", "fatal error", "panic"<br/> error_patterns_exclude = optional(list(string), [])<br/> # List of additional regex error patterns to include (added to default set)<br/> # e.g. ["my custom.*error", "failed to connect.*database"]<br/> error_patterns_include = optional(list(string), [])<br/> })</pre> | `{}` | no |
+| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, message pattern inclusions/exclusions for jsonPayload.message matching, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = optional(string, null)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> namespace = optional(string, "kyverno")<br/> # List of message patterns to exclude from the default set (matches against jsonPayload.message).<br/> # Default patterns available for exclusion:<br/> # "failed to list resources", "failed to watch resource", "failed to start watcher",<br/> # "failed to sync", "failed to run warmup", "failed to load certificate",<br/> # "failed to update lock", "failed to update lease", "failed to process request",<br/> # "failed to check permissions", "failed to scan resource", "failed to fetch data",<br/> # "failed to substitute variables", "failed calling webhook",<br/> # "leader election lost", "dropping request", "panic"<br/> error_patterns_exclude = optional(list(string), [])<br/> # List of additional regex message patterns to include (added to default set)<br/> # e.g. ["failed to update lease", "failed to connect.*"]<br/> error_patterns_include = optional(list(string), [])<br/> })</pre> | `{}` | no |
| <a name="input_litellm"></a> [litellm](#input\_litellm) | Configuration for LiteLLM monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). | <pre>object({<br/> enabled = optional(bool, false)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> cluster_name = optional(string, null)<br/><br/> apps = optional(map(object({<br/> uptime_check = optional(object({<br/> enabled = optional(bool, true)<br/> host = string<br/> path = optional(string, "/health/readiness")<br/> }), null)<br/><br/> container_check = optional(object({<br/> enabled = optional(bool, true)<br/> namespace = string<br/> pod_restart = optional(object({<br/> threshold = optional(number, 0)<br/> alignment_period = optional(number, 60)<br/> duration = optional(number, 180)<br/> auto_close_seconds = optional(number, 3600)<br/> notification_prompts = optional(list(string), null)<br/> }), {})<br/> }), null)<br/> })), {})<br/> })</pre> | `{}` | no |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |

kyverno.tf

Lines changed: 24 additions & 43 deletions
@@ -5,76 +5,57 @@ locals {

   kyverno_cluster_name = var.kyverno.cluster_name != null ? trimspace(var.kyverno.cluster_name) : ""

-  # Default error patterns for Kyverno log matching
-  kyverno_default_error_patterns = [
-    "internal error",
-    "failed calling webhook",
-    "timeout",
-    "client-side throttling",
-    "failed to run warmup",
-    "schema not found",
+  # Default message patterns for Kyverno log matching (matches against jsonPayload.message)
+  kyverno_default_message_patterns = [
     "failed to list resources",
     "failed to watch resource",
-    "context deadline exceeded",
-    "is forbidden",
-    "cannot list resource",
-    "cannot watch resource",
-    "RBAC.*denied",
     "failed to start watcher",
-    "leader election lost",
-    "unable to update .*WebhookConfiguration",
     "failed to sync",
-    "dropping request",
+    "failed to run warmup",
     "failed to load certificate",
     "failed to update lock",
-    "the object has been modified",
-    "no matches for kind",
-    "the server could not find the requested resource",
-    "Too Many Requests",
-    "x509",
-    "is invalid:",
-    "connection refused",
-    "no agent available",
-    "fatal error",
+    "failed to update lease",
+    "failed to process request",
+    "failed to check permissions",
+    "failed to scan resource",
+    "failed to fetch data",
+    "failed to substitute variables",
+    "failed calling webhook",
+    "leader election lost",
+    "dropping request",
     "panic",
   ]

   # Combine default patterns with included patterns, then filter out excluded ones
-  kyverno_all_error_patterns = distinct(concat(
-    local.kyverno_default_error_patterns,
+  kyverno_all_message_patterns = distinct(concat(
+    local.kyverno_default_message_patterns,
     var.kyverno.error_patterns_include
   ))

-  kyverno_active_error_patterns = [
-    for pattern in local.kyverno_all_error_patterns :
+  kyverno_active_message_patterns = [
+    for pattern in local.kyverno_all_message_patterns :
     pattern if !contains(var.kyverno.error_patterns_exclude, pattern)
   ]

-  # Build the error patterns filter string
-  kyverno_error_patterns_filter = length(local.kyverno_active_error_patterns) > 0 ? join("\n OR ", [
-    for pattern in local.kyverno_active_error_patterns :
-    "jsonPayload.error=~\"(?i)${pattern}\""
-  ]) : ""
-
-  # Build NOT conditions for excluded patterns on jsonPayload.message
-  kyverno_message_exclusions = length(var.kyverno.error_patterns_exclude) > 0 ? join("\n ", [
-    for pattern in var.kyverno.error_patterns_exclude :
-    "AND NOT jsonPayload.message=~\"(?i)${pattern}\""
+  # Build the message patterns filter string
+  kyverno_message_patterns_filter = length(local.kyverno_active_message_patterns) > 0 ? join("\n OR ", [
+    for pattern in local.kyverno_active_message_patterns :
+    "jsonPayload.message=~\"(?i)${pattern}\""
   ]) : ""

-  kyverno_log_filter = local.kyverno_cluster_name != "" && length(local.kyverno_active_error_patterns) > 0 ? (<<-EOT
+  kyverno_log_filter = local.kyverno_cluster_name != "" && length(local.kyverno_active_message_patterns) > 0 ? (<<-EOT
     resource.type="k8s_container"
     AND resource.labels.project_id="${local.kyverno_project_id}"
     AND resource.labels.cluster_name="${local.kyverno_cluster_name}"
     AND resource.labels.namespace_name="${var.kyverno.namespace}"
+    AND severity=ERROR
     AND (
       labels."k8s-pod/app_kubernetes_io/component"=~"(admission-controller|background-controller|cleanup-controller|reports-controller)"
       OR resource.labels.pod_name=~"kyverno-(admission|background|cleanup|reports)-controller-.*"
     )
     AND (
-      ${local.kyverno_error_patterns_filter}
+      ${local.kyverno_message_patterns_filter}
     )
-    ${local.kyverno_message_exclusions}
     EOT
   ) : ""
 }
@@ -83,7 +64,7 @@ resource "google_monitoring_alert_policy" "kyverno_logmatch_alert" {
   count = (
     var.kyverno.enabled
     && local.kyverno_cluster_name != ""
-    && length(local.kyverno_active_error_patterns) > 0
+    && length(local.kyverno_active_message_patterns) > 0
   ) ? 1 : 0

   display_name = "Kyverno controllers ERROR logs (namespace=${var.kyverno.namespace})"

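The include/exclude pipeline in the `locals` block above can be tried in isolation. The following is a minimal sketch and not the module's code: the local names (`defaults`, `include`, `exclude`, `all`, `active`, `filter`) and the shortened pattern list are made up for the example, while the `distinct`, `concat`, `contains`, and `join` steps mirror the diff.

```hcl
# Standalone sketch of the pattern logic; every name here is illustrative.
locals {
  defaults = ["failed to sync", "failed to update lock", "panic"]
  include  = ["failed to connect.*"]
  exclude  = ["panic", "failed to update"] # second entry equals no pattern

  # Defaults plus user additions, with duplicates removed.
  all = distinct(concat(local.defaults, local.include))

  # contains() compares whole strings, so only exact matches are dropped.
  active = [for p in local.all : p if !contains(local.exclude, p)]

  # One case-insensitive regex clause per remaining pattern.
  filter = join("\n  OR ", [
    for p in local.active : "jsonPayload.message=~\"(?i)${p}\""
  ])
}

output "filter" {
  value = local.filter
}
```

With these values, `active` keeps `"failed to sync"`, `"failed to update lock"`, and `"failed to connect.*"`. The entry `"panic"` is dropped, while `"failed to update"` has no effect because it is not identical to any pattern in the list. Since this commit also removes the `NOT jsonPayload.message=~...` conditions, an exclusion only takes effect when it names a pattern exactly as it appears in the combined set.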