All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Adjust Kyverno log filter to reduce false positives from normal transient errors such as `i/o timeout` and `failed to acquire lease`, including removal of the explicit `failed to acquire lease` condition.
- Rename error pattern `list resources failed` to `failed to list resources` for consistency with other error patterns.
- Add `error_patterns_exclude` to the Kyverno configuration to allow excluding specific error patterns from the default set.
- Add `error_patterns_include` to the Kyverno configuration to allow adding custom error patterns to the default set.
- Add validation for `error_patterns_exclude` to ensure only valid default patterns can be excluded.
- The `filter_extra` variable has been removed and replaced with `error_patterns_include` and `error_patterns_exclude`. To migrate:
  - If you were using `filter_extra` to add custom error patterns for `jsonPayload.error` matching, use `error_patterns_include` instead.
  - If you need to exclude specific default error patterns, use `error_patterns_exclude`.
  - Note: The new options only support error pattern matching against `jsonPayload.error`. If you were using `filter_extra` for arbitrary log filter conditions (e.g., negative filters like `-textPayload:"..."`), this functionality is no longer available.
  - See `examples/main.tf` for usage examples.
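The way the default, included, and excluded patterns are described as combining can be illustrated with a minimal Python sketch. The resulting patterns are then matched against `jsonPayload.error`. This is not the module's implementation, which lives in Terraform. The default pattern list, the `effective_patterns` and `error_filter` helpers, and the `webhook timeout` custom pattern are all hypothetical. The controller selectors of the real Kyverno filter are also omitted.

```python
# Hypothetical default set: the module's real defaults live in its Terraform
# code; these two strings are taken from this changelog for illustration only.
DEFAULT_ERROR_PATTERNS = ("failed to list resources", "no agent available")


def effective_patterns(include, exclude, defaults=DEFAULT_ERROR_PATTERNS):
    """Return defaults minus `exclude`, plus `include`, preserving order."""
    # Mirror the documented validation: only default patterns may be excluded.
    invalid = [p for p in exclude if p not in defaults]
    if invalid:
        raise ValueError(f"cannot exclude non-default patterns: {invalid}")
    kept = [p for p in defaults if p not in exclude]
    # Append custom patterns, skipping any that are already present.
    return kept + [p for p in include if p not in kept]


def error_filter(patterns):
    """Build an OR-grouped Cloud Logging clause matching jsonPayload.error."""
    # Escape backslashes first, then double quotes, inside each quoted string.
    quoted = [p.replace("\\", "\\\\").replace('"', '\\"') for p in patterns]
    clauses = [f'jsonPayload.error:"{q}"' for q in quoted]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    patterns = effective_patterns(
        include=["webhook timeout"], exclude=["no agent available"]
    )
    print(error_filter(patterns))
```

Grouping the pattern clauses in parentheses keeps the `OR` from binding to other terms when the clause is combined with the resource and controller selectors using `AND`.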
- refs platform/board#4071: remove dependencies from the `terraform-sparkfabrik-gcp-http-monitoring` Terraform module.
- Update Kyverno log alert filter to use explicit AND/OR grouping for controller selectors and to match error patterns via `jsonPayload.error`.
- Add Konnectivity agent replica alert with a PromQL-based condition that counts pods via `kubernetes_io:container_uptime`.
- Standardize alert filter/query style for consistency across the configuration.
- Add `no agent available` to the Kyverno log alert filter to capture control-plane-to-node connectivity failures via Konnectivity (upstream Kubernetes); commonly seen on GKE (especially with private nodes), but not GKE-specific.
- Add `notification_prompts` parameter for LiteLLM and Typesense.
- Change the default `duration` and `alignment_period` values of the pod restart alerts.
- refs platform/board#4051: add LiteLLM monitoring
- refs platform/board#4071: add SSL certificate expiration alert configuration
- refs platform/board#4052: add Typesense monitoring alerts and configuration for uptime checks and container checks
- refs platform/board#3935: Kyverno log alert filter updated with explicit error patterns.
- The previous `severity>=ERROR` filter for Kyverno log alerts has been removed and replaced with explicit text pattern matching. This significantly alters alert behavior, as alerts are now triggered by specific error patterns rather than by severity level. Please review and update your alert expectations accordingly.
- Rename Terraform file from `cloud-sql.tf` to `cloud_sql.tf`.
- Rename Terraform file from `kyverno_log_alert.tf` to `kyverno.tf`.
- Add cert-manager missing issuer log alert.
- Add Kyverno log alert.
- Update module documentation.
- Increase default alert thresholds for Cloud SQL CPU, memory and disk utilization.
- Fix the minimum required Google provider version.
- Add support for Cloud SQL monitoring:
  - CPU utilization
  - Memory utilization
  - Disk utilization