All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Adjust Kyverno log filter to reduce false positives from normal transient errors such as `i/o timeout` and `failed to acquire lease`, including removal of the explicit `failed to acquire lease` condition.
- refs platform/board#4071: remove dependencies from the `terraform-sparkfabrik-gcp-http-monitoring` Terraform module.
- Update Kyverno log alert filter to use explicit AND/OR grouping for controller selectors and to match error patterns via `jsonPayload.error`.
- Add Konnectivity agent replica alert with a PromQL-based condition that counts pods via `kubernetes_io:container_uptime`.
- Standardize alert filter/query style for consistency across the configuration.
- Add `no agent available` to the Kyverno log alert filter to capture control-plane-to-node connectivity failures via Konnectivity (upstream Kubernetes); commonly seen on GKE (especially with private nodes), but not GKE-specific.
- Add `notification_prompts` parameter for LiteLLM and Typesense.
- Modify the default `duration` and `alignment_period` values of the pod restart alerts.
- refs platform/board#4051: add LiteLLM monitoring
- refs platform/board#4071: add SSL certificate expiration alert configuration
- refs platform/board#4052: add Typesense monitoring alerts and configuration for uptime checks and container checks
- refs platform/board#3935: Kyverno log alert filter updated with explicit error patterns.
- The previous `severity>=ERROR` filter for Kyverno log alerts has been removed and replaced with explicit text pattern matching. This significantly alters alert behavior, as alerts are now triggered based on specific error patterns rather than severity level. Please review and update your alert expectations accordingly.
- Rename tf file from `cloud-sql.tf` to `cloud_sql.tf`.
- Rename tf file from `kyverno_log_alert.tf` to `kyverno.tf`.
- Add cert-manager missing issuer log alert.
- Add Kyverno log alert.
- Update module documentation.
- Increase default alert thresholds for Cloud SQL CPU, memory and disk utilization.
- Fix Google provider minimum required version.
- Add support for Cloud SQL monitoring:
  - CPU utilization
  - Memory utilization
  - Disk utilization