Merged
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,17 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]


## [0.6.0] - 2025-12-10

[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.5.0...0.6.0)

### Changed


### Added
- refs platform/board#4052: add Typesense monitoring alerts and configuration for uptime checks and container checks

## [0.5.0] - 2025-12-01

[Compare with previous version](https://github.com/sparkfabrik/terraform-google-services-monitoring/compare/0.4.0...0.5.0)
6 changes: 5 additions & 1 deletion README.md
@@ -40,6 +40,7 @@ Supported services:
| <a name="input_kyverno"></a> [kyverno](#input\_kyverno) | Configuration for Kyverno monitoring alerts. Allows customization of cluster name, project, notification channels, alert documentation, metric thresholds, auto-close timing, enablement, extra filters, and namespace. | <pre>object({<br/> enabled = optional(bool, true)<br/> cluster_name = string<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> # Rate limit for notifications, e.g. "300s" for 5 minutes, used only for log match alerts<br/> logmatch_notification_rate_limit = optional(string, "300s")<br/> alert_documentation = optional(string, null)<br/> auto_close_seconds = optional(number, 3600)<br/> filter_extra = optional(string, "")<br/> namespace = optional(string, "kyverno")<br/> })</pre> | n/a | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | List of notification channel IDs to notify when an alert is triggered | `list(string)` | `[]` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The Google Cloud project ID where logging exclusions will be created | `string` | n/a | yes |
| <a name="input_typesense"></a> [typesense](#input\_typesense) | Configuration for Typesense monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). For container checks, the app name corresponds to the Kubernetes 'app' label; for apps with only uptime checks, this correspondence does not apply. | <pre>object({<br/> enabled = optional(bool, false)<br/> project_id = optional(string, null)<br/> notification_enabled = optional(bool, true)<br/> notification_channels = optional(list(string), [])<br/> cluster_name = optional(string, null) # GKE cluster name for container checks<br/><br/> # Apps configuration - map keyed by app_name<br/> apps = optional(map(object({<br/> # Uptime check configuration (optional)<br/> uptime_check = optional(object({<br/> enabled = optional(bool, true)<br/> host = string<br/> path = optional(string, "/readyz")<br/> }), null)<br/><br/> # Container check configuration for GKE (optional)<br/> container_check = optional(object({<br/> enabled = optional(bool, true)<br/> namespace = string<br/> pod_restart = optional(object({<br/> threshold = optional(number, 0)<br/> alignment_period = optional(number, 60)<br/> duration = optional(number, 0)<br/> auto_close_seconds = optional(number, 3600)<br/> }), {})<br/> }), null)<br/> })), {})<br/> })</pre> | `{}` | no |

## Outputs

@@ -58,9 +59,12 @@ Supported services:
| [google_monitoring_alert_policy.cloud_sql_disk_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_memory_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.kyverno_logmatch_alert](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.typesense_pod_restart](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_typesense_uptime_checks"></a> [typesense\_uptime\_checks](#module\_typesense\_uptime\_checks) | github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring | 1.0.0 |

<!-- END_TF_DOCS -->
80 changes: 80 additions & 0 deletions typesense.tf
@@ -0,0 +1,80 @@

locals {
  typesense_project = var.typesense.project_id != null ? var.typesense.project_id : var.project_id

  typesense_notification_channels = var.typesense.notification_enabled ? (length(var.typesense.notification_channels) > 0 ? var.typesense.notification_channels : var.notification_channels) : []

  typesense_uptime_checks = var.typesense.enabled ? {
    for app_name, config in var.typesense.apps :
    app_name => config.uptime_check
    if config.uptime_check != null && try(config.uptime_check.enabled, false)
  } : {}

  typesense_container_checks = var.typesense.enabled ? {
    for app_name, config in var.typesense.apps :
    app_name => config.container_check
    if config.container_check != null && try(config.container_check.enabled, false)
  } : {}
}
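
The notification channel precedence in `typesense_notification_channels` can be hard to read as a nested one-line conditional. The standalone sketch below reproduces the same logic with hypothetical stand-in variables (`module_channels`, `typesense_channels`, `typesense_notification_enabled` are not part of the module, and the channel ID is a placeholder), so it can be tried in an empty directory.

```hcl
# Hypothetical stand-in for var.notification_channels (module-level list).
variable "module_channels" {
  type    = list(string)
  default = ["projects/example-project/notificationChannels/111"]
}

# Hypothetical stand-in for var.typesense.notification_channels.
variable "typesense_channels" {
  type    = list(string)
  default = []
}

# Hypothetical stand-in for var.typesense.notification_enabled.
variable "typesense_notification_enabled" {
  type    = bool
  default = true
}

locals {
  # Same precedence as typesense_notification_channels:
  #   1. notifications disabled        -> no channels
  #   2. service-level list non-empty  -> use the service-level list
  #   3. otherwise                     -> fall back to the module-level list
  resolved_channels = var.typesense_notification_enabled ? (
    length(var.typesense_channels) > 0 ? var.typesense_channels : var.module_channels
  ) : []
}

output "resolved_channels" {
  value = local.resolved_channels
}
```

With the defaults above, the service-level list is empty, so the output resolves to the module-level list. Setting `typesense_notification_enabled` to `false` yields an empty list regardless of the other two values.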

module "typesense_uptime_checks" {
  for_each = local.typesense_uptime_checks

  source                      = "github.com/sparkfabrik/terraform-sparkfabrik-gcp-http-monitoring?ref=1.0.0"
  gcp_project                 = local.typesense_project
  uptime_monitoring_host      = each.value.host
  uptime_monitoring_path      = each.value.path
  alert_notification_channels = local.typesense_notification_channels
  alert_threshold_value       = 1
  uptime_check_period         = "900s"
}

# Alert: GKE Pod Restarts
# This alert monitors the restart count of Typesense containers in GKE.
# It triggers when the delta of restarts is greater than the threshold
# within the specified alignment period.
resource "google_monitoring_alert_policy" "typesense_pod_restart" {
  for_each = local.typesense_container_checks

  project      = local.typesense_project
  display_name = "Typesense Pod Restarts (cluster=${var.typesense.cluster_name}, namespace=${each.value.namespace}, app=${each.key})"
  combiner     = "OR"
  enabled      = true

  conditions {
    display_name = "Typesense container restart count > ${each.value.pod_restart.threshold}"

    condition_threshold {
      filter = <<-EOT
        resource.type="k8s_container"
        AND resource.labels.project_id="${local.typesense_project}"
        AND resource.labels.cluster_name="${var.typesense.cluster_name}"
        AND resource.labels.namespace_name="${each.value.namespace}"
        AND metric.type="kubernetes.io/container/restart_count"
      EOT

      comparison      = "COMPARISON_GT"
      threshold_value = each.value.pod_restart.threshold
      duration        = "${each.value.pod_restart.duration}s"

      aggregations {
        alignment_period     = "${each.value.pod_restart.alignment_period}s"
        per_series_aligner   = "ALIGN_DELTA"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields = [
          "metadata.user_labels.app",
        ]
      }

      trigger {
        count = 1
      }
    }
  }

  notification_channels = local.typesense_notification_channels

  alert_strategy {
    auto_close = "${each.value.pod_restart.auto_close_seconds}s"
  }
}
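
As an illustration of how the new input might be populated, here is a minimal sketch of a `typesense` value, written as it could appear in a `terraform.tfvars` file. The cluster name, hosts, namespace, app keys, and thresholds are all hypothetical placeholders; other inputs the module requires are not shown.

```hcl
# terraform.tfvars (sketch): every value below is a hypothetical placeholder.
typesense = {
  enabled      = true
  cluster_name = "example-gke-cluster" # needed because "search" defines container_check

  apps = {
    # The map key is the app name; for container checks it is meant to match
    # the Kubernetes "app" label.
    "search" = {
      uptime_check = {
        host = "typesense.example.com" # path falls back to the default "/readyz"
      }

      container_check = {
        namespace = "typesense"
        pod_restart = {
          threshold        = 2   # alert when the summed restart delta exceeds 2...
          alignment_period = 300 # ...within a 300-second alignment window
        }
      }
    }

    # Uptime check only: no alert policy is created for this app.
    "docs-search" = {
      uptime_check = {
        host = "docs-search.example.com"
        path = "/health"
      }
    }
  }
}
```

With a value like this, `local.typesense_uptime_checks` would contain both apps and `local.typesense_container_checks` only `search`, so the module would be instantiated twice and the alert policy once.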
54 changes: 54 additions & 0 deletions variables.tf
@@ -98,3 +98,57 @@ variable "cert_manager" {
filter_extra = optional(string, "")
})
}

variable "typesense" {
  description = "Configuration for Typesense monitoring alerts. Supports uptime checks for HTTP endpoints and container-level alerts (pod restarts) in GKE. Each app is identified by its name (map key). For container checks, the app name corresponds to the Kubernetes 'app' label; for apps with only uptime checks, this correspondence does not apply."
  default     = {}

  type = object({
    enabled               = optional(bool, false)
    project_id            = optional(string, null)
    notification_enabled  = optional(bool, true)
    notification_channels = optional(list(string), [])
    cluster_name          = optional(string, null) # GKE cluster name for container checks

    # Apps configuration - map keyed by app_name
    apps = optional(map(object({
      # Uptime check configuration (optional)
      uptime_check = optional(object({
        enabled = optional(bool, true)
        host    = string
        path    = optional(string, "/readyz")
      }), null)

      # Container check configuration for GKE (optional)
      container_check = optional(object({
        enabled   = optional(bool, true)
        namespace = string
        pod_restart = optional(object({
          threshold          = optional(number, 0)
          alignment_period   = optional(number, 60)
          duration           = optional(number, 0)
          auto_close_seconds = optional(number, 3600)
        }), {})
      }), null)
    })), {})
  })

  validation {
    condition = alltrue([
      for app_name, config in var.typesense.apps : (
        trimspace(app_name) != "" &&
        (config.uptime_check != null ? try(trimspace(config.uptime_check.host), "") != "" : true) &&
        (config.container_check != null ? try(trimspace(config.container_check.namespace), "") != "" : true)
      )
    ])
    error_message = "Each app must have a non-empty name (map key). If uptime_check is provided, 'host' must be non-empty. If container_check is provided, 'namespace' must be non-empty."
  }

  validation {
    condition = (
      length([for app_name, config in var.typesense.apps : app_name if config.container_check != null]) == 0 ||
      try(trimspace(var.typesense.cluster_name), "") != ""
    )
    error_message = "When any app has container_check configured, 'cluster_name' must be provided at the typesense level."
  }
}
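
To make the second validation rule concrete, the sketch below shows a `typesense` value that it would reject. The app key and namespace are hypothetical placeholders.

```hcl
# Sketch of an input rejected by the cluster_name validation:
# "search" defines container_check, but cluster_name keeps its default (null).
typesense = {
  enabled = true

  apps = {
    "search" = {
      container_check = {
        namespace = "typesense" # hypothetical namespace
      }
    }
  }
}
```

Both validation conditions test only whether `uptime_check` or `container_check` is non-null. They therefore appear to apply even when `typesense.enabled` is `false` or when the individual check sets `enabled = false`, so a disabled container check still needs `cluster_name` to be set.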