29 changes: 12 additions & 17 deletions docs/infra/set-up-monitoring-alerts.md
@@ -4,26 +4,21 @@

The monitoring module defines metric-based alerting policies that provide awareness into issues with the cloud application. The module supports integration with external incident management tools like Splunk-On-Call or Pagerduty. It also supports email alerts.

### Set up email alerts.

1. Add the `email_alerts_subscription_list` variable to the monitoring module call in the service layer

For example:
```
module "monitoring" {
source = "../../modules/monitoring"
email_alerts_subscription_list = ["email1@email.com", "email2@email.com"]
...
}
```
### Set up email alerts

The monitoring module supports a simple email-based alerting system that does not rely on an external incident management service.

1. Update the `email_alert_recipients` variable in `app-config/env-config/monitoring.tf`

2. Run `make infra-update-app-service APP_NAME=<APP_NAME> ENVIRONMENT=<ENVIRONMENT>` to apply the changes to each environment.
When any of the alerts described by the module are triggered notification will be sent to all emails specified in the `email_alerts_subscription_list`

### Set up External incident management service integration.
### Integrate with an incident management service

1. Set setting `has_incident_management_service = true` in app-config/main.tf
2. Get the integration URL for the incident management service and store it in AWS SSM Parameter Store by running the following command for each environment:
```
make infra-configure-monitoring-secrets APP_NAME=<APP_NAME> ENVIRONMENT=<ENVIRONMENT> URL=<WEBHOOK_URL>
```

```bash
make infra-configure-monitoring-secrets APP_NAME=<APP_NAME> ENVIRONMENT=<ENVIRONMENT> URL=<WEBHOOK_URL>
```

3. Run `make infra-update-app-service APP_NAME=<APP_NAME> ENVIRONMENT=<ENVIRONMENT>` to apply the changes to each environment.
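
As a rough sketch of step 1 in the email section, a populated `monitoring_config` in `app-config/env-config/monitoring.tf` might look like the following. The two addresses are hypothetical placeholders, not values from this PR; everything else mirrors the file added in this change.

```hcl
locals {
  monitoring_config = {
    # Hypothetical recipients, for illustration only. Replace with real addresses.
    # The monitoring module declares this input as set(string), so Terraform
    # converts the list literal and ignores duplicate entries.
    email_alert_recipients = ["oncall@example.com", "team-lead@example.com"]

    # Unchanged from the file added in this PR.
    incident_management_service = var.has_incident_management_service ? {
      integration_url_param_name = "/monitoring/${var.app_name}/${var.environment}/incident-management-integration-url"
    } : null
  }
}
```

Because the module subscribes each address with `protocol = "email"`, each recipient should expect a confirmation email from Amazon SNS and will only start receiving alerts after confirming the subscription.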
2 changes: 1 addition & 1 deletion infra/modules/monitoring/main.tf
@@ -68,7 +68,7 @@ resource "aws_cloudwatch_metric_alarm" "high_app_response_time" {
 #email integration

 resource "aws_sns_topic_subscription" "email_integration" {
-  for_each = var.email_alerts_subscription_list
+  for_each = var.email_alert_recipients
   topic_arn = aws_sns_topic.this.arn
   protocol = "email"
   endpoint = each.value
2 changes: 1 addition & 1 deletion infra/modules/monitoring/variables.tf
@@ -1,4 +1,4 @@
-variable "email_alerts_subscription_list" {
+variable "email_alert_recipients" {
   type = set(string)
   default = []
   description = "List of emails to subscribe to alerts"
12 changes: 12 additions & 0 deletions infra/{{app_name}}/app-config/env-config/monitoring.tf
@@ -0,0 +1,12 @@
+locals {
+  monitoring_config = {
+    # Emails to notify for alerts.
+    # Use this as a simple notification mechanism if you don't have an incident management service.
+    email_alert_recipients = []
+
+    incident_management_service = var.has_incident_management_service ? {
+      integration_url_param_name = "/monitoring/${var.app_name}/${var.environment}/incident-management-integration-url"
+    } : null
+  }
+}

6 changes: 2 additions & 4 deletions infra/{{app_name}}/app-config/env-config/outputs.tf
@@ -10,10 +10,8 @@ output "scheduled_jobs" {
   value = local.scheduled_jobs
 }

-output "incident_management_service_integration" {
-  value = var.has_incident_management_service ? {
-    integration_url_param_name = "/monitoring/${var.app_name}/${var.environment}/incident-management-integration-url"
-  } : null
+output "monitoring_config" {
+  value = local.monitoring_config
 }

 output "network_name" {
10 changes: 5 additions & 5 deletions infra/{{app_name}}/service/monitoring.tf
@@ -1,21 +1,21 @@
 locals {
-  incident_management_service_integration_config = local.environment_config.incident_management_service_integration
+  monitoring_config = local.environment_config.monitoring_config
+  incident_management_service_integration_url = module.app_config.has_incident_management_service && !local.is_temporary ? data.aws_ssm_parameter.incident_management_service_integration_url[0].value : null
 }

 # Retrieve url for external incident management tool (e.g. Pagerduty, Splunk-On-Call)

 data "aws_ssm_parameter" "incident_management_service_integration_url" {
   count = module.app_config.has_incident_management_service ? 1 : 0
-  name = local.incident_management_service_integration_config.integration_url_param_name
+  name = local.monitoring_config.incident_management_service.integration_url_param_name
 }

 module "monitoring" {
   source = "../../modules/monitoring"
-  #Email subscription list:
-  #email_alerts_subscription_list = ["email1@email.com", "email2@email.com"]

   # Module takes service and ALB names to link all alerts with corresponding targets
   service_name = local.service_name
   load_balancer_arn_suffix = module.service.load_balancer_arn_suffix
-  incident_management_service_integration_url = module.app_config.has_incident_management_service && !local.is_temporary ? data.aws_ssm_parameter.incident_management_service_integration_url[0].value : null
+  email_alert_recipients = local.monitoring_config.email_alert_recipients
+  incident_management_service_integration_url = local.incident_management_service_integration_url
 }