Commit 2451df9

Merge pull request #10 from AndrewFarley/add-more-alarms-allow-customizing-some-alarms
Adding tags, KMS Support, Customizing some periods
2 parents 87d718e + d77cf02 commit 2451df9

5 files changed

Lines changed: 188 additions & 58 deletions

README.md

Lines changed: 53 additions & 42 deletions
@@ -3,30 +3,32 @@
33
[![Build Status](https://travis-ci.com/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms.svg?branch=master)](https://travis-ci.org/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms)
44
[![Latest Release](https://img.shields.io/github/release/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms.svg)](https://github.com/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms/releases)
55

6-
Terraform module that configures important elasticsearch alerts using CloudWatch and sends them to an SNS topic.
7-
8-
Create a set of sane Elasticsearch CloudWatch alerts for monitoring the health of an elasticsearch cluster.
6+
Terraform module that configures the [recommended Amazon Elasticsearch alarms](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/cloudwatch-alarms.html) using CloudWatch and sends alerts to an SNS topic. By default, this module creates an SNS topic, but it can be configured to point to an existing SNS topic (see [example](./examples/use-existing-sns/main.tf)).
97

108
`v1.x` supports terraform `v0.12` syntax!
119

1210
This project is inspired by [CloudPosse](https://github.com/cloudposse)
1311

1412
It's 100% Open Source and licensed under the [APACHE2](LICENSE).
1513

16-
## Usage
17-
18-
| area | metric | comparison operator | threshold | rationale |
19-
|------------|---------------------------|---------------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
20-
| Sharding | ClusterStatus.red | `>=` | 1 | At least one primary shard and its replicas are not allocated to a node for 1 minute 1 consecutive time. Threshold should always be 1. |
21-
| Sharding | ClusterStatus.yellow | `>=` | 1 | At least one replica shard is not allocated to a node for 1 minute 1 consecutive time. Threshold should always be 1. |
22-
| Storage | FreeStorageSpace | `<=` | 20480 MB | A node in your cluster is down to 20 GiB of free storage space for 1 minute 1 consecutive time. This value is in MiB, so rather than 20480, we recommend setting it to 25% of the storage space for each node. |
23-
| Storage | ClusterIndexWritesBlocked | `>=` | 1 | The cluster is blocking write requests for 5 minutes 1 consecutive time. Threshold should always be 1. |
24-
| Node Count | Nodes | `<` | `x` | `x` is the number of nodes in your cluster. This alarm indicates that at least one node in your cluster has been unreachable for one day. |
25-
| Snapshot | AutomatedSnapshotFailure | `>=` | 1 | An automated snapshot failed for 1 minute 1 consecutive time. This failure is often the result of a red cluster health status. |
26-
| CPU | CPUUtilization | `>=` | 80 % | CPU utilization average is >= 80% for 15 minutes, 3 consecutive times for the node cluster. |
27-
| Memory | JVMMemoryPressure | `>=` | 80 % | JVMMemoryPressure maximum is >= 80% for 15 minutes, 1 consecutive time. |
28-
| CPU | MasterCPUUtilization | `>=` | 80 % | Dedicated master nodes' CPU utilization is >= 80% for 15 minutes, 3 consecutive times. |
29-
| Memory | MasterJVMMemoryPressure | `>=` | 80 % | Dedicated master nodes' maximum JVM memory usage is >= 80% for 15 minutes, 1 consecutive time. |
14+
## Metrics and Alarms
15+
16+
| area | metric | operator | threshold | rationale |
17+
|------------|---------------------------|----------|-----------|----------------------------------------------------------------------------------------------------------------------------------------|
18+
| Sharding | ClusterStatus.red | `>=` | 1 | At least one primary shard and its replicas are not allocated to a node |
19+
| Sharding | ClusterStatus.yellow | `>=` | 1 | At least one replica shard is not allocated to a node |
20+
| Storage | FreeStorageSpace | `<=` | 20480 MB | A node in your cluster is down to low storage space. |
21+
| Storage | ClusterIndexWritesBlocked | `>=` | 1 | Your cluster is blocking write requests. |
22+
| Node Count | Nodes | `<` | `x` | This alarm indicates that at least one node in your cluster has been unreachable for one day |
23+
| Snapshot | AutomatedSnapshotFailure | `>=` | 1 | An automated snapshot failed. This failure is often the result of a red cluster health status. |
24+
| CPU | CPUUtilization | `>=` | 80 % | 100% CPU utilization isn't uncommon, but sustained high usage is problematic. Consider using larger instance types or more instances. |
25+
| Memory | JVMMemoryPressure | `>=` | 80 % | The cluster could encounter out of memory errors if usage increases. Consider scaling vertically. |
26+
| CPU | MasterCPUUtilization | `>=` | 80 % | Consider using larger instance types for your dedicated master nodes. |
27+
| Memory | MasterJVMMemoryPressure | `>=` | 80 % | Consider using larger instance types for your dedicated master nodes. |
28+
| KMS | KMSKeyError | `>=` | 1 | The KMS encryption key that is used to encrypt data at rest in your domain is disabled. Re-enable it to restore normal operations |
29+
| KMS | KMSKeyInaccessible | `>=` | 1 | The KMS encryption key that is used to encrypt data at rest in your domain has been deleted or has revoked its grants to Amazon ES |
30+
31+
For more information, see the [recommended Amazon Elasticsearch alarms](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/cloudwatch-alarms.html).
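
The thresholds in the table above are defaults and can be overridden through the module inputs. The following is a minimal sketch, not a recommendation from this module: it assumes a hypothetical 100 GiB of storage per data node, uses the generic `git::` source form, and picks illustrative CPU and JVM values. The 25% figure follows the sizing guidance given in the previous README text.

```hcl
locals {
  # Hypothetical value: storage per data node, in MiB (100 GiB).
  storage_per_node_mib = 100 * 1024
}

module "es_alarms" {
  # Assumption: generic git source form; adjust ref to a tagged release.
  source      = "git::https://github.com/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms.git?ref=master"
  domain_name = "example"

  # Alarm when free space on a node drops to 25% of its storage (in MiB).
  free_storage_space_threshold = local.storage_per_node_mib * 0.25

  # Illustrative overrides of the 80% defaults.
  cpu_utilization_threshold     = 90
  jvm_memory_pressure_threshold = 85
}
```

Deriving the storage threshold from a local keeps it in step with the node size if that value changes later.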
3032

3133
## Examples
3234

@@ -53,6 +55,9 @@ resource "aws_elasticsearch_domain" "es" {
5355
module "es_alarms" {
5456
source = "github::https://github.com/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms.git?ref=master"
5557
domain_name = "example"
58+
tags = {
59+
Domain = "TestDomain"
60+
}
5661
}
5762
```
5863

@@ -64,37 +69,43 @@ module "es_alarms" {
6469
domain_name = "example"
6570
sns_topic = "arn:aws:sns:us-east-1:123456123456:sns-to-slack" # < Put your full SNS ARN here, if necessary read from var or a resource
6671
create_sns_topic = false
72+
tags = {
73+
Domain = "TestDomain"
74+
}
6775
}
6876
```
6977

7078

7179
## Inputs
7280

73-
| Name | Description | Type | Default | Required |
74-
|------|-------------|:----:|:-----:|:-----:|
75-
| `domain_name` | The Elasticserach domain name you want to monitor. | string | - | yes |
76-
| `alarm_name_postfix` | Alarm name postfix | string | `""` | no |
77-
| `alarm_name_prefix` | Alarm name prefix | string | `""` | no |
78-
| `cpu_utilization_threshold` | The maximum percentage of CPU utilization | string | `80` | no |
79-
| `free_storage_space_threshold` | The minimum amount of available storage space in MiB. | string | `20480` | no |
80-
| `jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for all data nodes in the cluster | string | `80` | no |
81-
| `master_cpu_utilization_threshold` | The maximum percentage of CPU utilization of master nodes | string | `""` | no |
82-
| `master_jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for master nodes in the cluster | string | `""` | no |
83-
| `min_available_nodes` | The minimum available (reachable) nodes to have | string | `1` | no |
84-
| `monitor_automated_snapshot_failure` | Enable monitoring of automated snapshot failure | string | `true` | no |
85-
| `monitor_cluster_index_writes_blocked` | Enable monitoring of cluster index writes being blocked | string | `true` | no |
86-
| `monitor_cluster_status_is_red` | Enable monitoring of cluster status is in red | string | `true` | no |
87-
| `monitor_cluster_status_is_yellow` | Enable monitoring of cluster status is in yellow | string | `true` | no |
88-
| `monitor_cpu_utilization_too_high` | Enable monitoring of CPU utilization is too high | string | `true` | no |
89-
| `monitor_free_storage_space_too_low` | Enable monitoring of cluster average free storage is to low | string | `true` | no |
90-
| `monitor_insufficient_available_nodes` | Enable monitoring insufficient available nodes | string | `false` | no |
91-
| `monitor_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure is too high | string | `true` | no |
92-
| `monitor_master_cpu_utilization_too_high` | Enable monitoring of CPU utilization of master nodes are too high. Only enable this when dedicated master is enabled | string | `false` | no |
93-
| `monitor_master_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure of master nodes are too high. Only enable this wwhen dedicated master is enabled | string | `false` | no |
94-
| `create_sns_topic` | Will create an SNS topic, if you set this to false you MUST set `sns_topic` to a FULL ARN | string | `true` | no |
95-
| `sns_topic` | SNS topic you want to specify. If leave empty, it will use a prefix and a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string | `""` | no |
96-
| `sns_topic_postfix` | SNS topic postfix | string | `""` | no |
97-
| `sns_topic_prefix` | SNS topic prefix | string | `""` | no |
81+
| Name | Description | Type | Default | Required |
82+
|-----------------------------------------------|-------------|:----:|:-------:|:--------:|
83+
| `domain_name` | The Elasticsearch domain name you want to monitor. | string | - | yes |
84+
| `alarm_cluster_status_is_yellow_periods` | The number of periods before triggering the cluster status is yellow, raise this if desired to make less noisy | number | `1` | no |
85+
| `alarm_free_storage_space_too_low_periods` | The number of periods before triggering the disk space is low, raise this if desired to make less noisy | number | `1` | no |
86+
| `alarm_name_postfix` | Alarm name postfix | string | `""` | no |
87+
| `alarm_name_prefix` | Alarm name prefix | string | `""` | no |
88+
| `cpu_utilization_threshold` | The maximum percentage of CPU utilization | string | `80` | no |
89+
| `free_storage_space_threshold` | The minimum amount of available storage space in MiB. | string | `20480` | no |
90+
| `jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for all data nodes in the cluster | string | `80` | no |
91+
| `master_cpu_utilization_threshold` | The maximum percentage of CPU utilization of master nodes | string | `""` | no |
92+
| `master_jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for master nodes in the cluster | string | `""` | no |
93+
| `min_available_nodes` | The minimum available (reachable) nodes to have, set to non-zero to enable alarm | string | `0` | no |
94+
| `monitor_automated_snapshot_failure` | Enable monitoring of automated snapshot failure | bool | `true` | no |
95+
| `monitor_cluster_index_writes_blocked` | Enable monitoring of cluster index writes being blocked | bool | `true` | no |
96+
| `monitor_cluster_status_is_red` | Enable monitoring of cluster status is in red | bool | `true` | no |
97+
| `monitor_cluster_status_is_yellow` | Enable monitoring of cluster status is in yellow | bool | `true` | no |
98+
| `monitor_cpu_utilization_too_high` | Enable monitoring of CPU utilization is too high | bool | `true` | no |
99+
| `monitor_free_storage_space_too_low` | Enable monitoring of cluster average free storage is too low | bool | `true` | no |
100+
| `monitor_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure is too high | bool | `true` | no |
101+
| `monitor_kms` | Enable monitoring of KMS-related metrics, enable if using KMS | bool | `false` | no |
102+
| `monitor_master_cpu_utilization_too_high` | Enable monitoring of CPU utilization of master nodes are too high. Only enable this when dedicated master is enabled | bool | `false` | no |
103+
| `monitor_master_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure of master nodes are too high. Only enable this when dedicated master is enabled | bool | `false` | no |
104+
| `create_sns_topic` | Will create an SNS topic, if you set this to false you MUST set `sns_topic` to a FULL ARN | bool | `true` | no |
105+
| `sns_topic` | SNS topic you want to specify. If left empty, it will use a prefix and a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string | `""` | no |
106+
| `sns_topic_postfix` | SNS topic postfix | string | `""` | no |
107+
| `sns_topic_prefix` | SNS topic prefix | string | `""` | no |
108+
| `tags` | Tags to associate with all created resources | map | `{}` | no |
98109

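
The inputs added in this change (`alarm_*_periods`, `monitor_kms`, `tags`) and the new `min_available_nodes` default can be combined as in the sketch below. The input names and types come from the table above; the specific values, the tag keys, and the `git::` source form are assumptions chosen for illustration.

```hcl
module "es_alarms" {
  # Assumption: generic git source form; adjust ref to a tagged release.
  source      = "git::https://github.com/dubiety/terraform-aws-elasticsearch-cloudwatch-sns-alarms.git?ref=master"
  domain_name = "example"

  # Require more consecutive periods before alarming, to reduce noise.
  alarm_cluster_status_is_yellow_periods   = 3
  alarm_free_storage_space_too_low_periods = 2

  # Only enable if the domain encrypts data at rest with KMS.
  monitor_kms = true

  # A non-zero value enables the node count alarm (default is 0, disabled).
  min_available_nodes = 3

  # Applied to all resources the module creates.
  tags = {
    Domain      = "TestDomain"
    Environment = "staging" # hypothetical tag
  }
}
```

Leaving the period inputs at their default of `1` keeps the earlier behavior of alarming on the first breaching period.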
99110
## Outputs
100111
