-Terraform module that configures important Elasticsearch alerts using CloudWatch and sends them to an SNS topic.
-
-Create a set of sane Elasticsearch CloudWatch alerts for monitoring the health of an Elasticsearch cluster.
+Terraform module that configures the [recommended Amazon Elasticsearch alarms](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/cloudwatch-alarms.html) using CloudWatch and sends alerts to an SNS topic. By default, this module creates an SNS topic, but it can be configured to point to an existing SNS topic (see [example](./examples/use-existing-sns/main.tf)).

 `v1.x` supports Terraform `v0.12` syntax!

 This project is inspired by [CloudPosse](https://github.com/cloudposse)

 It's 100% Open Source and licensed under the [APACHE2](LICENSE).
-| Sharding | ClusterStatus.red | `>=` | 1 | At least one primary shard and its replicas are not allocated to a node for 1 minute, 1 consecutive time. Threshold should always be 1. |
-| Sharding | ClusterStatus.yellow | `>=` | 1 | At least one replica shard is not allocated to a node for 1 minute, 1 consecutive time. Threshold should always be 1. |
-| Storage | FreeStorageSpace | `<=` | 20480 MiB | A node in your cluster is down to 20 GiB of free storage space for 1 minute, 1 consecutive time. This value is in MiB, so rather than 20480, we recommend setting it to 25% of the storage space for each node. |
-| Storage | ClusterIndexWritesBlocked | `>=` | 1 | The cluster is blocking write requests for 5 minutes, 1 consecutive time. Threshold should always be 1. |
-| Node Count | Nodes | `<` | `x` | `x` is the number of nodes in your cluster. This alarm indicates that at least one node in your cluster has been unreachable for one day. |
-| Snapshot | AutomatedSnapshotFailure | `>=` | 1 | An automated snapshot failed for 1 minute, 1 consecutive time. This failure is often the result of a red cluster health status. |
-| CPU | CPUUtilization | `>=` | 80 % | Average CPU utilization across the cluster's nodes is >= 80% for 15 minutes, 3 consecutive times. |
-| Memory | JVMMemoryPressure | `>=` | 80 % | JVMMemoryPressure maximum is >= 80% for 15 minutes, 1 consecutive time. |
-| CPU | MasterCPUUtilization | `>=` | 80 % | Dedicated master nodes' CPU utilization is >= 80% for 15 minutes, 3 consecutive times. |
-| Memory | MasterJVMMemoryPressure | `>=` | 80 % | Dedicated master nodes' maximum JVM memory usage is >= 80% for 15 minutes, 1 consecutive time. |
+| Sharding | ClusterStatus.red | `>=` | 1 | At least one primary shard and its replicas are not allocated to a node. |
+| Sharding | ClusterStatus.yellow | `>=` | 1 | At least one replica shard is not allocated to a node. |
+| Storage | FreeStorageSpace | `<=` | 20480 MiB | A node in your cluster is running low on free storage space. |
+| Storage | ClusterIndexWritesBlocked | `>=` | 1 | Your cluster is blocking write requests. |
+| Node Count | Nodes | `<` | `x` | This alarm indicates that at least one node in your cluster has been unreachable for one day. |
+| Snapshot | AutomatedSnapshotFailure | `>=` | 1 | An automated snapshot failed. This failure is often the result of a red cluster health status. |
+| CPU | CPUUtilization | `>=` | 80 % | 100% CPU utilization isn't uncommon, but sustained high usage is problematic. Consider using larger instance types or more instances. |
+| Memory | JVMMemoryPressure | `>=` | 80 % | The cluster could encounter out-of-memory errors if usage increases. Consider scaling vertically. |
+| CPU | MasterCPUUtilization | `>=` | 80 % | Consider using larger instance types for your dedicated master nodes. |
+| Memory | MasterJVMMemoryPressure | `>=` | 80 % | Consider using larger instance types for your dedicated master nodes. |
+| KMS | KMSKeyError | `>=` | 1 | The KMS encryption key that is used to encrypt data at rest in your domain is disabled. Re-enable it to restore normal operations. |
+| KMS | KMSKeyInaccessible | `>=` | 1 | The KMS encryption key that is used to encrypt data at rest in your domain has been deleted or has revoked its grants to Amazon ES. |
+
+For more information, please see: [recommended Amazon Elasticsearch alarms](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/cloudwatch-alarms.html).
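
To make one table row concrete, here is a minimal sketch of how the `ClusterStatus.red` alarm could be expressed as a plain `aws_cloudwatch_metric_alarm`. It is not taken from this module's source: the variable `sns_topic_arn`, the alarm name, and the 60-second period with a single evaluation period are assumptions chosen for illustration, and the module's actual wiring may differ.

```hcl
# Hypothetical inputs for this sketch only.
variable "domain_name" {
  type = string
}

variable "sns_topic_arn" {
  type = string
}

# Amazon ES metrics are keyed by domain name and the owning account ID.
data "aws_caller_identity" "current" {}

resource "aws_cloudwatch_metric_alarm" "cluster_status_is_red" {
  alarm_name          = "ElasticSearch-ClusterStatusIsRed" # illustrative name
  alarm_description   = "At least one primary shard and its replicas are not allocated to a node"
  namespace           = "AWS/ES"
  metric_name         = "ClusterStatus.red"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  period              = 60 # seconds (assumed)
  evaluation_periods  = 1  # assumed
  alarm_actions       = [var.sns_topic_arn]
  ok_actions          = [var.sns_topic_arn]

  dimensions = {
    DomainName = var.domain_name
    ClientId   = data.aws_caller_identity.current.account_id
  }
}
```

The other rows follow the same shape, changing mainly `metric_name`, `comparison_operator`, and `threshold`.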
   sns_topic        = "arn:aws:sns:us-east-1:123456123456:sns-to-slack" # Put your full SNS ARN here; if necessary, read it from a variable or a resource
   create_sns_topic = false
+  tags = {
+    Domain = "TestDomain"
+  }
 }
 ```


 ## Inputs

-| Name | Description | Type | Default | Required |
-|------|-------------|:----:|:-----:|:-----:|
-| `domain_name` | The Elasticsearch domain name you want to monitor. | string | - | yes |
-| `alarm_name_postfix` | Alarm name postfix | string | `""` | no |
-| `alarm_name_prefix` | Alarm name prefix | string | `""` | no |
-| `cpu_utilization_threshold` | The maximum percentage of CPU utilization | string | `80` | no |
-| `free_storage_space_threshold` | The minimum amount of available storage space in MiB. | string | `20480` | no |
-| `jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for all data nodes in the cluster | string | `80` | no |
-| `master_cpu_utilization_threshold` | The maximum percentage of CPU utilization of master nodes | string | `""` | no |
-| `master_jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for master nodes in the cluster | string | `""` | no |
-| `min_available_nodes` | The minimum number of available (reachable) nodes | string | `1` | no |
-| `monitor_automated_snapshot_failure` | Enable monitoring of automated snapshot failure | string | `true` | no |
-| `monitor_cluster_index_writes_blocked` | Enable monitoring of cluster index writes being blocked | string | `true` | no |
-| `monitor_cluster_status_is_red` | Enable monitoring of the cluster status being red | string | `true` | no |
-| `monitor_cluster_status_is_yellow` | Enable monitoring of the cluster status being yellow | string | `true` | no |
-| `monitor_cpu_utilization_too_high` | Enable monitoring of CPU utilization being too high | string | `true` | no |
-| `monitor_free_storage_space_too_low` | Enable monitoring of cluster average free storage being too low | string | `true` | no |
-| `monitor_insufficient_available_nodes` | Enable monitoring of insufficient available nodes | string | `false` | no |
-| `monitor_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure being too high | string | `true` | no |
-| `monitor_master_cpu_utilization_too_high` | Enable monitoring of master node CPU utilization being too high. Only enable this when dedicated master is enabled | string | `false` | no |
-| `monitor_master_jvm_memory_pressure_too_high` | Enable monitoring of master node JVM memory pressure being too high. Only enable this when dedicated master is enabled | string | `false` | no |
-| `create_sns_topic` | Will create an SNS topic. If you set this to false, you MUST set `sns_topic` to a FULL ARN | string | `true` | no |
-| `sns_topic` | SNS topic you want to specify. If left empty, it will use a prefix and a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string | `""` | no |
-| `sns_topic_postfix` | SNS topic postfix | string | `""` | no |
-| `sns_topic_prefix` | SNS topic prefix | string | `""` | no |
+| Name | Description | Type | Default | Required |
+|------|-------------|:----:|:-----:|:-----:|
+| `domain_name` | The Elasticsearch domain name you want to monitor. | string | - | yes |
+| `alarm_cluster_status_is_yellow_periods` | The number of periods before the cluster-status-is-yellow alarm triggers. Raise this to make the alarm less noisy | number | `1` | no |
+| `alarm_free_storage_space_too_low_periods` | The number of periods before the low-disk-space alarm triggers. Raise this to make the alarm less noisy | number | `1` | no |
+| `alarm_name_postfix` | Alarm name postfix | string | `""` | no |
+| `alarm_name_prefix` | Alarm name prefix | string | `""` | no |
+| `cpu_utilization_threshold` | The maximum percentage of CPU utilization | string | `80` | no |
+| `free_storage_space_threshold` | The minimum amount of available storage space in MiB. | string | `20480` | no |
+| `jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for all data nodes in the cluster | string | `80` | no |
+| `master_cpu_utilization_threshold` | The maximum percentage of CPU utilization of master nodes | string | `""` | no |
+| `master_jvm_memory_pressure_threshold` | The maximum percentage of the Java heap used for master nodes in the cluster | string | `""` | no |
+| `min_available_nodes` | The minimum number of available (reachable) nodes. Set to a non-zero value to enable the alarm | string | `0` | no |
+| `monitor_automated_snapshot_failure` | Enable monitoring of automated snapshot failure | bool | `true` | no |
+| `monitor_cluster_index_writes_blocked` | Enable monitoring of cluster index writes being blocked | bool | `true` | no |
+| `monitor_cluster_status_is_red` | Enable monitoring of the cluster status being red | bool | `true` | no |
+| `monitor_cluster_status_is_yellow` | Enable monitoring of the cluster status being yellow | bool | `true` | no |
+| `monitor_cpu_utilization_too_high` | Enable monitoring of CPU utilization being too high | bool | `true` | no |
+| `monitor_free_storage_space_too_low` | Enable monitoring of cluster average free storage being too low | bool | `true` | no |
+| `monitor_jvm_memory_pressure_too_high` | Enable monitoring of JVM memory pressure being too high | bool | `true` | no |
+| `monitor_kms` | Enable monitoring of KMS-related metrics. Enable this if you use KMS | bool | `false` | no |
+| `monitor_master_cpu_utilization_too_high` | Enable monitoring of master node CPU utilization being too high. Only enable this when dedicated master is enabled | bool | `false` | no |
+| `monitor_master_jvm_memory_pressure_too_high` | Enable monitoring of master node JVM memory pressure being too high. Only enable this when dedicated master is enabled | bool | `false` | no |
+| `create_sns_topic` | Will create an SNS topic. If you set this to false, you MUST set `sns_topic` to a FULL ARN | bool | `true` | no |
+| `sns_topic` | SNS topic you want to specify. If left empty, it will use a prefix and a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string | `""` | no |
+| `sns_topic_postfix` | SNS topic postfix | string | `""` | no |
+| `sns_topic_prefix` | SNS topic prefix | string | `""` | no |
+| `tags` | Tags to associate with all created resources | map | `{}` | no |
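
As a rough illustration of how the inputs added in this change might be combined, the sketch below sets the new period, KMS, node-count, and tag inputs on one module call. The `source` path, the domain name, and the 100 GiB-per-node volume size behind the storage threshold are placeholders, not values from this repository.

```hcl
module "es_alarms" {
  # Placeholder source: replace with the registry address or path you actually use.
  source = "../../"

  domain_name = "example-domain" # hypothetical domain name

  # Tolerate brief yellow states before alerting.
  alarm_cluster_status_is_yellow_periods = 3

  # 25% of an assumed 100 GiB volume per node, expressed in MiB.
  free_storage_space_threshold = 25600

  # A non-zero value enables the node-count alarm.
  min_available_nodes = 3

  # Only useful when the domain encrypts data at rest with KMS.
  monitor_kms = true

  tags = {
    Domain = "example-domain"
  }
}
```

Leaving `create_sns_topic` at its default of `true` lets the module create the SNS topic itself, so `sns_topic` does not need to be a full ARN here.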