| Sharding | ClusterStatus.red |`>=`| 1 | At least one primary shard and its replicas are not allocated to a node |
| Sharding | ClusterStatus.yellow |`>=`| 1 | At least one replica shard is not allocated to a node |
-| Storage | FreeStorageSpace |`<=`| 20480 MB | A node in your cluster is down to low storage space. |
+| Storage | FreeStorageSpace |`<=`| 20480 MB | A node in your cluster is running low on storage space. This alarm uses the `Minimum` statistic, so it triggers when any single node falls to or below the threshold. This logic is based on the [AWS Recommended Alarms](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cloudwatch-alarms.html). It does not, however, alarm on the aggregate free space remaining across the cluster. |
+| Storage | FreeStorageSpaceTotal |`<=`| 20480 MB | The total free disk space across the cluster is low. This alarm uses the `Sum` statistic across all nodes, which can be useful on multi-node clusters. Disabled by default; to enable it, set `monitor_free_storage_space_total_too_low` to `true` and set `free_storage_space_total_threshold`. The recommended threshold is the number of nodes in your cluster multiplied by `free_storage_space_threshold`. |
| Storage | ClusterIndexWritesBlocked |`>=`| 1 | Your cluster is blocking write requests. |
| Node Count | Nodes |`<`|`x`| This alarm indicates that at least one node in your cluster has been unreachable for one day |
| Snapshot | AutomatedSnapshotFailure |`>=`| 1 | An automated snapshot failed. This failure is often the result of a red cluster health status. |
@@ -79,55 +80,62 @@ module "es_alarms" {

## Inputs

-| Name | Description | Type | Default | Required |
-|`domain_name`| The Elasticserach domain name you want to monitor. | string | - | yes |
-|`cluster_type`| The type of cluster, single or multi-node | string |`"single"`| no |
-|`monitor_cluster_status_is_red_periods`| The number of periods to alert that cluster status is red, raise this to be less noisy | number |`1`| no |
-|`alarm_cluster_status_is_yellow_periods`| The number of periods before triggering the cluster status is yellow, raise this to be less noisy | number |`1`| no |
-|`alarm_free_storage_space_too_low_periods`| The number of periods before triggering the disk space is low, raise this to be less noisy | number |`1`| no |
-|`monitor_cluster_index_writes_blocked_periods`| The number of periods to alert that cluster index writes are blocked, raise this if desired to make less noisy | number |`1`| no |
-|`monitor_min_available_nodes_periods`| The number of periods to alert that minimum number of available nodes dropped below a threshold, raise this if desired to make less noisy | number |`1`| no |
-|`monitor_automated_snapshot_failure_periods`| The number of periods to alert that automatic snapshots failed, raise this if desired to make less noisy | number |`1`| no |
-|`monitor_cpu_utilization_too_high_periods`| The number of periods to alert that CPU usage is too high, raise this if desired to make less noisy | number |`3`| no |
-|`monitor_jvm_memory_pressure_too_high_periods`| The number of periods which it must be in the alarmed state to alert, raise this if desired to make less noisy | number |`1`| no |
-|`monitor_master_cpu_utilization_too_high_periods`| The number of periods to alert that masters CPU usage is too high, raise this if desired to make less noisy | number |`3`| no |
-|`monitor_master_jvm_memory_pressure_too_high_periods`| The number of periods which it must be in the alarmed state to alert, raise this if desired to make less noisy | number |`1`| no |
-|`monitor_kms_periods`| The number of periods to alert that kms has failed, raise this if desired to make less noisy | number |`1`| no |
-|`alarm_name_postfix`| Alarm name postfix | string |`""`| no |
-|`alarm_name_prefix`| Alarm name prefix | string |`""`| no |
-|`cpu_utilization_threshold`| The maximum percentage of CPU utilization | string |`80`| no |
-|`free_storage_space_threshold`| The minimum amount of available storage space in MiB. | string |`20480`| no |
-|`jvm_memory_pressure_threshold`| The maximum percentage of the Java heap used for all data nodes in the cluster | string |`80`| no |
-|`master_cpu_utilization_threshold`| The maximum percentage of CPU utilization of master nodes | string |`""`| no |
-|`master_jvm_memory_pressure_threshold`| The maximum percentage of the Java heap used for master nodes in the cluster | string |`""`| no |
-|`min_available_nodes`| The minimum available (reachable) nodes to have, set to non-zero to enable alarm | string |`0`| no |
-|`monitor_automated_snapshot_failure`| Enable monitoring of automated snapshot failure | bool |`true`| no |
-|`monitor_cluster_index_writes_blocked`| Enable monitoring of cluster index writes being blocked | bool |`true`| no |
-|`monitor_cluster_status_is_red`| Enable monitoring of cluster status is in red | bool |`true`| no |
-|`monitor_cluster_status_is_yellow`| Enable monitoring of cluster status is in yellow | bool |`true`| no |
-|`monitor_cpu_utilization_too_high`| Enable monitoring of CPU utilization is too high | bool |`true`| no |
-|`monitor_free_storage_space_too_low`| Enable monitoring of cluster average free storage is to low | bool |`true`| no |
-|`monitor_jvm_memory_pressure_too_high`| Enable monitoring of JVM memory pressure is too high | bool |`true`| no |
-|`monitor_kms`| Enable monitoring of KMS-related metrics, enable if using KMS | bool |`false`| no |
-|`monitor_master_cpu_utilization_too_high`| Enable monitoring of CPU utilization of master nodes are too high. Only enable this when dedicated master is enabled | bool |`false`| no |
-|`monitor_master_jvm_memory_pressure_too_high`| Enable monitoring of JVM memory pressure of master nodes are too high. Only enable this wwhen dedicated master is enabled | bool |`false`| no |
-|`monitor_min_available_nodes_period`| The period of the minimum available nodes should the statistics be applied in seconds | string |`86400`| no |
-|`monitor_automated_snapshot_failure_period`| The period of the automated snapshot failure should the statistics be applied in seconds | string |`60`| no |
-|`monitor_cluster_index_writes_blocked_period`| The period of the cluster index writes being blocked should the statistics be applied in seconds | string |`300`| no |
-|`monitor_cluster_status_is_red_period`| The period of the cluster status is in red should the statistics be applied in seconds | string |`60`| no |
-|`monitor_cluster_status_is_yellow_period`| The period of the cluster status is in yellow should the statistics be applied in seconds | string |`60`| no |
-|`monitor_cpu_utilization_too_high_period`| The period of the CPU utilization is too high should the statistics be applied in seconds | string |`900`| no |
-|`monitor_free_storage_space_too_low_period`| The period of the cluster average free storage is too low should the statistics be applied in seconds | string |`60`| no |
-|`monitor_jvm_memory_pressure_too_high_period`| The period of the JVM memory pressure is too high should the statistics be applied in seconds | string |`900`| no |
-|`monitor_kms_period`| The period of the KMS-related metrics should the statistics be applied in seconds | string |`60`| no |
-|`monitor_master_cpu_utilization_too_high_period`| The period of the CPU utilization of master nodes are too high should the statistics be applied in seconds | string |`900`| no |
-|`monitor_master_jvm_memory_pressure_too_high_period`| The period of the JVM memory pressure of master nodes are too high should the statistics be applied in seconds | string |`900`| no |
-|`create_sns_topic`| Will create an SNS topic, if you set this to false you MUST set `sns_topic` to a FULL ARN | bool |`true`| no |
-|`sns_topic`| SNS topic you want to specify. If leave empty, it will use a prefix and a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string |`""`| no |
-|`sns_topic_postfix`| SNS topic postfix | string |`""`| no |
-|`sns_topic_prefix`| SNS topic prefix | string |`""`| no |
-|`tags`| Tags to associate with all created resources | map |`{}`| no |
+| Name | Description | Type | Default | Required |
+|`domain_name`| The Elasticsearch domain name you want to monitor. | string | - | yes |
+|`cluster_type`| The type of cluster, single or multi-node | string |`"single"`| no |
+|`alarm_name_postfix`| Alarm name postfix | string |`""`| no |
+|`alarm_name_prefix`| Alarm name prefix | string |`""`| no |
+|`create_sns_topic`| Whether to create an SNS topic. If you set this to false, you MUST set `sns_topic` to a FULL ARN | bool |`true`| no |
+|`sns_topic`| The SNS topic you want to use. If left empty, it will use a prefix with a timestamp appended. If `create_sns_topic` is set to false, this MUST be a FULL ARN | string |`""`| no |
+|`sns_topic_postfix`| SNS topic postfix | string |`""`| no |
+|`sns_topic_prefix`| SNS topic prefix | string |`""`| no |
+|`tags`| Tags to associate with all created resources | map |`{}`| no |
+|`cpu_utilization_threshold`| The maximum percentage of CPU utilization | string |`80`| no |
+|`free_storage_space_threshold`| The minimum amount of available storage space in MiB. | string |`20480`| no |
+|`jvm_memory_pressure_threshold`| The maximum percentage of the Java heap used for all data nodes in the cluster | string |`80`| no |
+|`master_cpu_utilization_threshold`| The maximum percentage of CPU utilization of master nodes | string |`""`| no |
+|`master_jvm_memory_pressure_threshold`| The maximum percentage of the Java heap used for master nodes in the cluster | string |`""`| no |
+|`min_available_nodes`| The minimum number of available (reachable) nodes; set to non-zero to enable the alarm | string |`0`| no |
+
+|`monitor_automated_snapshot_failure`| Enable monitoring of automated snapshot failure | bool |`true`| no |
+|`monitor_cluster_status_is_red`| Enable monitoring of the cluster status being red | bool |`true`| no |
+|`monitor_cluster_status_is_yellow`| Enable monitoring of the cluster status being yellow | bool |`true`| no |
+|`monitor_cluster_index_writes_blocked`| Enable monitoring of cluster index writes being blocked | bool |`true`| no |
+|`monitor_cpu_utilization_too_high`| Enable monitoring of CPU utilization being too high | bool |`true`| no |
+|`monitor_free_storage_space_too_low`| Enable monitoring of the minimum per-node free storage being too low | bool |`true`| no |
+|`monitor_free_storage_space_total_too_low`| Enable monitoring of the cluster total free storage being too low | bool |`false`| no |
+|`monitor_jvm_memory_pressure_too_high`| Enable monitoring of JVM memory pressure being too high | bool |`true`| no |
+|`monitor_kms`| Enable monitoring of KMS-related metrics; enable if using KMS | bool |`false`| no |
+|`monitor_master_cpu_utilization_too_high`| Enable monitoring of CPU utilization of master nodes being too high. Only enable this when dedicated master is enabled | bool |`false`| no |
+|`monitor_master_jvm_memory_pressure_too_high`| Enable monitoring of JVM memory pressure of master nodes being too high. Only enable this when dedicated master is enabled | bool |`false`| no |
+|`monitor_min_available_nodes`| Enable monitoring of minimum available nodes | bool |`true`| no |
+
+|`alarm_automated_snapshot_failure_periods`| The number of periods before alerting that automated snapshots failed; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_cluster_status_is_red_periods`| The number of periods before alerting that the cluster status is red; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_cluster_status_is_yellow_periods`| The number of periods before alerting that the cluster status is yellow; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_cluster_index_writes_blocked_periods`| The number of periods before alerting that cluster index writes are blocked; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_cpu_utilization_too_high_periods`| The number of periods before alerting that CPU usage is too high; raise this to make the alarm less noisy | number |`3`| no |
+|`alarm_free_storage_space_too_low_periods`| The number of periods before alerting that per-node disk space is low; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_free_storage_space_total_too_low_periods`| The number of periods before alerting that total disk space is low; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_jvm_memory_pressure_too_high_periods`| The number of periods JVM memory pressure must stay in the alarmed state before alerting; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_kms_periods`| The number of periods before alerting that KMS has failed; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_master_cpu_utilization_too_high_periods`| The number of periods before alerting that master CPU usage is too high; raise this to make the alarm less noisy | number |`3`| no |
+|`alarm_master_jvm_memory_pressure_too_high_periods`| The number of periods master JVM memory pressure must stay in the alarmed state before alerting; raise this to make the alarm less noisy | number |`1`| no |
+|`alarm_min_available_nodes_periods`| The number of periods before alerting that the number of available nodes dropped below the threshold; raise this to make the alarm less noisy | number |`1`| no |
+
+|`alarm_min_available_nodes_period`| The period, in seconds, over which the minimum available nodes statistic is applied | string |`86400`| no |
+|`alarm_automated_snapshot_failure_period`| The period, in seconds, over which the automated snapshot failure statistic is applied | string |`60`| no |
+|`alarm_cluster_index_writes_blocked_period`| The period, in seconds, over which the cluster index writes blocked statistic is applied | string |`300`| no |
+|`alarm_cluster_status_is_red_period`| The period, in seconds, over which the cluster status red statistic is applied | string |`60`| no |
+|`alarm_cluster_status_is_yellow_period`| The period, in seconds, over which the cluster status yellow statistic is applied | string |`60`| no |
+|`alarm_cpu_utilization_too_high_period`| The period, in seconds, over which the CPU utilization statistic is applied | string |`900`| no |
+|`alarm_free_storage_space_too_low_period`| The period, in seconds, over which the per-node minimum free storage statistic is applied | string |`60`| no |
+|`alarm_free_storage_space_total_too_low_period`| The period, in seconds, over which the cluster total free storage statistic is applied | string |`60`| no |
+|`alarm_jvm_memory_pressure_too_high_period`| The period, in seconds, over which the JVM memory pressure statistic is applied | string |`900`| no |
+|`alarm_kms_period`| The period, in seconds, over which the KMS-related statistics are applied | string |`60`| no |
+|`alarm_master_cpu_utilization_too_high_period`| The period, in seconds, over which the master node CPU utilization statistic is applied | string |`900`| no |
+|`alarm_master_jvm_memory_pressure_too_high_period`| The period, in seconds, over which the master node JVM memory pressure statistic is applied | string |`900`| no |
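
The `FreeStorageSpaceTotal` alarm described in the alarms table is disabled by default, and its recommended threshold is the node count multiplied by `free_storage_space_threshold`. A minimal sketch of how that could be wired up follows. The module `source` path, the domain name, and the node count of 3 are placeholders for illustration, and passing a number to `free_storage_space_total_threshold` assumes that input accepts one (it is named in the alarms table but its type is not listed under Inputs).

```hcl
# Sketch only: source path, domain name, and node count are placeholders.
locals {
  node_count                   = 3     # assumed number of data nodes in the cluster
  free_storage_space_threshold = 20480 # per-node threshold in MiB (the documented default)
}

module "es_alarms" {
  source      = "./path/to/this/module" # placeholder, replace with the real module source
  domain_name = "example-domain"        # hypothetical domain name

  # Per-node alarm (Minimum statistic), enabled by default.
  free_storage_space_threshold = local.free_storage_space_threshold

  # Cluster-wide alarm (Sum statistic), disabled by default.
  monitor_free_storage_space_total_too_low = true
  free_storage_space_total_threshold       = local.node_count * local.free_storage_space_threshold
}
```

With these placeholder values the cluster-wide threshold works out to 61440 MiB, so the `Sum` alarm would fire once total free space across the three nodes drops to or below that figure, while the per-node alarm still fires at 20480 MiB on any single node.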