
Dead cluster after updating xpack.notification.slack.default_account to an account that does not exist #115298

Open
@romain-chanu


Elasticsearch Version

8.15.3

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

This has been observed in the field and the problem is reproducible.

Updating xpack.notification.slack.default_account (cf. Slack Notification Settings) to an account that does not exist, using the cluster update settings API, leads to a dead cluster (cf. the steps to reproduce below).

It is questionable whether this should be a dynamic setting, given that the Slack notification settings documentation states:

You can no longer configure Slack accounts using elasticsearch.yml settings. Please use Elasticsearch’s secure [keystore](https://www.elastic.co/guide/en/elasticsearch/reference/current/secure-settings.html) method instead.

Note also that in the example below, the default account name contains a space. We could not find any workaround to recover from this situation: as far as we know, it is impossible to configure the account in the elasticsearch.yml file, or to define its secure URL in the keystore, when the account name contains a space.
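The account name is embedded in the secure setting key itself, which is why a space in the name is a problem. Assuming the documented key pattern xpack.notification.slack.account.<account_name>.secure_url, a small Python sketch (the helper name is hypothetical) shows the keystore key that would have to exist for the default account used in the steps below:

```python
def slack_secure_url_key(account_name: str) -> str:
    """Hypothetical helper: build the keystore key for a Slack account's URL.

    Assumes the key pattern xpack.notification.slack.account.<name>.secure_url.
    """
    return f"xpack.notification.slack.account.{account_name}.secure_url"


key = slack_secure_url_key("Slack Alerts")
print(key)         # xpack.notification.slack.account.Slack Alerts.secure_url
print(" " in key)  # True: the space ends up inside the setting key
```

The sketch only shows where the space lands; per this report, no way was found to actually register such a key in the keystore.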

Steps to Reproduce

  1. Create a deployment in ESS (Elasticsearch Service) with 2 availability zones for the hot data and content tier.

  2. Run the following request and note that it is acknowledged successfully:

PUT _cluster/settings
{
    "persistent": {
        "xpack.notification.slack.default_account": "Slack Alerts"
    }
}
  3. Run GET _cluster/settings and observe that xpack.notification.slack.default_account is not in the result.

  4. Run the following request to reset the setting:

PUT _cluster/settings
{
    "persistent": {
        "xpack.notification.slack.default_account": ""
    }
}

and observe the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "not_master_exception",
        "reason": "no longer master"
      }
    ],
    "type": "master_not_discovered_exception",
    "reason": "org.elasticsearch.cluster.NotMasterException: no longer master",
    "caused_by": {
      "type": "not_master_exception",
      "reason": "no longer master"
    }
  },
  "status": 503
}
  5. Check the logs and observe that:

a) The master node keeps changing (cf. the "master node changed" events in the logs).

b) All nodes report a similar log message:

[tiebreaker-0000000002] failed to apply settings
org.elasticsearch.common.settings.SettingsException: could not find default account [Slack Alerts]
    at org.elasticsearch.xpack.watcher.notification.NotificationService.findDefaultAccountOrNull(NotificationService.java:178) ~[?:?]
    at org.elasticsearch.xpack.watcher.notification.NotificationService.buildAccounts(NotificationService.java:107) ~[?:?]
    at org.elasticsearch.xpack.watcher.notification.NotificationService.clusterSettingsConsumer(NotificationService.java:77) ~[?:?]
    at org.elasticsearch.common.settings.Setting$2.apply(Setting.java:850) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.settings.Setting$2.apply(Setting.java:822) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.settings.AbstractScopedSettings$SettingUpdater.lambda$updater$0(AbstractScopedSettings.java:654) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.settings.AbstractScopedSettings.applySettings(AbstractScopedSettings.java:174) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:498) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:156) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217) ~[elasticsearch-8.15.3.jar:?]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183) ~[elasticsearch-8.15.3.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1570) ~[?:?]
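The stack trace shows the exception being thrown from the settings consumer (clusterSettingsConsumer → buildAccounts → findDefaultAccountOrNull) while each node applies the new cluster state, i.e. after the update API has already acknowledged the change. A simplified Python model of that lookup, written only from the error message above (it is not the actual Java implementation; the function names and the "monitoring" account are hypothetical, and the key pattern is assumed from the Slack settings docs), looks like this:

```python
class SettingsException(Exception):
    """Stand-in for org.elasticsearch.common.settings.SettingsException."""


PREFIX = "xpack.notification.slack.account."
SUFFIX = ".secure_url"


def configured_accounts(secure_keys):
    """Collect account names from keys shaped like
    xpack.notification.slack.account.<name>.secure_url (assumed pattern)."""
    names = set()
    for key in secure_keys:
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            names.add(key[len(PREFIX):-len(SUFFIX)])
    return names


def find_default_account_or_none(accounts, default_name):
    """Simplified model of the failing check: an unset or empty default is
    treated as 'no default', while an unknown name is rejected."""
    if not default_name:
        return None
    if default_name not in accounts:
        raise SettingsException(f"could not find default account [{default_name}]")
    return default_name


accounts = configured_accounts(["xpack.notification.slack.account.monitoring.secure_url"])
print(find_default_account_or_none(accounts, "monitoring"))  # monitoring
try:
    find_default_account_or_none(accounts, "Slack Alerts")
except SettingsException as exc:
    print(exc)  # could not find default account [Slack Alerts]
```

Because this check only runs when a node applies the setting, and not when the update request is validated, every node can end up failing in the same way, which is consistent with the log message above being reported by all nodes.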

Logs (if relevant)

No response
