Description
Describe the bug
The SmartScaler in the OpenSearch Kubernetes Operator intermittently skips the draining step when scaling down data nodes. According to the logs, it usually excludes a node, waits for it to drain, confirms the drain, and then removes it. For some nodes, however, it skips the wait and removes them directly, which can disrupt the cluster.
To Reproduce
Steps to reproduce the behaviour:
1. Trigger a scale-down event for a data node group.
2. Monitor the operator logs for node exclusion, draining, and removal.
3. Observe that some nodes follow the expected exclusion → draining → removal sequence, while others are removed without waiting for a drain.
Expected behaviour
Every node undergoing scale-down should be properly drained before removal, ensuring cluster stability.
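The expected behaviour can be expressed as a small per-node state machine. This is a minimal sketch, not the operator's code: `Phase`, `step`, and `shardsRemaining` are hypothetical names, and it assumes the number of shards still allocated to the node is obtained elsewhere (for example from the cluster's shard allocation).

```go
package main

import "fmt"

// Phase is a hypothetical per-node scale-down phase; the real operator's
// status fields may differ.
type Phase int

const (
	Active Phase = iota
	Excluded
	Drained
	Removed
)

func (p Phase) String() string {
	return [...]string{"Active", "Excluded", "Drained", "Removed"}[p]
}

// step advances one node by a single reconcile. shardsRemaining is the number
// of shards still allocated to the node (assumed to be fetched elsewhere).
// Removed is only reachable from Drained, so a node can never go straight
// from Active or Excluded to Removed.
func step(p Phase, shardsRemaining int) Phase {
	switch p {
	case Active:
		return Excluded // exclude the node from shard allocation
	case Excluded:
		if shardsRemaining == 0 {
			return Drained // drain confirmed
		}
		return Excluded // keep waiting for shards to move away
	case Drained:
		return Removed // only now is it safe to shrink the node group
	default:
		return p
	}
}

func main() {
	p := Active
	for _, shards := range []int{12, 5, 0, 0} {
		p = step(p, shards)
		fmt.Println(p)
	}
}
```

Because `Removed` is only reachable from `Drained`, a reconcile loop structured this way cannot remove a node that never reported a drain, which is the transition opensearch-data-13 appears to take in the logs that follow.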
Operator Logs
{"level":"info","ts":"2025-02-20T12:38:29.164Z","msg":"Group: data, Excluded node: opensearch-data-14",...}
...
{"level":"info","ts":"2025-02-20T12:44:00.612Z","msg":"Group: data, Waiting for node opensearch-data-14 to drain",...}
...
{"level":"info","ts":"2025-02-20T12:49:28.491Z","msg":"Group: data, Node opensearch-data-14 is drained",...}
{"level":"info","ts":"2025-02-20T12:49:28.828Z","msg":"Group: data, Removed node opensearch-data-14",...}
{"level":"info","ts":"2025-02-20T12:49:29.120Z","msg":"Group: data, Removed node opensearch-data-13",...} <-- No drain step for data-13
{"level":"info","ts":"2025-02-20T12:49:44.805Z","msg":"Group: data, Excluded node: opensearch-data-12",...}
{"level":"info","ts":"2025-02-20T12:49:45.423Z","msg":"Group: data, Waiting for node opensearch-data-12 to drain",...}
Issue Breakdown
• opensearch-data-14 follows the correct process: Excluded → Drained → Removed
• opensearch-data-13 is removed without draining
• opensearch-data-12 resumes the correct behaviour
Impact
• Potential risk of data loss or increased cluster instability
• Unexpected scaling behaviour causing uneven shard distribution
Environment
• OpenSearch Operator version: 2.6.0
• OpenSearch version: 2.15.0
Full log:
{"level":"info","ts":"2025-02-20T12:38:29.164Z","msg":"Group: data, Excluded node: opensearch-data-14","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-resident-telem-logs-opensearch"},"namespace":"dedicated-resident-telem-logs-opensearch","name":"opensearch","reconcileID":"ba60913f-9fd0-4c0c-ad94-65775a1dde06"}
...
{"level":"info","ts":"2025-02-20T12:44:00.612Z","msg":"Group: data, Waiting for node opensearch-data-14 to drain","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-resident-telem-logs-opensearch"},"namespace":"dedicated-resident-telem-logs-opensearch","name":"opensearch","reconcileID":"04eef43a-fcbe-4f7e-b6a5-0077910c29ce"}
...
{"level":"info","ts":"2025-02-20T12:49:28.491Z","msg":"Group: data, Node opensearch-data-14 is drained","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"c9d0a63c-addd-4fc6-ba1e-9e947147aee2"}
{"level":"info","ts":"2025-02-20T12:49:28.502Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"0f523f48-591a-44dc-a616-2392fc3f57d7","cluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"}}
{"level":"info","ts":"2025-02-20T12:49:28.745Z","msg":"Group: data, Node opensearch-data-14 is drained","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"0f523f48-591a-44dc-a616-2392fc3f57d7"}
{"level":"info","ts":"2025-02-20T12:49:28.756Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"7651335d-1f89-4cb2-b9e5-a353b3709545","cluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"}}
{"level":"info","ts":"2025-02-20T12:49:28.828Z","logger":"KubeAPIWarningLogger","msg":"would violate PodSecurity \"restricted:latest\": privileged (container \"init-sysctl\" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers \"init\", \"init-sysctl\", \"opensearch\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"init\", \"init-sysctl\" must set securityContext.capabilities.drop=[\"ALL\"]), runAsNonRoot != true (pod or containers \"init\", \"init-sysctl\" must set securityContext.runAsNonRoot=true), runAsUser=0 (container \"init\" must not set runAsUser=0), seccompProfile (pod or containers \"init\", \"init-sysctl\", \"opensearch\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")"}
{"level":"info","ts":"2025-02-20T12:49:28.828Z","msg":"Group: data, Removed node opensearch-data-14","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"7651335d-1f89-4cb2-b9e5-a353b3709545"}
{"level":"info","ts":"2025-02-20T12:49:28.966Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"a1c16fc4-a62b-4b9f-9527-4e5c7ae791b9","cluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"}}
{"level":"info","ts":"2025-02-20T12:49:29.120Z","msg":"Group: data, Removed node opensearch-data-13","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"a1c16fc4-a62b-4b9f-9527-4e5c7ae791b9"}
{"level":"info","ts":"2025-02-20T12:49:29.259Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"5c25b6bb-d333-4c4a-9a9c-e2ca10f1435e","cluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"}}
{"level":"info","ts":"2025-02-20T12:49:44.805Z","msg":"Group: data, Excluded node: opensearch-data-12","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"5c25b6bb-d333-4c4a-9a9c-e2ca10f1435e"}
{"level":"info","ts":"2025-02-20T12:49:44.872Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"170ac32b-d312-49ec-a630-55074602f047","cluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"}}
{"level":"info","ts":"2025-02-20T12:49:45.423Z","msg":"Group: data, Waiting for node opensearch-data-12 to drain","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"dedicated-logs-opensearch"},"namespace":"dedicated-logs-opensearch","name":"opensearch","reconcileID":"170ac32b-d312-49ec-a630-55074602f047"}