Optimise Logging Cluster State Health Changes #14647

@Bukhtawar

Is your feature request related to a problem? Please describe

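On a reasonably large cluster with 500k shards, logging cluster health changes after every reroute operation becomes expensive. In the hot threads output below, the cluster manager update thread is busy in RoutingTable.allShards, called from the ClusterStateHealth constructor via AllocationService.buildResultAndLogHealthChange: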

96.7% (9.6s out of 10s) cpu usage by thread 'opensearch[74e0b23bcf51c21918e96f38f93e1491][clusterManagerService#updateTask][T#1]'
     2/10 snapshots sharing following 22 elements
       java.base/java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1053)
       app//org.opensearch.cluster.routing.RoutingTable.allShards(RoutingTable.java:245)
       app//org.opensearch.cluster.routing.RoutingTable.allShards(RoutingTable.java:225)
       app//org.opensearch.cluster.health.ClusterStateHealth.<init>(ClusterStateHealth.java:138)
       app//org.opensearch.cluster.health.ClusterStateHealth.<init>(ClusterStateHealth.java:77)
       app//org.opensearch.cluster.routing.allocation.AllocationService.buildResultAndLogHealthChange(AllocationService.java:186)
       app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:528)
       app//org.opensearch.node.Node$$Lambda$2602/0x0000004000a253b8.apply(Unknown Source)
       app//org.opensearch.cluster.routing.BatchedRerouteService$1.execute(BatchedRerouteService.java:136)
       app//org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
       app//org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
       app//org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
       app//org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
       app//org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
       app//org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
       app//org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
       java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base/java.lang.Thread.run(Thread.java:840)

Describe the solution you'd like

For this log line, all we care about is the status (RED/YELLOW), which can be derived from just the unassigned shards instead of iterating over all shards.
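
A rough sketch of the idea, not the actual OpenSearch code: the class, the UnassignedShard type and the statusFromUnassigned method below are all hypothetical stand-ins. It assumes a simplified rule (any unassigned primary means RED, otherwise any unassigned replica means YELLOW, otherwise GREEN) and ignores finer cases such as initializing shards or primaries of newly created indices, which the real health computation may treat differently.

```java
import java.util.List;

/** Hypothetical sketch; none of these types are OpenSearch classes. */
final class HealthFromUnassignedSketch {

    enum Status { GREEN, YELLOW, RED }

    /** Minimal stand-in for one entry in the list of unassigned shards. */
    static final class UnassignedShard {
        final String index;
        final int shardId;
        final boolean primary;

        UnassignedShard(String index, int shardId, boolean primary) {
            this.index = index;
            this.shardId = shardId;
            this.primary = primary;
        }
    }

    /**
     * Derives a status by looking only at unassigned shards, so the cost
     * grows with the number of unassigned shards, not with all shards.
     * Assumed rule: unassigned primary -> RED, unassigned replica -> YELLOW.
     */
    static Status statusFromUnassigned(List<UnassignedShard> unassigned) {
        Status status = Status.GREEN;
        for (UnassignedShard shard : unassigned) {
            if (shard.primary) {
                return Status.RED; // cannot get worse, stop early
            }
            status = Status.YELLOW;
        }
        return status;
    }

    public static void main(String[] args) {
        List<UnassignedShard> unassigned = List.of(
            new UnassignedShard("logs-1", 0, false),
            new UnassignedShard("logs-2", 3, true));
        System.out.println(statusFromUnassigned(unassigned)); // prints RED
    }
}
```

With 500k shards and only a handful unassigned, a loop like this touches only that handful, whereas the stack trace above shows time spent walking RoutingTable.allShards.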

Related component

ShardManagement:Performance

Describe alternatives you've considered

No response

Additional context

No response

Labels

Performance, ShardManagement:Performance, enhancement