Skip to content

[Feature] Abort Merge in nativeEngine  #2530

@luyuncheng

Description

@luyuncheng

What is the bug?

Description

When there is a scenarios:

  1. There is A Merge Task On Node1#Index1#Shard1(long time running)
  2. After merge task started, begin relocating from Node1#Index1#Shard1 TO Node2#Index1#Shard1
  3. At the finalize step, source need do closeShard, but the merge task would take a long time, stack as following shows.
  4. The clusterApplierService would wait for about N minutes(long time running), and mark the node stale, and master let node1 left because node1 long time no response.
opensearch[datanode1][clusterApplierService#updateTask][T#1]" #41 daemon prio=5 os_prio=0 cpu=5183.70ms elapsed=93132.85s tid=0x00007f3f392509d0 nid=0x101 in Object.wait()  [0x00007f3f6ddfb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait([email protected]/Native Method)
	- waiting on <no object reference available>
	at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5410)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2721)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2469)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2449)
	- locked <0x0000001022bae6d0> (a java.lang.Object)
	at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2441)
	at org.opensearch.index.engine.InternalEngine.closeNoLock(InternalEngine.java:2370)
	at org.opensearch.index.engine.Engine.close(Engine.java:2000)
	at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1987)
	at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1907)
	- locked <0x0000001022b07ea0> (a java.lang.Object)
	at org.opensearch.index.IndexService.closeShard(IndexService.java:623)
	at org.opensearch.index.IndexService.removeShard(IndexService.java:599)
	- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
	at org.opensearch.index.IndexService.close(IndexService.java:374)
	- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:993)
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:446)
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:287)
	- locked <0x000000100b7da520> (a org.opensearch.indices.cluster.IndicesClusterStateService)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)

PR #2529

Metadata

Metadata

Assignees

Labels

FeaturesIntroduces a new unit of functionality that satisfies a requirement

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions