Conversation

luyuncheng
Collaborator

@luyuncheng luyuncheng commented Feb 14, 2025

Description

Consider the following scenario:

  1. A merge task is running on Node1#Index1#Shard1 (long-running).
  2. After the merge task has started, relocation begins from Node1#Index1#Shard1 to Node2#Index1#Shard1.
  3. At the finalize step, the source node needs to call closeShard, but the merge task can take a long time, as the stack trace below shows.
  4. The clusterApplierService waits for roughly N minutes (the merge is long-running); the node is marked stale, and the master removes node1 from the cluster because it has not responded for too long.
opensearch[datanode1][clusterApplierService#updateTask][T#1]" #41 daemon prio=5 os_prio=0 cpu=5183.70ms elapsed=93132.85s tid=0x00007f3f392509d0 nid=0x101 in Object.wait()  [0x00007f3f6ddfb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait([email protected]/Native Method)
	- waiting on <no object reference available>
	at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5410)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2721)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2469)
	- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2449)
	- locked <0x0000001022bae6d0> (a java.lang.Object)
	at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2441)
	at org.opensearch.index.engine.InternalEngine.closeNoLock(InternalEngine.java:2370)
	at org.opensearch.index.engine.Engine.close(Engine.java:2000)
	at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1987)
	at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1907)
	- locked <0x0000001022b07ea0> (a java.lang.Object)
	at org.opensearch.index.IndexService.closeShard(IndexService.java:623)
	at org.opensearch.index.IndexService.removeShard(IndexService.java:599)
	- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
	at org.opensearch.index.IndexService.close(IndexService.java:374)
	- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:993)
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:446)
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:287)
	- locked <0x000000100b7da520> (a org.opensearch.indices.cluster.IndicesClusterStateService)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)

Proposal

I think we can introduce an abort mechanism for long-running merge tasks that triggers when closeShard is called.

I think we can introduce a KNNMergeHelper class to check whether the merge has been aborted. When building the graph, we can reuse faiss::InterruptCallback, faiss's interrupt-callback mechanism, to check whether the merge was aborted.

BUT ConcurrentMergeScheduler#MergeThread is an internal class, so we cannot call it directly; attempting to do so throws org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread is in unnamed module of loader 'app'.

We could add such a static method to OpenSearch core, similar to OneMergeHelper.
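As a rough illustration of the intended check (a minimal sketch, assuming a stand-in MergeThreadSketch in place of Lucene's package-private ConcurrentMergeScheduler.MergeThread, so the pattern compiles without Lucene on the classpath):

```java
// Hypothetical sketch of the proposed helper. MergeThreadSketch is a
// stand-in for Lucene's ConcurrentMergeScheduler.MergeThread; the real
// class exposes a OneMerge whose isAborted() flips when
// IndexWriter#abortMerges runs.
class KNNMergeHelperSketch {

    static class MergeThreadSketch extends Thread {
        private volatile boolean aborted = false;

        MergeThreadSketch(Runnable r) { super(r); }

        void abortMerge() { aborted = true; }

        boolean isAborted() { return aborted; }
    }

    // Returns true only when invoked from a merge thread whose merge has
    // been aborted; any other thread (including other merges) sees false.
    static boolean isMergeAborted() {
        Thread t = Thread.currentThread();
        if (t instanceof MergeThreadSketch) {
            return ((MergeThreadSketch) t).isAborted();
        }
        return false;
    }
}
```

The key property is that the decision is made from the current thread's own state, so one aborted merge does not affect merges running on other threads.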

Related Issues

Resolves #2530

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15
Member

@luyuncheng This is interesting - but can the interrupt callback with faiss be per graph or would it be for all graphs? In other words, would it cancel all graph builds happening on the node instead of just the one for the closed shard.

@navneet1v
Collaborator

navneet1v commented Feb 14, 2025

@luyuncheng thanks for creating the GH issue. I think we ourselves have seen this problem in a couple of places (ref: opensearch-project/OpenSearch#14828, created by @kotwanikunal), and seeing a solution to this problem is really great. I looked through the code, and I want to understand how it works: as far as I can tell, you are checking whether the merge was aborted if the write somehow fails, and then swallowing that exception.

Is this PR only meant to detect that the merge was aborted and the write to the directory failed, so that we can handle the errors when the shard is no longer present on the node because it moved? If that is the case: in the 2.19 version of the k-NN plugin we added support for writing the index using IndexInput/Output. So rather than checking whether the merge was aborted, can we not check whether the IndexInput/Output is closed?

@luyuncheng
Collaborator Author

This is interesting - but can the interrupt callback with faiss be per graph or would it be for all graphs? In other words, would it cancel all graph builds happening on the node instead of just the one for the closed shard.

@jmazanec15 In the sample code,

public static boolean isMergeAborted() {
    Thread mergeThread = Thread.currentThread();
    if (mergeThread instanceof ConcurrentMergeScheduler.MergeThread) {
        return ((ConcurrentMergeScheduler.MergeThread) mergeThread).merge.isAborted();
    }
    return false;
}

which would need to be included in the OpenSearch repo, only the current merge thread is checked, just like the OpenSearch core code at https://github.com/opensearch-project/OpenSearch/blob/99a9a81da366173b0c2b963b26ea92e15ef34547/server/src/main/java/org/apache/lucene/index/OneMergeHelper.java#L65-L69

@luyuncheng
Collaborator Author

I looked through the code and I want to know how this code is even working? Because as per my understanding of the code you are checking if merge is aborted if somehow write fails and then eating up that exception.

@navneet1v I think the call chain is as follows:

  1. While faiss is building the graph, a shard close is triggered.
  2. Lucene runs IndexWriter#rollbackInternal, which does not commit the current segment and waits for the in-progress work to finish.
  3. faiss detects the abort via InterruptCallback and throws a FaissException.
  4. We need to catch the FaissException from native code and throw a MergeAbortedException to Lucene.

we added the support writing index using IndexInput/Output. So if a rather than checking if merge is aborted or not, can we not see if IndexInput/Output is closed or not

Nice catch. We need to handle the FaissException, close the IndexInput/Output manually, and then throw a MergeAbortedException to Lucene.
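The catch-close-rethrow step can be sketched in a self-contained way. FaissExceptionSketch and MergeAbortedExceptionSketch below are stand-ins for the plugin's real FaissException and Lucene's MergePolicy.MergeAbortedException, and buildGraph is an illustrative name, not the plugin's API:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of translating a native abort into a Lucene-style merge abort,
// closing the output stream manually on the way out.
class MergeAbortTranslationSketch {

    static class FaissExceptionSketch extends RuntimeException {
        FaissExceptionSketch(String msg) { super(msg); }
    }

    static class MergeAbortedExceptionSketch extends IOException {
        MergeAbortedExceptionSketch(String msg) { super(msg); }
    }

    // Runs the native graph build; if the build was interrupted, close the
    // IndexOutput stand-in manually and rethrow as a merge abort.
    static void buildGraph(Runnable nativeBuild, Closeable indexOutput) throws IOException {
        try {
            nativeBuild.run();
        } catch (FaissExceptionSketch e) {
            indexOutput.close(); // release the output before aborting the merge
            throw new MergeAbortedExceptionSketch("aborted during graph build: " + e.getMessage());
        }
    }
}
```

The point of the translation is that Lucene's rollback path understands a merge-aborted exception as a clean cancellation rather than a write failure.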

@jmazanec15
Member

@luyuncheng right, but the faiss InterruptCallback (https://github.com/facebookresearch/faiss/blob/657c563604c774461aed0394ae99210713145e03/faiss/impl/AuxIndexStructures.h#L135-L162) operates globally: setting the instance in one thread sets it for every other thread.

So, for instance, here is the interrupt check in HNSW during graph build: https://github.com/facebookresearch/faiss/blob/657c563604c774461aed0394ae99210713145e03/faiss/IndexHNSW.cpp#L178-L180. If the interrupt gets set and another shard is building a graph on the node, won't this shard's graph build fail too?

@luyuncheng
Collaborator Author

luyuncheng commented Feb 17, 2025

it operates globally. So, setting the instance in one thread, will set it in another thread.

@jmazanec15 exactly right, the InterruptCallback in AuxIndexStructures is a singleton.

So, for instance, here is the interrupt check in HNSW during graph build: https://github.com/facebookresearch/faiss/blob/657c563604c774461aed0394ae99210713145e03/faiss/IndexHNSW.cpp#L178-L180. If the interrupt gets set and another shard is building a graph on the node, wont this shard's graph build fail too?

Each thread that calls want_interrupt can get a different return value, because the current thread context differs and we override want_interrupt. Also, at https://github.com/facebookresearch/faiss/blob/657c563604c774461aed0394ae99210713145e03/faiss/IndexHNSW.cpp#L139, interrupt is a local variable, so other threads would not throw the exception.

So every graph build goes through the if (InterruptCallback::is_interrupted()) check, but it returns true only on the merge thread whose merge was aborted.

Also, multiple threads enter the lock-protected section, which could reduce graph-build performance, but check_period helps limit that impact.
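The amortization that check_period provides can be sketched as follows (Java for consistency with the rest of the thread; faiss's actual check lives in C++, and all names here are illustrative):

```java
import java.util.function.BooleanSupplier;

// Sketch of faiss-style check_period amortization: the possibly
// lock-protected abort flag is consulted only once every checkPeriod
// iterations of the build loop, so contention stays low.
class CheckPeriodSketch {

    // Returns how many times the abort flag was actually read.
    static int buildWithPeriodicCheck(int iterations, int checkPeriod, BooleanSupplier isAborted) {
        int checks = 0;
        for (int i = 0; i < iterations; i++) {
            if (i % checkPeriod == 0) {
                checks++;
                if (isAborted.getAsBoolean()) {
                    break; // abort the build early
                }
            }
            // ... one unit of graph-construction work per iteration ...
        }
        return checks;
    }
}
```

With 1000 iterations and a period of 100, the flag is read only 10 times, so the cost of taking the callback's lock is paid on a small fraction of iterations.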

@jmazanec15
Member

int[] parentIds
);

public static native void setMergeInterruptCallback();
Member

Can we remove these and just have one interrupt callback registered at the time of library initialization? I don't think these need to be in the interface.

Member

I just prefer not to do this when it's a global, static callback.

Collaborator Author

@luyuncheng luyuncheng Jun 19, 2025

@jmazanec15 how about putting it into FaissService#initLibrary as part of the global initialization?
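A register-once pattern along those lines might look like this; FaissServiceSketch and the commented native call are illustrative stand-ins for the plugin's FaissService#initLibrary and the setMergeInterruptCallback native method, not the actual implementation:

```java
// Sketch of registering the global interrupt callback exactly once at
// library initialization, instead of exposing it on the JNI interface.
class FaissServiceSketch {

    private static int registrations = 0;

    // Idempotent: repeated calls register the global callback only once.
    static synchronized void initLibrary() {
        if (registrations > 0) {
            return;
        }
        // In the real plugin a native call would go here, e.g. the
        // hypothetical setMergeInterruptCallback().
        registrations++;
    }

    static synchronized int registrationCount() {
        return registrations;
    }
}
```

Since the faiss callback is a process-wide singleton, registering it once at load time avoids both redundant JNI calls and the need for per-index API surface.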

@jmazanec15
Member

@luyuncheng are you still working on this?

@luyuncheng
Collaborator Author

@luyuncheng are you still working on this?

Yes, sorry for the delay. I am trying to find a way to bypass the KNNMergeHelper so we avoid calling into OpenSearch, and to find a way to test it.

Signed-off-by: luyuncheng <[email protected]>
@luyuncheng
Collaborator Author

@jmazanec15 I found a way using reflection; it makes ConcurrentMergeScheduler.MergeThread.class accessible, as in the following code:

public static boolean isMergeAborted() {
    Thread mergeThread = Thread.currentThread();
    if (mergeThread instanceof ConcurrentMergeScheduler.MergeThread) {
        // return ((ConcurrentMergeScheduler.MergeThread) mergeThread).merge.isAborted();
        try {
            Object mergeObject = LucenePackagePrivateCaller.callPrivateFieldWithMethod(
                ConcurrentMergeScheduler.MergeThread.class,
                "merge",
                "isAborted",
                mergeThread
            );
            return ((Boolean) mergeObject).booleanValue();
        } catch (RuntimeException e) {
            return false;
        }
    }
    return false;
}

The reflective call is as follows; it is used to read org.apache.lucene.index.MergePolicy.OneMerge out of org.apache.lucene.index.ConcurrentMergeScheduler.MergeThread:
public class LucenePackagePrivateCaller {
    public static Object callPrivateFieldWithMethod(Class<?> clz, String fieldName, String methodName, Object called) {
        return AccessController.doPrivileged((PrivilegedAction<Object>) () -> {
            try {
                Field field = clz.getDeclaredField(fieldName);
                field.setAccessible(true);
                Object fieldValue = field.get(called);
                // Invoke the no-arg method on the extracted field value.
                Method method = field.getType().getMethod(methodName);
                method.setAccessible(true);
                return method.invoke(fieldValue);
            } catch (Exception e) {
                log.error("callPrivateFieldWithMethod", e);
                throw new RuntimeException(e);
            }
        });
    }
}

So we can test it without modifying OpenSearch core.
I also added tests in KNN80DocValuesConsumerTests.java.

@luyuncheng luyuncheng requested a review from jmazanec15 June 23, 2025 07:49