fix(os-persistence): Resolve race condition in OpenSearch bulk indexing #600
+10
−7
Problem
Issue #592 reports intermittent task indexing failures: tasks are saved in the DB but fail to index in OpenSearch, leaving the primary DB and the search index inconsistent.
Root Cause
A race condition exists in the OpenSearchRestDAO.indexObject() method. The critical section (add to buffer → check size → flush) is not atomic, allowing concurrent threads to interfere during the flush operation.
The issue occurs when:
bulkRequests.put(docType, new BulkRequests(...))
This is a time-of-check to time-of-use (TOCTOU) race condition: the buffer state changes between the moment Thread B adds its item and the moment it performs the size check.
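To make the interleaving concrete, here is a minimal single-threaded replay of one way such a lost update can happen. It is a sketch under assumptions, not the actual Conductor code: the class `ToctouSketch`, the `buffers` map, and the `indexed` list are hypothetical stand-ins, and it assumes the flush ends by replacing the buffer with a fresh one, as the `bulkRequests.put(...)` call above suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToctouSketch {
    // Hypothetical stand-in for the per-docType buffer map (not the real field).
    static final Map<String, List<String>> buffers = new HashMap<>();
    // Hypothetical stand-in for documents that reached the search index.
    static final List<String> indexed = new ArrayList<>();

    /** Replays one interleaving step by step and returns the items never indexed. */
    static List<String> replayLostUpdate() {
        buffers.clear();
        indexed.clear();
        buffers.put("task", new ArrayList<>());

        // Thread A: adds its item, sees size >= 1 (batch size 1), starts the flush.
        buffers.get("task").add("task-A");
        List<String> snapshotA = new ArrayList<>(buffers.get("task"));

        // Thread B: adds its item while A's flush (a network call) is in flight.
        buffers.get("task").add("task-B");

        // Thread A: the flush completes and the buffer is replaced with a fresh one.
        indexed.addAll(snapshotA);
        buffers.put("task", new ArrayList<>());

        // Thread B: only now performs its size check, against the empty buffer.
        if (buffers.get("task").size() >= 1) {
            indexed.addAll(buffers.get("task"));
        }

        List<String> lost = new ArrayList<>(List.of("task-A", "task-B"));
        lost.removeAll(indexed);
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("lost: " + replayLostUpdate()); // prints "lost: [task-B]"
    }
}
```

In this replay, Thread B's item was discarded together with the old buffer, so it is in the DB but never reaches the index.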
Why indexBatchSize=1 exacerbates this:
With the default indexBatchSize=1, every item triggers an immediate flush, maximizing the window for concurrent modifications during the OpenSearch network call.
Solution
Add a synchronized(this) block around the critical section in indexObject():
This ensures:
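One property that is straightforward to check in isolation is that no document is lost or flushed twice when several threads call the method at once. The following is a minimal sketch, not the PR's actual code: `SynchronizedBufferSketch`, its `buffer` and `flushed` fields, and the `run` helper are hypothetical, and the bulk request to OpenSearch is replaced by copying into an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedBufferSketch {
    private final int batchSize;
    private List<String> buffer = new ArrayList<>();          // hypothetical bulk buffer
    private final List<String> flushed = new ArrayList<>();   // stand-in for the search index

    SynchronizedBufferSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    void indexObject(String doc) {
        // add -> check size -> flush runs as one atomic unit per caller.
        synchronized (this) {
            buffer.add(doc);
            if (buffer.size() >= batchSize) {
                flushed.addAll(buffer);       // stand-in for sending the bulk request
                buffer = new ArrayList<>();   // fresh buffer, swapped under the same lock
            }
        }
    }

    synchronized int flushedCount() {
        return flushed.size();
    }

    /** Hammers indexObject() from several threads and returns how many docs were flushed. */
    static int run(int threads, int perThread, int batchSize) {
        SynchronizedBufferSketch dao = new SynchronizedBufferSketch(batchSize);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    dao.indexObject("task-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dao.flushedCount();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000, 1)); // prints 8000
    }
}
```

A design note on this approach: if the real flush runs inside the lock, callers are serialized for the duration of the network call, which trades indexing throughput for consistency.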
Testing
Changes
OpenSearchRestDAO.indexObject() method with synchronized block
Fixes #592