fix(os-persistence): Resolve race condition in OpenSearch bulk indexing #600

Akhil-Pathivada · 2025-10-08T11:35:27Z

Problem

Issue #592 reports intermittent task indexing failures: tasks are saved in the DB but are never indexed in OpenSearch, leaving the primary DB and the search index inconsistent.

Root Cause

A race condition exists in the OpenSearchRestDAO.indexObject() method. The critical section (add to buffer → check size → flush) is not atomic, allowing concurrent threads to interfere during the flush operation.

The issue occurs when:

1. Thread A adds an item to the buffer (size = 1)
2. Thread A checks size (1 >= threshold), begins flushing to OpenSearch
3. Thread B adds an item to the buffer (size = 2) while Thread A is sending data to OpenSearch
4. Thread A receives success from OpenSearch and replaces the entire buffer with a new empty one: bulkRequests.put(docType, new BulkRequests(...))
5. Thread B's item is discarded when the buffer is replaced
6. Thread B checks size on the new empty buffer (size = 0), doesn't flush

Result: Thread B's item is lost

This is a Time-Of-Check-Time-Of-Use (TOCTOU) race condition where the buffer state changes between when Thread B adds its item and when it performs the size check.
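
For illustration, the interleaving described above can be replayed deterministically in a single thread. This is only a sketch of the sequence as this PR describes it, not the actual OpenSearchRestDAO code: the class LostUpdateReplay, the plain lists standing in for BulkRequests, and the "indexed" list standing in for OpenSearch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DAO's per-docType buffer map; not the real Conductor classes.
class LostUpdateReplay {

    // Replays the interleaving step by step and returns what "OpenSearch" ends up holding.
    static List<String> replay(int indexBatchSize) {
        Map<String, List<String>> bulkRequests = new HashMap<>();
        List<String> indexed = new ArrayList<>(); // stands in for the search index
        String docType = "task";
        bulkRequests.put(docType, new ArrayList<>());

        // Thread A: add to buffer, then check size.
        bulkRequests.get(docType).add("task-A");
        boolean aFlushes = bulkRequests.get(docType).size() >= indexBatchSize;
        // Assumption: the bulk call carries only what was buffered when A started sending.
        List<String> inFlight = aFlushes
                ? new ArrayList<>(bulkRequests.get(docType))
                : new ArrayList<>();

        // Thread B: adds its item while A's request is still in flight.
        bulkRequests.get(docType).add("task-B");

        // Thread A: receives success and replaces the whole buffer with an empty one.
        if (aFlushes) {
            indexed.addAll(inFlight);
            bulkRequests.put(docType, new ArrayList<>());
        }

        // Thread B: checks size on the new, empty buffer and therefore does not flush.
        List<String> seenByB = bulkRequests.get(docType);
        if (seenByB.size() >= indexBatchSize) {
            indexed.addAll(seenByB);
            bulkRequests.put(docType, new ArrayList<>());
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println("indexed: " + replay(1));
    }
}
```

With indexBatchSize=1, the replay ends with only Thread A's item indexed; Thread B's item was in the buffer that got replaced and is never sent.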

Why indexBatchSize=1 exacerbates this:
With the default indexBatchSize=1, every item triggers an immediate flush, maximizing the window for concurrent modifications during the OpenSearch network call.

Solution

Wrap the critical section in indexObject() (add to buffer, check size, flush) in a synchronized(this) block.

This ensures:

Only one thread can add to the buffer, check size, and trigger flush at a time
No thread can add items while another thread is flushing
Buffer replacement happens atomically without losing items from concurrent threads
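
A minimal sketch of the shape of this fix, assuming a simplified buffer: the class SynchronizedBufferSketch, its fields, and the in-memory "flushed" list standing in for the OpenSearch bulk call are hypothetical and are not the code changed in this PR. It only shows add, size check, flush, and buffer replacement happening under one lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simplification of the DAO; the real method talks to OpenSearch.
class SynchronizedBufferSketch {
    private final int indexBatchSize;
    private List<String> buffer = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>(); // stands in for OpenSearch

    SynchronizedBufferSketch(int indexBatchSize) {
        this.indexBatchSize = indexBatchSize;
    }

    void indexObject(String doc) {
        // Add, size check, flush and buffer replacement form one atomic unit.
        synchronized (this) {
            buffer.add(doc);
            if (buffer.size() >= indexBatchSize) {
                flushed.addAll(buffer);     // stand-in for the bulk request
                buffer = new ArrayList<>(); // safe: no other thread can add in between
            }
        }
    }

    synchronized int flushedCount() {
        return flushed.size();
    }

    // Drives indexObject() from several threads and returns how many documents were flushed.
    static int runConcurrently(int threads, int docsPerThread) {
        SynchronizedBufferSketch dao = new SynchronizedBufferSketch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    dao.indexObject("t" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dao.flushedCount();
    }

    public static void main(String[] args) {
        System.out.println("flushed: " + runConcurrently(8, 1000));
    }
}
```

In this sketch the lock is held for the duration of the flush, so concurrent callers wait until the flush completes before they can add to the buffer.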

Testing

Tested with concurrent workflow executions and verified a 100% indexing success rate.

Changes

Modified OpenSearchRestDAO.indexObject() method with synchronized block
No API changes or breaking changes

Fixes #592

Akhil-Pathivada · 2025-10-08T17:06:00Z

@v1r3n @manan164 could you please help review this?

Akhil-Pathivada mentioned this pull request Oct 8, 2025

Bug: Silent Task Indexing Failures with Async Indexing Enabled #592

Open

fix(os-persistence): Resolve race condition in OpenSearch bulk indexing

f209a12

Akhil-Pathivada force-pushed the fix/issue-592-opensearch-indexing-race-condition branch from ee1209e to f209a12 on October 8, 2025 12:02
