[Enhancement] Reduce concurrent conflicts between block write operations and poll operations #1550

Open
@CLFutureX

Background:
Block write operations and block poll operations currently contend heavily for the same lock, which degrades the write performance of the Write-Ahead Log (WAL).
Path: com.automq.stream.s3.wal.impl.block.SlidingWindowService

Current Status:
The current block write process works as follows:

  1. A block is first acquired, and any fully written blocks are added to pendingBlocks.
  2. The write thread then attempts to poll a ready block and hand it over to the IO thread pool for writing. This operation has a cool-down time that defaults to 1/3000 seconds.
  3. A separate single-threaded scheduled thread pool runs a scheduled task that performs the same operation as step 2, at a default interval of 1/1000 seconds.
  4. IO threads complete the writing of blocks and update writeBlocks.

Conflict Points:

  1. Step 1, Step 2 and Step 4 all acquire the same lock, blockLock, so they inevitably conflict with one another.
  2. In addition, every block write proactively attempts Step 2, so the write thread also competes with the scheduled thread performing the same poll. This further intensifies the conflict between write operations and poll operations.

Solution:
Optimization Approach: The write and poll operations should be separated to minimize concurrent conflicts.

1. Lock Separation Optimization

To decouple writing blocks from polling blocks, the lock can be split: blockLock stays on the write path, and the poll path gets its own lock, pollBlockLock.
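
A minimal sketch of the lock split, not the actual SlidingWindowService code. The class name, the use of String as a stand-in for a block, and the method names are all hypothetical; only blockLock, pollBlockLock and pendingBlocks come from this issue. It assumes pendingBlocks is a thread-safe queue, so that the poll path never needs blockLock.

```java
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: a String stands in for a WAL block.
class LockSeparationSketch {
    // Guards only the block that writers are currently filling.
    private final ReentrantLock blockLock = new ReentrantLock();
    // Proposed second lock: serializes pollers, never taken by the append path.
    private final ReentrantLock pollBlockLock = new ReentrantLock();
    // Thread-safe hand-off between the write path and the poll path.
    private final Queue<String> pendingBlocks = new LinkedBlockingQueue<>();
    private StringBuilder currentBlock = new StringBuilder();

    /** Write path: appends a record and seals the block once it reaches maxSize. */
    void append(String record, int maxSize) {
        blockLock.lock();
        try {
            currentBlock.append(record);
            if (currentBlock.length() >= maxSize) {
                pendingBlocks.add(currentBlock.toString());
                currentBlock = new StringBuilder();
            }
        } finally {
            blockLock.unlock();
        }
    }

    /** Poll path: takes the next sealed block, or null if none is ready. */
    String pollSealedBlock() {
        pollBlockLock.lock();
        try {
            return pendingBlocks.poll();
        } finally {
            pollBlockLock.unlock();
        }
    }
}
```

In this sketch a writer and a poller only meet at the queue, so a slow poll does not stall appends to the current block.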

2. Shared Resource Handling:

After implementing lock separation, the next challenge is managing shared resources.

  • pendingBlocks: Both writing and polling involve modifications to pendingBlocks. Therefore, pendingBlocks should be implemented as a thread-safe queue, such as LinkedBlockingQueue.

  • currentBlock: Both writing and polling currently access the current block, so the two paths inevitably conflict.
    To reduce this, a batching time can be introduced and set to the current minWriteIntervalNanos. During polling, the elapsed time then decides whether currentBlock should be included in the poll. Only when it should be included does the poller try to acquire blockLock, which may conflict with a writer; otherwise the poller does not touch blockLock and no conflict arises.

  • writeBlocks: The primary role of writeBlocks is to update the startOffset of WindowCoreData, so it is crucial that the blocks inside it stay ordered.
    The blockLock + pollBlockLock mechanism ensures the ordering of blocks within writeBlocks. As a preliminary solution, converting writeBlocks into a blocking queue seems feasible.
    When writeBlocks is not empty, the ordering is guaranteed. When it is empty, however, how can we obtain the startOffset to report (that is, the maximum offset of the already written blocks)?
    Previously, because of the global blockLock, we could simply read this from currentBlock when writeBlocks was empty.
    Without the global lock, the preliminary solution is to acquire blockLock + pollBlockLock when writeBlocks is empty.
    However, this reintroduces contention between step 4 and steps 1 and 2 whenever writeBlocks is empty.

    How can we optimize this situation?
    There are two approaches: eventual consistency and strict consistency.
    Eventual consistency:
    When writeBlocks is empty, the offset can be calculated directly from the block that was just written (wroteBlock):
    offset = wroteBlock.startOffset() + WALUtil.alignLargeByBlockSize(wroteBlock.blockBatchSize())
    and the position is updated to this value.
    If that block is indeed the one with the largest startOffset, updating the offset this way poses no issue.
    If it is not, the result can be temporarily wrong. For example, suppose block4 and block5 have already been written and block3 is written last (out of order). Once block3 completes, writeBlocks becomes empty, and WindowCoreData's startOffset may be incorrectly updated to offset1, the end of block3 as computed by the formula above. The correct value is offset2, the end of the last written block (block5 in this example).
    When is the offset corrected so that consistency is restored?
    The correction happens when the next IO operation finishes writing a new block: the update is performed again at that point, so startOffset once more reflects the most recent state of the written blocks.
    Strict consistency:
    In the poll operation, keep track of the maximum offset of the blocks handed over for writing, denoted maxWriteOffset. When updating, if writeBlocks is empty, every polled block has been fully written, so the offset can be set to maxWriteOffset.
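
The difference between the two approaches can be sketched with the block3/block4/block5 example. This is an illustrative model under stated assumptions, not the real implementation: the class and method names are hypothetical, block sizes are passed in already aligned, and writeBlocks is modelled as an ordered concurrent set of in-flight start offsets rather than a blocking queue. The concrete offsets (300, 400, 500) are made up for the demonstration.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the startOffset update when writeBlocks becomes empty.
class StartOffsetSketch {
    // Start offsets of blocks handed to IO threads but not yet written.
    private final ConcurrentSkipListSet<Long> writeBlocks = new ConcurrentSkipListSet<>();
    // Largest end offset of any block handed to IO threads so far.
    private final AtomicLong maxWriteOffset = new AtomicLong();

    /** Poll path: register the block first, then raise maxWriteOffset. */
    void onPoll(long startOffset, long alignedSize) {
        writeBlocks.add(startOffset);
        maxWriteOffset.accumulateAndGet(startOffset + alignedSize, Math::max);
    }

    /** IO path: called when a block finishes; returns the new window startOffset. */
    long onWritten(long startOffset, long alignedSize, boolean strict) {
        // Read the maximum before looking at writeBlocks, so that it can only
        // cover blocks that were registered before the emptiness check.
        long maxSeen = maxWriteOffset.get();
        writeBlocks.remove(startOffset);
        Long head = writeBlocks.ceiling(Long.MIN_VALUE); // smallest element, null if empty
        if (head != null) {
            return head; // an earlier block is still in flight
        }
        // Nothing in flight: strict uses the tracked maximum,
        // eventual uses the end of the block that was just written.
        return strict ? maxSeen : startOffset + alignedSize;
    }

    public static void main(String[] args) {
        for (boolean strict : new boolean[] {false, true}) {
            StartOffsetSketch s = new StartOffsetSketch();
            s.onPoll(300, 100); // block3
            s.onPoll(400, 100); // block4
            s.onPoll(500, 100); // block5
            s.onWritten(400, 100, strict); // block4 completes first
            s.onWritten(500, 100, strict); // then block5
            long result = s.onWritten(300, 100, strict); // block3 completes last
            System.out.println((strict ? "strict: " : "eventual: ") + result);
        }
    }
}
```

With these inputs the eventual variant reports 400 (the end of block3) until a later write corrects it, while the strict variant reports 600 (the end of block5) immediately.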

3. Locking Optimization:

Consider changing the current blocking lock acquisition to a try-lock mechanism. If the write thread successfully acquires the lock, it proceeds with its operation. If the write thread fails to acquire the lock, it means that the poll thread is currently processing, and the write thread can simply return without further action.
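
The try-lock idea might look roughly like the following. This is a sketch under assumptions: the class, the String stand-in for a block, and the method names are hypothetical, and it presumes a scheduled poller exists that will pick up anything the write thread skips.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a non-blocking poll attempt from the write thread.
class TryLockPollSketch {
    private final ReentrantLock pollBlockLock = new ReentrantLock();
    private final Queue<String> pendingBlocks = new ConcurrentLinkedQueue<>();

    /** Makes a sealed block available for polling. */
    void offer(String block) {
        pendingBlocks.add(block);
    }

    /**
     * Opportunistic poll: returns the next block, or null if another thread
     * holds pollBlockLock or if no block is pending.
     */
    String tryPoll() {
        if (!pollBlockLock.tryLock()) {
            return null; // a poller is already active; do not wait for it
        }
        try {
            return pendingBlocks.poll();
        } finally {
            pollBlockLock.unlock();
        }
    }
}
```

One trade-off to keep in mind: a block that the write thread skips here is only handed to IO on a later poll, so in the worst case it waits for the next run of the scheduled task.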
