[Fix](batch) Prevent writer deadlock from currentCacheBytes drift by addu390 · Pull Request #653 · apache/doris-flink-connector

addu390 · 2026-05-02T20:36:40Z

Proposed changes

Issue Number: close #614

Problem Summary:

DorisBatchStreamLoad increments currentCacheBytes by the client-side record bytes on insert, but decrements it by respContent.getLoadBytes() on a successful load. Whenever the BE-reported value is smaller than what the client buffered (for example with partial_columns=true or compress_type=gz), each load leaves the difference behind in the counter.

Over time these leftover bytes accumulate until the counter exceeds maxBlockedBytes, so writeRecord parks on block.await() forever even though bufferMap and flushQueue are empty. The job freezes with no exception; the only output is the repeating log lines "Cache full, waiting for flush" and "bufferMap is empty, no need to flush null".
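To make the drift concrete, here is a minimal, single-threaded model of the accounting described above. It is a sketch, not the connector's code: the class and method names are hypothetical, the `>=` comparison for the blocking check is an assumption, and the 100 buffered bytes versus 90 reported bytes stand in for any compression or partial_columns shrinkage.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the pre-fix accounting; the real class uses locks,
// a flush queue and background load threads.
class CacheBytesDriftDemo {
    private final AtomicLong currentCacheBytes = new AtomicLong();
    private final long maxBlockedBytes;

    CacheBytesDriftDemo(long maxBlockedBytes) {
        this.maxBlockedBytes = maxBlockedBytes;
    }

    /** One buffer-then-load cycle with the old, asymmetric decrement. */
    void bufferAndLoad(long clientBytes, long beReportedLoadBytes) {
        currentCacheBytes.addAndGet(clientBytes);          // writeRecord side
        currentCacheBytes.addAndGet(-beReportedLoadBytes); // old load-success side
    }

    /** True when writeRecord would park on block.await() (assumed >= check). */
    boolean wouldBlock() {
        return currentCacheBytes.get() >= maxBlockedBytes;
    }

    long cacheBytes() {
        return currentCacheBytes.get();
    }

    public static void main(String[] args) {
        CacheBytesDriftDemo demo = new CacheBytesDriftDemo(1_000);
        int loads = 0;
        // Every buffer is fully loaded, yet each cycle leaves 10 bytes behind.
        while (!demo.wouldBlock()) {
            demo.bufferAndLoad(100, 90);
            loads++;
        }
        System.out.println(loads + " loads, cacheBytes=" + demo.cacheBytes());
    }
}
```

Running main prints `100 loads, cacheBytes=1000`: nothing is left to flush, but the writer would now wait forever.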

Two changes:

Decrement currentCacheBytes by buffer.getBufferSizeBytes() so the add and subtract are symmetric regardless of compression / projection.
Move the per-buffer flush check above the global cache-pressure await loop so a buffer that just crossed bufferFlushMaxBytes actually gets flushed instead of being stranded behind backpressure.
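Both changes can be sketched together in the same simplified model. Assumptions: flushes complete synchronously inside writeRecord, the boolean return value stands in for the real block.await() loop, the BE-reported byte count is not modeled because the fix no longer reads it, and SymmetricCacheAccounting and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, single-threaded sketch of the fixed accounting; not the
// connector's actual DorisBatchStreamLoad implementation.
class SymmetricCacheAccounting {
    private final Map<String, Long> bufferBytes = new HashMap<>();
    private final AtomicLong currentCacheBytes = new AtomicLong();
    private final long bufferFlushMaxBytes;
    private final long maxBlockedBytes;

    SymmetricCacheAccounting(long bufferFlushMaxBytes, long maxBlockedBytes) {
        this.bufferFlushMaxBytes = bufferFlushMaxBytes;
        this.maxBlockedBytes = maxBlockedBytes;
    }

    /** Buffers a record; returns true if the caller would have to wait for cache space. */
    boolean writeRecord(String table, long recordBytes) {
        long size = bufferBytes.merge(table, recordBytes, Long::sum);
        currentCacheBytes.addAndGet(recordBytes);
        // Change 2: the per-buffer flush check runs before the backpressure check,
        // so a buffer that just crossed the threshold is flushed, not stranded.
        if (size >= bufferFlushMaxBytes) {
            flush(table);
        }
        // Stand-in for the global cache-pressure wait (assumed >= comparison).
        return currentCacheBytes.get() >= maxBlockedBytes;
    }

    /** Simulates a successful stream load of one table's buffer. */
    private void flush(String table) {
        Long size = bufferBytes.remove(table);
        if (size == null) {
            return;
        }
        // Change 1: release exactly what was buffered. The BE may report fewer
        // load bytes (gz, partial_columns), but that value is not used here.
        currentCacheBytes.addAndGet(-size);
    }

    long cacheBytes() {
        return currentCacheBytes.get();
    }

    int pendingBuffers() {
        return bufferBytes.size();
    }
}
```

With bufferFlushMaxBytes = 500 and maxBlockedBytes = 1000, writing 10,000 records of 100 bytes to one table flushes on every fifth record and leaves the counter at exactly 0, so the writer never reports backpressure no matter how long it runs.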

Checklist(Required)

Does it affect the original behavior: No
Has unit tests been added: No Need
Has document been added or modified: No Need
Does it need to update dependencies: No
Are there any changes that cannot be rolled back: No

Further comments

The same root cause was independently reported in #614 (triggered by gz compression). The fix is config-agnostic: it covers partial_columns, gz, and any future source of client-vs-BE byte asymmetry.

Copilot

Pull request overview

This PR fixes a batch sink stall in DorisBatchStreamLoad by making cache-byte accounting symmetric with what the client actually buffered and by triggering per-buffer flushes before entering global backpressure waits. In the Flink Doris connector, these changes target the writer path that can otherwise freeze when currentCacheBytes drifts upward over time.

Changes:

Flush full buffers before waiting on global cache pressure so newly full buffers are not stranded behind backpressure.
Decrement currentCacheBytes using the buffered byte count instead of Doris-reported loadBytes.
Keep the fix local to the batch stream-load implementation without changing public APIs.


```diff
+        if (flushQueue.size() < executionOptions.getFlushQueueSize()
+                && (buffer.getBufferSizeBytes() >= executionOptions.getBufferFlushMaxBytes()
+                        || buffer.getNumOfRecords() >= executionOptions.getBufferFlushMaxRows())) {
+            boolean flush = bufferFullFlush(bufferKey);
+            LOG.info("trigger flush by buffer full, flush: {}", flush);
+        } else if (buffer.getBufferSizeBytes() >= STREAM_LOAD_MAX_BYTES
+                || buffer.getNumOfRecords() >= STREAM_LOAD_MAX_ROWS) {
+            // The buffer capacity exceeds the stream load limit, flush
+            boolean flush = bufferFullFlush(bufferKey);
+            LOG.info("trigger flush by buffer exceeding the limit, flush: {}", flush);
```

```diff
                                long cacheByteBeforeFlush =
-                                        currentCacheBytes.getAndAdd(-respContent.getLoadBytes());
+                                        currentCacheBytes.getAndAdd(-buffer.getBufferSizeBytes());
```
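The first hunk above decides which flush trigger fires. Extracted as a pure function, its branch order looks roughly like this. This is an illustrative sketch: FlushTrigger, Reason, and the explicit threshold parameters are hypothetical; in the connector the values come from executionOptions and the STREAM_LOAD_MAX_BYTES / STREAM_LOAD_MAX_ROWS constants, whose actual values are not shown in this excerpt.

```java
// Hypothetical pure-function view of the flush-trigger branches in the diff.
class FlushTrigger {
    enum Reason { BUFFER_FULL, STREAM_LOAD_LIMIT, NONE }

    static Reason decide(
            int flushQueueSize, int flushQueueCapacity,
            long bufferBytes, long bufferRows,
            long flushMaxBytes, long flushMaxRows,
            long streamLoadMaxBytes, long streamLoadMaxRows) {
        boolean bufferFull = bufferBytes >= flushMaxBytes || bufferRows >= flushMaxRows;
        // Normal case: the flush queue has room and the buffer hit its size/row threshold.
        if (flushQueueSize < flushQueueCapacity && bufferFull) {
            return Reason.BUFFER_FULL;
        }
        // Reached only when the first branch did not fire (e.g. the flush queue is
        // at capacity); an oversized buffer is still flushed to respect load limits.
        if (bufferBytes >= streamLoadMaxBytes || bufferRows >= streamLoadMaxRows) {
            return Reason.STREAM_LOAD_LIMIT;
        }
        return Reason.NONE;
    }
}
```

Because this check now runs before the global cache-pressure wait, a buffer that qualifies for either branch is handed off for flushing instead of sitting behind a writer that is already waiting.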


JNSimba

LGTM, Thank you for your contribution.

addu390 · 2026-05-06T13:48:04Z

@JNSimba Thanks for the review. When would the next release cutoff be?

JNSimba · 2026-05-07T01:58:20Z

> @JNSimba Thanks for the review. When would the next release cutoff be?

A fix release will follow shortly, and the release vote is expected to start within the next two days.

@addu390

## Versions

- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1 or older (not covered by version/language sync gate)

## Languages

- [x] Chinese
- [x] English
- [ ] Japanese candidate translation needed

## Docs Checklist

- [ ] Checked by AI
- [ ] Test Cases Built
- [x] Updated required version and language counterparts, or explained why not
- [x] If only one language changed, confirmed whether source/translation counterparts need sync

## Summary

Release Flink Doris Connector 26.1.1, superseding 26.1.0.

- Version table in `flink-doris-connector.md` (dev + 4.x, EN + zh-CN): replace `26.1.0` row with `26.1.1`.
- `release-notes.md` (dev + 4.x, EN + zh-CN): prepend a `26.1.1` section.
  - Bug fix: batch sink potentially freezing during prolonged operation when compression is enabled (apache/doris-flink-connector#653).
  - Credits: @addu390
- Download page (`src/constant/download.data.ts`): replace 26.1.0 entry (label/value/source/binary URLs and the `FLINK_SAME_SOURCE_2610` constant) with 26.1.1.

Release notes reference: apache/doris-flink-connector#654

[Fix](batch) prevent writer deadlock from currentCacheBytes drift

cc169fb

JNSimba requested a review from Copilot May 6, 2026 02:36

Copilot started reviewing on behalf of JNSimba May 6, 2026 02:36

Copilot AI reviewed May 6, 2026

JNSimba approved these changes May 6, 2026

JNSimba merged commit 0044826 into apache:master May 6, 2026
13 checks passed

This was referenced May 7, 2026

Release Note 26.1.1 #654

Open

[doc] release Flink Doris Connector 26.1.1 apache/doris-website#3631

Merged


JNSimba merged 1 commit into apache:master from addu390:fix/batch-writer-deadlock-cachebytes-drift


3 participants