Skip to content

fix(core): retain buffered data in WriteGenerator on write failure#7400

Open
TennyZhuang wants to merge 2 commits intomainfrom
fix/write-generator-data-loss
Open

fix(core): retain buffered data in WriteGenerator on write failure#7400
TennyZhuang wants to merge 2 commits intomainfrom
fix/write-generator-data-loss

Conversation

@TennyZhuang
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

N/A - discovered via deterministic testing of error-handling paths.

Rationale

WriteGenerator's write() and close() methods called buffer.take() before attempting to write to the underlying writer. If the write failed, the buffered data was permanently lost, making retry impossible without data loss.

This violated the invariant maintained by other writer implementations (AppendWriter, BlockWriter, MultipartWriter, PositionWriter), which only clear state after a successful write.

Changes

  • Inexact mode write path: Clone the taken buffer before writing; restore it on failure.
  • Exact mode write path: Same clone-and-restore pattern.
  • close() method: Same clone-and-restore pattern in the flush loop.
  • Added 3 deterministic tests: FailOnceMockWriter that fails the first write call, then succeeds — verifying no data loss in inexact mode, exact mode, and close() paths.

Are there any user-facing changes?

Users who rely on retry (e.g., via RetryLayer) will no longer silently lose data when the underlying storage temporarily fails during a write flush.

This PR was generated with the assistance of AI (Claude).

WriteGenerator's write() and close() methods called buffer.take() before
attempting to write to the underlying writer. If the write failed, the
buffered data was lost, making retry impossible without data loss.

Fix by cloning the taken buffer before writing, and restoring it on failure.
This matches the invariant maintained by other writer implementations
(AppendWriter, BlockWriter, MultipartWriter, PositionWriter).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TennyZhuang TennyZhuang requested a review from Xuanwo as a code owner April 17, 2026 07:38
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. releases-note/fix The PR fixes a bug or has a title that begins with "fix" labels Apr 17, 2026
… duplication

On write failure in inexact mode, only restore the previously buffered data
(not the current bs). The caller will retry with the same bs, so including
it in the restored buffer would cause data duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@dentiny dentiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask some dumb questions?

  • It seems to me that the "issue" reproduces if users call write again after a failed write. In production, why would users do that?
  • In terms of retry layer (which calls write repeatedly after failure), it seems to sit below the write generator. I'm curious if we could provide a reproducible example to show data loss with retry layer used together?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

releases-note/fix The PR fixes a bug or has a title that begins with "fix" size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants