
fix(core): use index-based tracking in BatchDeleter to fix progress stalls #7401

Open
TennyZhuang wants to merge 3 commits into main from fix/batch-deleter-progress-tracking


@TennyZhuang
Contributor

Which issue does this PR close?

N/A (bug found through code analysis and deterministic testing)

Rationale

BatchDeleter previously used a HashSet<(String, OpDelete)> to track pending deletions and relied on OpDelete equality (which derives Hash/Eq over both version and recursive fields) to remove completed items via buffer.remove().

However, services like S3 and OSS reconstruct OpDelete from XML response data without preserving all original fields — for example, recursive always defaults to false in the reconstructed OpDelete. This causes buffer.remove() to silently fail when the original OpDelete had recursive: true, leaving items permanently stuck in the buffer and eventually triggering the "no progress" error in close().
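The failure mode can be reproduced in isolation. The sketch below is not the OpenDAL code: `OpDelete` here is a simplified stand-in with only the two fields named above, and `reconstruct_from_response` and `stuck_entries` are hypothetical helpers that mimic a service rebuilding the delete args from a response that carries no `recursive` flag.

```rust
use std::collections::HashSet;

/// Simplified stand-in for OpDelete (assumption: only these two fields).
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct OpDelete {
    version: Option<String>,
    recursive: bool,
}

/// Hypothetical helper: rebuilds the args the way a response parser would,
/// so `recursive` falls back to `false` regardless of the original value.
fn reconstruct_from_response(version: Option<String>) -> OpDelete {
    OpDelete { version, recursive: false }
}

/// Buffers one entry, "completes" it using the reconstructed args, and
/// returns how many entries are still left in the buffer afterwards.
fn stuck_entries(recursive: bool) -> usize {
    let original = OpDelete { version: None, recursive };
    let mut buffer: HashSet<(String, OpDelete)> = HashSet::new();
    buffer.insert(("dir/".to_string(), original.clone()));

    // The lookup key differs from the stored one whenever `recursive` was true,
    // so `remove` returns false and the entry stays in the set.
    let reported = reconstruct_from_response(original.version.clone());
    let _removed = buffer.remove(&("dir/".to_string(), reported));

    buffer.len()
}
```

With `recursive: true` the entry is never removed, which is the stall described above; with `recursive: false` the reconstructed value happens to match and the removal succeeds.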

Changes

  • Changed BatchDeleteResult from (String, OpDelete) tuples to index-based tracking (Vec<usize> for succeeded, Vec<(usize, Error)> for failed), referencing positions in the input batch Vec
  • Changed BatchDeleter.buffer from HashSet to Vec, removing the dependency on OpDelete Hash/Eq
  • Updated all 8 service delete_batch implementations (S3, OSS, GCS, Azblob, Swift, HF, Cloudflare-KV, object_store) to report results by index
  • Added regression tests verifying index-based progress tracking and partial failure retry
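As a rough illustration of the index-based approach, the sketch below drops completed entries by position instead of by value. `retain_pending` is a hypothetical helper, not a function from this PR, and it assumes that every entry not reported as succeeded stays in the buffer for a later retry.

```rust
/// Hypothetical helper: keeps every batch entry whose index is NOT listed in
/// `succeeded`. Indices refer to positions in the input batch, so no equality
/// or hashing on the entries themselves is needed.
fn retain_pending<T>(batch: Vec<T>, succeeded: &[usize]) -> Vec<T> {
    // Mark completed positions; out-of-range indices are ignored defensively.
    let mut done = vec![false; batch.len()];
    for &i in succeeded {
        if let Some(slot) = done.get_mut(i) {
            *slot = true;
        }
    }

    batch
        .into_iter()
        .enumerate()
        .filter(|(i, _)| !done[*i])
        .map(|(_, entry)| entry)
        .collect()
}
```

Because the helper is generic over `T`, it works regardless of whether the entry type implements `Hash` or `Eq`, which is the property the change relies on.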

Are there any user-facing changes?

No. BatchDeleteResult is an internal type in raw::oio — not part of the public user API.


This PR was generated with the assistance of an LLM (Claude Opus 4.6) as a coding tool.

🤖 Generated with Claude Code

…talls

BatchDeleter previously used a HashSet<(String, OpDelete)> to track
pending deletions and relied on OpDelete equality to remove completed
items. However, services like S3 and OSS reconstruct OpDelete from
response data without preserving all fields (e.g., `recursive`),
causing buffer.remove() to silently fail. This left items permanently
stuck in the buffer, eventually triggering a "no progress" error.

Switch to index-based tracking: BatchDeleteResult now reports success
and failure by index into the input batch Vec, and BatchDeleter uses
a Vec instead of HashSet. This eliminates the dependency on OpDelete
equality entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and releases-note/fix (The PR fixes a bug or has a title that begins with "fix") labels on Apr 17, 2026
TennyZhuang and others added 2 commits April 17, 2026 16:19
…entries

Address two review findings:

1. flush_buffer() now restores items to self.buffer when delete_batch()
   returns Err, preventing data loss on transport-level failures.

2. S3 and OSS deleters now use HashMap<key, Vec<usize>> instead of
   HashMap<key, usize> to correctly handle duplicate entries in the
   same batch without collapsing their indices.
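
A minimal sketch of why the map value has to be a list of indices. `index_by_key` and `succeeded_indices` are hypothetical names, and the sketch assumes that a key reported as deleted marks every batch position holding that key; the actual S3 and OSS deleters may resolve duplicates differently.

```rust
use std::collections::HashMap;

/// Maps each key to ALL batch positions where it occurs. With a plain
/// `HashMap<String, usize>`, a duplicate key would overwrite the earlier index.
fn index_by_key(keys: &[&str]) -> HashMap<String, Vec<usize>> {
    let mut map: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, key) in keys.iter().enumerate() {
        map.entry((*key).to_string()).or_default().push(i);
    }
    map
}

/// Translates the keys a service reported as deleted back into batch indices.
fn succeeded_indices(keys: &[&str], deleted: &[&str]) -> Vec<usize> {
    let mut map = index_by_key(keys);
    let mut out = Vec::new();
    for key in deleted {
        // Assumption: one reported key covers every duplicate of that key.
        if let Some(indices) = map.remove(*key) {
            out.extend(indices);
        }
    }
    out.sort_unstable();
    out
}
```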

Added a regression test for the transport error buffer restoration path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rror

The single-item fast path in flush_buffer() removed the item from the
buffer before calling delete_once(). If delete_once() returned an error,
the item was permanently lost and could not be retried. Now the item is
cloned and only cleared from the buffer after a successful delete.
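
The ordering described here can be sketched as follows. This is a synchronous simplification with a hypothetical `flush_single` helper and a caller-supplied `delete_once` closure; the real flush_buffer() is not reproduced here.

```rust
/// Hypothetical single-item fast path: clone the item, attempt the delete,
/// and clear the buffer only after the delete has succeeded.
fn flush_single<T: Clone, E>(
    buffer: &mut Vec<T>,
    delete_once: impl FnOnce(T) -> Result<(), E>,
) -> Result<usize, E> {
    // Nothing buffered: report zero deletions.
    let Some(item) = buffer.first().cloned() else {
        return Ok(0);
    };

    // On error, `?` returns early and the buffer still holds the item,
    // so a later call can retry it.
    delete_once(item)?;

    buffer.clear();
    Ok(1)
}
```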

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@dentiny dentiny left a comment


It reads similar to my previous discarded PR: #7325
