Skip to content

[ENH](wal3): add optional upload fault injector#6849

Open
rescrv wants to merge 1 commit intorescrv/fault-injectionfrom
rescrv/wire-up-faults
Open

[ENH](wal3): add optional upload fault injector#6849
rescrv wants to merge 1 commit intorescrv/fault-injectionfrom
rescrv/wire-up-faults

Conversation

@rescrv
Copy link
Copy Markdown
Contributor

@rescrv rescrv commented Apr 8, 2026

Description of changes

Add a generic fragment-manager shim in wal3 that can wrap
fragment uploaders with an optional fault injector.

Wire chroma-log-service's FaultRegistry through the S3,
repl, and copy/open paths so fragment uploads can be
delayed or failed via the wal3.fragment_upload label.

Keep the injector optional for callers that do not want
fault injection, and add focused wal3 and log-service
coverage for the new path.

Test plan

CI

Migration plan

N/A

Observability plan

This is about fault injection.

Documentation Changes

N/A

Co-authored-by: AI

@rescrv rescrv requested a review from sanketkedia April 8, 2026 20:07
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@propel-code-bot
Copy link
Copy Markdown
Contributor

propel-code-bot bot commented Apr 8, 2026

wal3 Optional Fragment Upload Fault Injection Integrated into log-service Paths

This PR introduces a new optional fault-injection layer for fragment uploads in wal3, centered around FaultInjectingFragmentManagerFactory, FragmentUploadFaultInjector, and the FRAGMENT_UPLOAD_FAULT_LABEL (wal3.fragment_upload). The design allows callers to wrap existing fragment manager factories so upload operations can be delayed or failed (Unavailable) immediately before upload_parquet, without hard-coupling wal3 to a specific fault registry implementation.

To support this, the FragmentManagerFactory trait was expanded with associated Uploader, make_fragment_uploader, and write_options, and S3/repl factory implementations were updated accordingly. In log-service, fault injection is wired through open/copy/fork/update/GC paths via an optional injector sourced from FaultRegistry when the new faults feature is enabled in rust/log-service/Cargo.toml. Tests were added in both wal3 and log-service to verify short-circuit behavior and recovery after faults are cleared.

This summary was automatically generated by @propel-code-bot

propel-code-bot[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@propel-code-bot propel-code-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is functionally solid, with minor recommendations to reduce duplicated factory logic and clarify public API visibility/docs for newly exposed fault-injection traits and types.

Status: Minor Suggestions | Risk: Low

Issues Identified & Suggestions
  • Duplicate consumer construction logic; extract shared builder helpers. rust/wal3/src/interfaces/s3/mod.rs, rust/wal3/src/interfaces/repl/mod.rs
  • Public #[doc(hidden)] APIs are confusing; make crate-private or document. rust/wal3/src/interfaces/mod.rs, rust/wal3/src/lib.rs
Review Details

📁 6 files reviewed | 💬 3 comments

👍 / 👎 individual comments to help improve reviews for you

Comment on lines +129 to +136
async fn make_consumer(&self) -> Result<Self::Consumer, Error> {
let storages = Arc::clone(&self.storages);
Ok(FragmentReader::new(
self.repl.clone(),
self.preferred,
storages,
Arc::clone(&self.read_repair_semaphore),
))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended

[Maintainability] [CodeDuplication] The make_consumer implementation in FragmentManagerFactoryWithUploader for ReplicatedFragmentManagerFactory (lines 129–137) is identical to the make_consumer in FragmentManagerFactory for ReplicatedFragmentManagerFactory (lines ~108–114 in the after state): both call FragmentReader::new(self.repl.clone(), self.preferred, storages, Arc::clone(&self.read_repair_semaphore)). The same duplication exists in rust/wal3/src/interfaces/s3/mod.rs for S3FragmentPuller. Extract a build_fragment_reader helper (matching build_fragment_uploader) to eliminate this duplication and prevent drift.

Context for Agents
[CodeDuplication] The `make_consumer` implementation in `FragmentManagerFactoryWithUploader for ReplicatedFragmentManagerFactory` (lines 129–137) is identical to the `make_consumer` in `FragmentManagerFactory for ReplicatedFragmentManagerFactory` (lines ~108–114 in the after state): both call `FragmentReader::new(self.repl.clone(), self.preferred, storages, Arc::clone(&self.read_repair_semaphore))`. The same duplication exists in `rust/wal3/src/interfaces/s3/mod.rs` for `S3FragmentPuller`. Extract a `build_fragment_reader` helper (matching `build_fragment_uploader`) to eliminate this duplication and prevent drift.

File: rust/wal3/src/interfaces/repl/mod.rs
Line: 136

Copy link
Copy Markdown
Contributor

@propel-code-bot propel-code-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review found no issues; changes appear well-implemented with adequate test coverage.

Status: No Issues Found | Risk: Low

Review Details

📁 7 files reviewed | 💬 0 comments

@rescrv rescrv force-pushed the rescrv/wire-up-faults branch from 6ef50b1 to 605427f Compare April 10, 2026 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant