
@rescrv
Contributor

@rescrv rescrv commented Dec 30, 2025

Description of changes

This PR introduces the replicated wal3 interface, which is built atop Spanner.

Test plan

CI + additional testing

Migration plan

N/A

Observability plan

N/A

Documentation Changes

N/A

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?


Relocate the spanner-migrations crate from rust/rust-sysdb/spanner-migrations/
to rust/spanner-migrations/ to align with the flat structure of other crates
in the rust workspace. Update workspace members in Cargo.toml accordingly.

Co-authored-by: AI
…ation directories

Improve the spanner-migrations tool with proper CLI argument parsing and
support for managing migrations in multiple directories:

- Add clap for structured CLI with subcommands (generate-sum, apply, validate)
- Support --slug flag to filter migrations by directory (e.g., spanner_sysdb, spanner_logdb)
- Support --root flag for specifying output directory when generating manifests
- Add SpannerLogDb migration directory alongside existing SpannerSysDb
- Rename as_str() to migration_slug() for clarity
- Add manifest_filename() and folder_name() helper methods to MigrationDir
- Update error messages to reference the correct manifest filename
- Fix default config path to reference ../worker/chroma_config2.yaml

Co-authored-by: AI
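A minimal sketch of the `MigrationDir` helpers and `--slug` filtering described in this commit. The variant names, the slugs, and the `migration_slug()`, `folder_name()`, and `manifest_filename()` method names come from the commit message; everything else is assumed: the folder is assumed to be named after the slug, the manifest filename `<slug>.sum` is hypothetical, `select_dirs` is a hypothetical helper, and clap parsing is omitted so the sketch runs with the standard library alone.

```rust
/// Migration directories managed by the tool (variant names from the commit message).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MigrationDir {
    SpannerSysDb,
    SpannerLogDb,
}

impl MigrationDir {
    const ALL: [MigrationDir; 2] = [MigrationDir::SpannerSysDb, MigrationDir::SpannerLogDb];

    /// Slug accepted by `--slug` (this replaces the old `as_str()`).
    fn migration_slug(&self) -> &'static str {
        match self {
            MigrationDir::SpannerSysDb => "spanner_sysdb",
            MigrationDir::SpannerLogDb => "spanner_logdb",
        }
    }

    /// Hypothetical: assumes the on-disk folder is named after the slug.
    fn folder_name(&self) -> &'static str {
        self.migration_slug()
    }

    /// Hypothetical manifest naming scheme; the real filename may differ.
    fn manifest_filename(&self) -> String {
        format!("{}.sum", self.migration_slug())
    }
}

/// Hypothetical helper: without `--slug` every directory is selected;
/// with it, only the matching directory (an unknown slug selects nothing).
fn select_dirs(slug: Option<&str>) -> Vec<MigrationDir> {
    MigrationDir::ALL
        .iter()
        .copied()
        .filter(|dir| slug.map_or(true, |s| dir.migration_slug() == s))
        .collect()
}

fn main() {
    // Roughly what `apply --slug spanner_logdb` would operate on.
    for dir in select_dirs(Some("spanner_logdb")) {
        println!("{}/{}", dir.folder_name(), dir.manifest_filename());
    }
}
```

Keying the folder and manifest names off a single slug keeps the CLI flag, the directory layout, and the error messages from drifting apart as more migration directories are added.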
Implement a quorum-based coordination mechanism that:
- Runs futures in parallel and waits for a minimum count of Ok results
- Starts a timeout after reaching the quorum threshold
- Cancels remaining futures that exceed the timeout
- Returns results in original order with None for cancelled futures

This enables handling partial quorum failures where some writers may
be slow or unresponsive, allowing the system to proceed once a quorum
of successful writes is achieved while still attempting to maximize
replication within a bounded time window.

Co-authored-by: AI
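A sketch of the quorum-then-timeout coordination described in this commit. The commit describes futures; this sketch swaps in OS threads and a `std::sync::mpsc` channel so it runs with the standard library alone. `run_quorum` and `Task` are hypothetical names. Threads cannot be cancelled, so here "cancelling" means the coordinator stops waiting and reports `None`; a late thread's result is silently discarded.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// One writer's unit of work (stands in for a future in the real design).
type Task<T, E> = Box<dyn FnOnce() -> Result<T, E> + Send + 'static>;

/// Runs `tasks` in parallel, waits until `min_ok` succeed, then gives the
/// rest at most `timeout` more. Results keep the original order; `None`
/// marks a task that had not finished when the coordinator stopped waiting.
fn run_quorum<T, E>(tasks: Vec<Task<T, E>>, min_ok: usize, timeout: Duration) -> Vec<Option<Result<T, E>>>
where
    T: Send + 'static,
    E: Send + 'static,
{
    let n = tasks.len();
    let (tx, rx) = mpsc::channel::<(usize, Result<T, E>)>();
    for (idx, task) in tasks.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error means the coordinator already gave up; ignore it.
            let _ = tx.send((idx, task()));
        });
    }
    drop(tx); // Only worker threads hold senders from here on.

    let mut results: Vec<Option<Result<T, E>>> = (0..n).map(|_| None).collect();
    let (mut finished, mut ok) = (0, 0);
    // The grace-period clock starts once the quorum is reached.
    let mut deadline = if min_ok == 0 { Some(Instant::now() + timeout) } else { None };

    while finished < n {
        let msg = match deadline {
            // Before quorum: block until the next task completes.
            None => rx.recv().ok(),
            // After quorum: wait only for the remaining grace period.
            Some(d) => {
                let now = Instant::now();
                if now >= d {
                    break;
                }
                rx.recv_timeout(d - now).ok()
            }
        };
        let (idx, res) = match msg {
            Some(m) => m,
            None => break, // Timed out, or every sender is gone.
        };
        if res.is_ok() {
            ok += 1;
        }
        results[idx] = Some(res);
        finished += 1;
        if deadline.is_none() && ok >= min_ok {
            deadline = Some(Instant::now() + timeout);
        }
    }
    results
}

fn main() {
    let tasks: Vec<Task<u32, String>> = vec![
        Box::new(|| Ok(1)),
        Box::new(|| Ok(2)),
        // A slow writer that misses the grace period.
        Box::new(|| {
            thread::sleep(Duration::from_secs(1));
            Ok(3)
        }),
    ];
    println!("{:?}", run_quorum(tasks, 2, Duration::from_millis(50)));
}
```

With a quorum of 2 and a 50 ms grace period, the two fast writers report `Some(Ok(_))` and the slow one comes back as `None`. Because the timer only starts once the quorum is met, the grace period bounds the extra latency spent chasing additional replicas without ever cutting short the wait for the quorum itself.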
@rescrv rescrv force-pushed the rescrv/replicated-wal3 branch from 41a0d76 to beb1a12 December 31, 2025 01:08
rescrv added 6 commits January 2, 2026 11:43
Move fragment reading methods (read_raw_bytes, read_parquet, read_fragment)
into the FragmentConsumer trait, eliminating the need for LogReader to hold
a Storage reference. This improves encapsulation and simplifies the LogReader
interface.

- Rename FragmentPuller to S3FragmentPuller for naming consistency
- Add read_raw_bytes, read_parquet, read_fragment to FragmentConsumer trait
- Move checksum_parquet utility from reader.rs to interfaces/mod.rs
- Remove storage parameter from LogReader::new and LogReader::open
- Remove unused _writer_name parameters from make_log_reader helpers

Co-authored-by: AI
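A minimal synchronous sketch of the encapsulation this commit describes: reads go through a `FragmentConsumer`, so `LogReader` holds only a consumer and no `Storage` reference. Only the trait name and the `read_raw_bytes`/`read_fragment` method names come from the commit; the signatures, the `Fragment` and `InMemoryConsumer` types, string-path addressing, and the newline-separated stand-in for Parquet decoding are hypothetical, and `read_parquet` is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical decoded fragment: just the records it contains.
#[derive(Debug, PartialEq)]
struct Fragment {
    records: Vec<String>,
}

/// Sketch of the consumer trait; only the method names come from the PR.
trait FragmentConsumer {
    /// Fetch the raw bytes of the fragment stored at `path`.
    fn read_raw_bytes(&self, path: &str) -> Result<Vec<u8>, String>;

    /// Fetch and decode a fragment. This default treats the bytes as
    /// newline-separated UTF-8 records, standing in for Parquet decoding.
    fn read_fragment(&self, path: &str) -> Result<Fragment, String> {
        let bytes = self.read_raw_bytes(path)?;
        let text = String::from_utf8(bytes).map_err(|e| e.to_string())?;
        Ok(Fragment {
            records: text.lines().map(str::to_string).collect(),
        })
    }
}

/// Hypothetical in-memory consumer standing in for an object-store puller.
struct InMemoryConsumer {
    objects: HashMap<String, Vec<u8>>,
}

impl FragmentConsumer for InMemoryConsumer {
    fn read_raw_bytes(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| format!("no such fragment: {}", path))
    }
}

/// The reader owns a consumer, not a storage handle.
struct LogReader<C: FragmentConsumer> {
    consumer: C,
}

impl<C: FragmentConsumer> LogReader<C> {
    fn new(consumer: C) -> Self {
        LogReader { consumer }
    }

    /// Count the records across the given fragments.
    fn count_records(&self, paths: &[&str]) -> Result<usize, String> {
        let mut total = 0;
        for path in paths {
            total += self.consumer.read_fragment(path)?.records.len();
        }
        Ok(total)
    }
}

fn main() {
    let mut objects = HashMap::new();
    objects.insert("log/frag1".to_string(), b"a\nb".to_vec());
    objects.insert("log/frag2".to_string(), b"c".to_vec());
    let reader = LogReader::new(InMemoryConsumer { objects });
    println!("{:?}", reader.count_records(&["log/frag1", "log/frag2"]));
}
```

Because `LogReader` is generic over the consumer, an S3-backed puller and a replicated, quorum-reading one can be swapped in without the reader knowing which storage backend sits underneath.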
@rescrv rescrv force-pushed the rescrv/replicated-wal3 branch from beb1a12 to c60d0b8 January 2, 2026 20:46
@rescrv rescrv changed the base branch from rescrv/tilt to rescrv/fragment-reader January 2, 2026 20:47
@rescrv rescrv force-pushed the rescrv/replicated-wal3 branch from dfc7a69 to ff4b7ea January 2, 2026 21:25
@blacksmith-sh
Contributor

blacksmith-sh bot commented Jan 2, 2026

Found 3 test failures on Blacksmith runners:

Failures

  • wal3/interfaces::repl::fragment_manager::tests::compute_mask_after_decimation_interval_elapsed
  • wal3/interfaces::repl::fragment_manager::tests::compute_mask_before_decimation_interval_elapsed
  • wal3/interfaces::repl::manifest_manager::tests::test_k8s_integration_apply_garbage


@rescrv rescrv force-pushed the rescrv/fragment-reader branch from 8174102 to 6758bfc January 6, 2026 00:30
@rescrv rescrv requested a review from sanketkedia January 6, 2026 17:26
@rescrv rescrv marked this pull request as ready for review January 6, 2026 23:23
@propel-code-bot
Contributor

Introduce Spanner-backed replicated WAL interface and refactor WAL3 infrastructure

This PR delivers a new replicated WAL3 implementation that stores manifests in Google Cloud Spanner and replicates fragments across multi-region object storage via a quorum-based writer. It reworks the WAL3 I/O pipeline, manifest management, and garbage-collection bookkeeping to accommodate the replicated design while keeping S3-based flows functional. Service integrations, configuration tooling, and Spanner migrations are updated to surface the new interfaces, default replication options, and multi-directory migration workflows.

Key Changes

• Implemented wal3/src/interfaces/repl with Spanner-backed ManifestManager, quorum S3 fragment publishers/readers, and topology-aware configuration for multi-region deployments.
• Refactored WAL3 S3 pipelines by introducing create_s3_factories, dedicated FragmentUploader/FragmentConsumer implementations, and witness-based manifest caching across log-service, GC, worker, compactor, and tests.
• Reworked garbage tracking to operate on FragmentSeqNo ranges with a fragments_are_uuids flag and updated serialization and GC workflows accordingly.
• Extended wal3::copy and log reader APIs to support cross-implementation copying and to return rich metadata through the new consumer abstractions.
• Added Spanner migration assets, a reorganized rust/spanner-migrations crate, and CLI tooling with multi-directory support for managing manifests.

Affected Areas

• rust/wal3 interfaces (S3 and new replicated implementations)
• rust/types topology and configuration helpers
• rust/spanner-migrations tooling and manifests
• Service layers consuming WAL3 (log-service, garbage collector, worker, compactor, heap tender)
• CI/Tilt profiles and integration tests tied to WAL3

This summary was automatically generated by @propel-code-bot
