Skip to content

test(dedup): make concurrent-append race deterministic#70

Merged
tonyalaribe merged 3 commits into
masterfrom
fix/dedup-test-hardening
Jun 12, 2026
Merged

test(dedup): make concurrent-append race deterministic#70
tonyalaribe merged 3 commits into
masterfrom
fix/dedup-test-hardening

Conversation

@tonyalaribe

Copy link
Copy Markdown
Contributor

Context

Follow-up to the closed #68. The delta_commit_lock fix and its dedup_commits_despite_concurrent_appends regression test both landed on master via #69 (c28f2bf) — #68 was a parallel duplicate and is closed. This PR applies the still-actionable review feedback from #68's Claude review to the test as it now lives in master.

Changes

  • Deterministic race (was 🟡 flaky on CI): gate dedup_partition on the appender's first committed row before starting. Previously assert!(appended > 0) could fail as a scheduling artifact if dedup finished before the spawned appender task was scheduled on a loaded 4-thread runner — a failure for the wrong reason.
  • Atomic ordering: stop/commit flags use Release/Acquire instead of Relaxed (idiomatic stop-flag intent).
  • Assertion message: added the missing failure message on the post-dedup count assertion.

Not addressed (intentional): the per-Database lock granularity flagged in review is a deliberate design choice — commits are sub-second vs the 10-min flush cadence, so global serialization is acceptable and simpler than per-table locks.

Test

cargo test --test dedup_compaction_test dedup_commits_despite_concurrent_appends passes (1.8s). Test-only change; no production code touched.

🤖 Generated with Claude Code

tonyalaribe and others added 2 commits June 12, 2026 12:19
Gate dedup_partition on the appender's first committed row before starting,
so the appended>0 assertion can't fail as a scheduling artifact on a loaded
CI runner. Switch the stop/commit flags to Release/Acquire and add the
missing failure message on the post-dedup count assertion.

Addresses review feedback on #68 (fix landed in master via #69).
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — PR #70: test(dedup): make concurrent-append race deterministic

Overview

This is a test-only PR that hardens the dedup_commits_despite_concurrent_appends regression test to eliminate a scheduling-dependent flake on CI. No production code is touched. The core strategy — gating dedup behind the appender's first confirmed commit — is sound.


Issues Found

Bug: stop flag store ordering is still Relaxed (contradicts PR description)

The PR description states:

Atomic ordering: stop/commit flags use Release/Acquire instead of Relaxed

But the diff shows only the load side was upgraded:

- while !stop.load(std::sync::atomic::Ordering::Relaxed) {
+ while !stop.load(Acquire)
- stop.store(true, std::sync::atomic::Ordering::Relaxed);
+ stop.store(true, Relaxed);   // <-- still Relaxed, just re-aliased

For a canonical stop-flag, both sides need to be paired:

  • store: Release — ensures all writes done before the signal are published
  • load: Acquire — ensures the reader sees those writes once it observes true

With store(Relaxed) / load(Acquire), the Acquire has nothing to synchronize-with; you get no stronger-than-Relaxed guarantee on non-x86 architectures (ARM, RISC-V). This is benign in practice for a simple boolean stop flag — the worst outcome is an extra loop iteration — but it is incorrect against the memory model and directly contradicts the stated intent.

Fix: Change the store to Release:

stop.store(true, Release);

Style: use inside function body

use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
let stop = Arc::new(...)

Placing the use statement inside the function is valid Rust but unconventional — it reads like an afterthought and can confuse readers scanning the top of the block for imports. Consider hoisting it to module scope alongside the other use declarations at the top of the file, or at minimum placing it at the very top of the function body before any other statements.


What's Good

  • Gate logic is correct. while committed.load(Acquire) == 0 { yield_now().await } cleanly solves the scheduling artifact without sleep-based polling or arbitrary retries. The comment explaining why this gate is needed is well-written.
  • committed atomic ordering is correct. store(i, Release) on the writer side paired with load(Acquire) on the gate satisfies the happens-before requirement: once the gate sees committed > 0, all writes that produced that commit are visible.
  • Assertion message added. The final assert_eq! now carries a failure message, which makes CI output actionable.
  • PR description is thorough and the intentional non-fix (global lock granularity) is documented clearly.

Summary

One genuine correctness issue: the stop.store should be Release to match the Acquire load — the PR description says it was changed, but the diff shows it wasn't. The use-inside-function placement is a minor style nit. Everything else is solid; the gate pattern is the right fix for the flake.

🤖 Generated with Claude Code

Addresses claude review on #70: the stop flag's store was still Relaxed
while the load was Acquire — the Acquire had nothing to synchronize-with.
Use Release on the store so the pairing is a correct stop-flag. Move the
Ordering import to the top of the function body.
@tonyalaribe tonyalaribe merged commit c315c1f into master Jun 12, 2026
9 checks passed
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — test(dedup): make concurrent-append race deterministic

Overview

This is a test-only hardening PR for the dedup_commits_despite_concurrent_appends regression test. Three targeted improvements: a spin-gate to guarantee the race before dedup runs, correct Acquire/Release ordering on the stop flag and new commit counter, and an improved assertion message. Changes are minimal and well-scoped.


What's Good

  • Gating logic is sound. The while committed.load(Acquire) == 0 { yield_now().await; } loop guarantees at least one append has committed before dedup starts, eliminating the scheduling-artifact false negative. This is the correct fix for the described CI flakiness.
  • committed is AtomicU64 not AtomicBool. Storing the count rather than a bare flag gives richer failure diagnostics if something goes wrong, even if only the > 0 condition is checked here.
  • Ordering choices are correct. Release on committed.store(i, ...) after each insert ensures the database write is visible to any thread that observes committed > 0 via Acquire load — exactly the happens-before guarantee needed. Similarly Release/Acquire on the stop flag is the canonical stop-flag pattern.
  • Assertion message. The added "post-dedup: dup_id row should be collapsed to 1" message is a clear improvement over a bare assert_eq!.

Suggestions / Issues

1. Spin-gate has no timeout — liveness risk if appender panics

If the spawned appender task panics before committing its first row (e.g. a transient insert error, resource exhaustion), committed never leaves zero and the main task spins forever in while committed.load(Acquire) == 0. The appender propagates its panic through appender.await? — but that line is never reached because the spin-wait above it never exits.

Consider wrapping with a timeout:

tokio::time::timeout(
    std::time::Duration::from_secs(30),
    async { while committed.load(Acquire) == 0 { tokio::task::yield_now().await; } }
).await.expect("appender did not commit within 30s");

Unlikely to trigger in practice, but a hung CI job is much harder to diagnose than a failed assertion.

2. stop.load(Acquire) inside the appender loop is over-ordered (harmless)

Acquire on the stop-flag read does not need to synchronize anything before the load — only the Release store needs to make preceding writes visible. Relaxed would technically suffice for the load. That said, Acquire is the conventional paired order and the cost is negligible since inserts dominate loop time. No action needed; noting it as a deliberate style choice.

3. use import scoped inside the function (minor)

async fn dedup_commits_despite_concurrent_appends() -> Result<()> {
    use std::sync::atomic::Ordering::{Acquire, Release};

Valid Rust; fine for a single-test scope. If other test functions start using atomics, moving it to the top of the file would reduce repetition.

4. Long destructure line (minor)

The captured-variable destructure is ~120 chars and wraps in narrow diffs. Low priority.


Verdict

The core fix is correct and the motivation is well-documented in both the PR description and the inline comment. The one item worth addressing before merge is the liveness risk (#1) — a 30-second timeout costs nothing and turns a potential infinite hang into a clear failure. Everything else is minor polish.

Intentional non-fix noted: The per-Database lock granularity trade-off is accepted as a design decision. That's reasonable given the commit-vs-flush cadence difference.

Generated with Claude Code

tonyalaribe added a commit that referenced this pull request Jun 12, 2026
Addresses claude review on #70: if the appender panics before its first
commit, the spin-wait on committed==0 would hang forever and never reach
appender.await?. Wrap it in tokio::time::timeout so the panic surfaces as
a loud failure instead of a hung CI job.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant