Multilog consistent read optimizations #1832

Draft: vazois wants to merge 5 commits into main from vazois/mlog-opt

@vazois commented on May 27, 2026

Summary

Investigate and implement optimizations for the multilog consistent read path.

Optimizations

  • Add BDN benchmark to validate consistent read overhead
  • (A) Cache consistent read session switch (avoid redundant SwitchActiveDatabaseSession)
  • (B) Use bitshift/mask instead of division/modulo for the sublog index calculation (rejected: the gain is marginal and it would restrict sublog counts to powers of two)
  • (C) Use key hash bits after sublog idx shift for sketch indexing (collision fix)
  • (D) Drop unnecessary inProgress lock and ResetTimeoutCts contention
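
A minimal sketch of the idea behind (B) and (C), written in Python as a language-neutral illustration rather than the project's actual code. All function names and the constants `SUBLOG_COUNT` and `SKETCH_SIZE` are hypothetical, and the sketch assumes power-of-two sizes purely to keep the bit arithmetic visible (the same restriction that made (B) not worth adopting). It shows why indexing the sketch with the same low hash bits that already selected the sublog wastes slots, and how shifting past those bits restores the full slot range.

```python
def sublog_index_mod(key_hash: int, sublog_count: int) -> int:
    """Route a key to a sublog with modulo; works for any sublog count."""
    return key_hash % sublog_count


def sublog_index_mask(key_hash: int, sublog_count: int) -> int:
    """Same routing with a mask; only valid when sublog_count is a power of two."""
    assert sublog_count > 0 and sublog_count & (sublog_count - 1) == 0
    return key_hash & (sublog_count - 1)


def sketch_slot_low_bits(key_hash: int, sketch_size: int) -> int:
    """Slot taken from the same low bits that already picked the sublog."""
    return key_hash & (sketch_size - 1)


def sketch_slot_shifted(key_hash: int, sublog_count: int, sketch_size: int) -> int:
    """Slot taken from the bits above the ones consumed by the sublog index."""
    shift = (sublog_count - 1).bit_length()
    return (key_hash >> shift) & (sketch_size - 1)


SUBLOG_COUNT = 4   # hypothetical value, power of two
SKETCH_SIZE = 16   # hypothetical slots per sublog, power of two

hashes = range(1024)  # stand-in for real key hashes
in_sublog_0 = [h for h in hashes if sublog_index_mod(h, SUBLOG_COUNT) == 0]

# For power-of-two counts the mask and the modulo agree on every hash.
mask_matches_mod = all(
    sublog_index_mask(h, SUBLOG_COUNT) == sublog_index_mod(h, SUBLOG_COUNT)
    for h in hashes
)

# Distinct sketch slots reachable by the keys that landed in sublog 0.
slots_low = {sketch_slot_low_bits(h, SKETCH_SIZE) for h in in_sublog_0}
slots_shifted = {sketch_slot_shifted(h, SUBLOG_COUNT, SKETCH_SIZE) for h in in_sublog_0}

print(len(slots_low), len(slots_shifted))  # prints "4 16"
```

With the low-bit scheme, every key in a given sublog shares its two lowest hash bits, so only a quarter of the 16 slots are ever used and unrelated keys collide more often. Shifting first makes all 16 slots reachable at the same computational cost.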

Pre-optimization baseline

| Method | Runtime | Params | Mean | Error | StdDev |
|---|---|---|---:|---:|---:|
| Get | .NET 10.0 | MultiLog+Primary | 23.96 us | 0.469 us | 0.416 us |
| MGet | .NET 10.0 | MultiLog+Primary | 16.79 us | 0.106 us | 0.083 us |
| Get | .NET 8.0 | MultiLog+Primary | 24.83 us | 0.051 us | 0.045 us |
| MGet | .NET 8.0 | MultiLog+Primary | 16.86 us | 0.117 us | 0.110 us |
| Get | .NET 10.0 | MultiLog+Replica | 39.09 us | 0.113 us | 0.095 us |
| MGet | .NET 10.0 | MultiLog+Replica | 32.88 us | 0.214 us | 0.190 us |
| Get | .NET 8.0 | MultiLog+Replica | 41.55 us | 0.176 us | 0.156 us |
| MGet | .NET 8.0 | MultiLog+Replica | 33.62 us | 0.140 us | 0.124 us |
| Get | .NET 10.0 | SingleLog | 24.07 us | 0.068 us | 0.060 us |
| MGet | .NET 10.0 | SingleLog | 16.36 us | 0.104 us | 0.098 us |
| Get | .NET 8.0 | SingleLog | 25.30 us | 0.173 us | 0.145 us |
| MGet | .NET 8.0 | SingleLog | 16.80 us | 0.092 us | 0.082 us |

Post-optimization results

| Method | Runtime | Params | Mean | Error | StdDev | A | C |
|---|---|---|---:|---:|---:|---:|---:|
| Get | .NET 10.0 | MultiLog+Primary | 24.34 us | 0.387 us | 0.362 us | | |
| MGet | .NET 10.0 | MultiLog+Primary | 18.32 us | 0.324 us | 0.303 us | | |
| Get | .NET 8.0 | MultiLog+Primary | 26.25 us | 0.513 us | 0.570 us | | |
| MGet | .NET 8.0 | MultiLog+Primary | 19.36 us | 0.552 us | 1.610 us | | |
| Get | .NET 10.0 | MultiLog+Replica | 40.67 us | 0.789 us | 0.700 us | -2.5% | ~0% |
| MGet | .NET 10.0 | MultiLog+Replica | 33.29 us | 0.345 us | 0.306 us | -1.7% | ~0% |
| Get | .NET 8.0 | MultiLog+Replica | 42.54 us | 0.554 us | 0.519 us | -1.0% | ~0% |
| MGet | .NET 8.0 | MultiLog+Replica | 34.25 us | 0.476 us | 0.422 us | -1.8% | ~0% |
| Get | .NET 10.0 | SingleLog | 24.86 us | 0.465 us | 0.435 us | | |
| MGet | .NET 10.0 | SingleLog | 16.82 us | 0.220 us | 0.195 us | | |
| Get | .NET 8.0 | SingleLog | 25.15 us | 0.260 us | 0.230 us | | |
| MGet | .NET 8.0 | SingleLog | 17.24 us | 0.218 us | 0.204 us | | |

Notes:

  • (A) Negligible improvement (~1-2%) — session switch is just 7 field assignments.
  • (C) No throughput change in single-threaded BDN (same computational cost). Its value is correctness: eliminates false sketch collisions that cause unnecessary blocking under real multi-threaded replay workloads.
  • Dominant overhead is in the per-key lock + CTS reset (optimization D).
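
To make note (A) concrete, here is a minimal sketch of a cached session switch, again in Python as an illustration only. The `ReadSession` class, its fields, and `ensure_session` are hypothetical; `switch_active_database_session` is a stand-in named after `SwitchActiveDatabaseSession` and does not reproduce its real behavior. The point is that the cache replaces a handful of field assignments with one comparison, which is consistent with the small gain reported above.

```python
class ReadSession:
    """Hypothetical read session that remembers which database is active."""

    def __init__(self) -> None:
        self.active_db_id = None
        self.switch_count = 0  # instrumentation for this example only

    def switch_active_database_session(self, db_id: int) -> None:
        """Stand-in for the real switch, which reassigns a handful of fields."""
        self.active_db_id = db_id
        self.switch_count += 1

    def ensure_session(self, db_id: int) -> None:
        """Switch only when the requested database differs from the cached one."""
        if self.active_db_id != db_id:
            self.switch_active_database_session(db_id)


session = ReadSession()
for _ in range(1000):
    session.ensure_session(0)       # repeated reads against the same database
switches_same_db = session.switch_count

session.ensure_session(1)           # a different database forces a real switch
switches_after_change = session.switch_count

print(switches_same_db, switches_after_change)  # prints "1 2"
```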

Follow-up items (optional)

  • Defer sketch-max update to reduce cache invalidation under concurrent load: When writers and replay threads are active alongside readers, updating the per-key sequence number sketch immediately on each replay invalidates cache lines shared with reader threads. Deferring or batching the sketch-max update reduces this cache-line contention. It is not a factor in the current BDN benchmark (no active writer), but it is a best practice for production workloads.

  • Timestamp inversion with concurrent writers to the same sublog: Two write sessions appending to the same physical sublog can acquire AOF timestamps out of order (T1 < T2 but T2's record lands first). At replay, the sketch slot sees T2 before T1 and never updates for T1, so a reader may get a false "caught up" signal before T1's mutation is applied. This is a correctness gap that needs to be addressed.

  • Replay-thread pacing / coordination barrier: Without coordination, replay threads for different sublogs can diverge arbitrarily. A lightweight barrier or watermark sync would keep them roughly aligned, reducing worst-case reader wait time and bounding the staleness window without adding per-key cost on the read path.

  • Consolidate witness-tail into the replication stream (time-advancement sentinel): Eliminate the separate witness-tail task by emitting a lightweight sentinel on the same connection that ships the log. The sentinel signals "time advanced to sequence N" for a sublog without appending a real AOF record. Key validations: (1) blind sequence-number acquisition is safe under monotonic-max semantics, (2) the sentinel does not allocate AOF space, (3) there are no races with concurrent write sessions acquiring sequence numbers in parallel.
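
The timestamp-inversion item can be reproduced with a toy model. The sketch below is a Python illustration under stated assumptions, not the project's replay code: `MaxSketch`, `replay`, the record layout `(sequence, slot, key, value)`, and the choice that both keys share slot 0 are all hypothetical. It models a sketch slot that keeps the maximum replayed sequence number and a reader that treats `slot >= required sequence` as "caught up".

```python
class MaxSketch:
    """Toy sketch: each slot keeps the largest sequence number replayed so far."""

    def __init__(self, size: int) -> None:
        self.slots = [0] * size

    def observe(self, slot: int, seq: int) -> None:
        # Monotonic max: a smaller sequence arriving later changes nothing.
        self.slots[slot] = max(self.slots[slot], seq)

    def caught_up(self, slot: int, required_seq: int) -> bool:
        return self.slots[slot] >= required_seq


def replay(records, store: dict, sketch: MaxSketch) -> None:
    """Apply records in physical log order and advance the sketch."""
    for seq, slot, key, value in records:
        store[key] = value
        sketch.observe(slot, seq)


T1, T2 = 1, 2
# Physical order in the sublog: T2's record landed before T1's.
sublog = [(T2, 0, "b", "v2"), (T1, 0, "a", "v1")]

store = {}
sketch = MaxSketch(size=1)

replay(sublog[:1], store, sketch)  # only T2's record has been replayed so far
# A reader that needs T1's write sees the slot at T2 >= T1 and proceeds,
# even though key "a" has not been applied yet.
false_caught_up = sketch.caught_up(0, T1) and "a" not in store

replay(sublog[1:], store, sketch)  # T1's record arrives; the slot stays at T2
slot_after_full_replay = sketch.slots[0]

print(false_caught_up, slot_after_full_replay)  # prints "True 2"
```

Under these assumptions the reader's check passes one record too early, which is the gap described above. Any fix would need to guarantee that a slot value of T2 implies every record with a smaller sequence number in that sublog has already been applied.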
