[Storage] Add contiguous journal and QMDB metrics #3721

Merged
roberto-bayardo merged 5 commits into main from danlaine/metrics
May 8, 2026
Conversation

@danlaine
Collaborator

@danlaine danlaine commented May 5, 2026

Summary

Adds runtime metrics for storage components and a few small adjacent fixes:

  • Contiguous journals expose state, read, write, sync, cache, and commit metrics.
  • QMDB variants expose operation-log state, read, write, sync, prune, and Current-layer metrics.
  • Counters and duration histograms record calls once an operation starts. Explicit no-op batch reads such as empty read_many/get_many return before recording. Counters that report items processed (items_read, operations_applied) only advance on success.
  • Variable contiguous journals gain an explicit Reader::read_many override that batches the offsets lookup instead of relying on the trait default loop.
  • Reader::read_many and public keyless location batch reads now document the strictly-increasing input contract required by the optimized journal implementations.
  • keyless::Keyless::init_from_journal now takes the runtime context as an additional parameter so it can register metrics; the two keyless callers (fixed.rs, variable.rs) pass the context that already wraps the journal label.

No wire or storage formats change.
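
The recording rule in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: `ReadMetrics` and the in-memory `read_many` are hypothetical stand-ins, the duration histogram is omitted, and the reading that a call counter still advances when the call later fails is inferred from "record calls once an operation starts".

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a journal's read metrics (not the PR's types).
#[derive(Default)]
pub struct ReadMetrics {
    pub read_many_calls: AtomicU64,
    pub items_read: AtomicU64,
}

/// Hypothetical batch read over an in-memory slice.
/// `positions` index into `items`; an out-of-range position is an error.
pub fn read_many(
    metrics: &ReadMetrics,
    items: &[u64],
    positions: &[usize],
) -> Result<Vec<u64>, String> {
    // Explicit no-op: return before recording anything.
    if positions.is_empty() {
        return Ok(Vec::new());
    }
    // The call counter advances once the operation starts,
    // so a call that fails below is still counted (assumed reading).
    metrics.read_many_calls.fetch_add(1, Ordering::Relaxed);

    let mut out = Vec::with_capacity(positions.len());
    for &pos in positions {
        let item = items
            .get(pos)
            .copied()
            .ok_or_else(|| format!("position {pos} out of range"))?;
        out.push(item);
    }
    // The items-processed counter only advances on success.
    metrics
        .items_read
        .fetch_add(out.len() as u64, Ordering::Relaxed);
    Ok(out)
}
```

With this shape, an empty batch leaves both counters untouched, a failed batch advances only the call counter, and a successful batch advances both.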

Metrics Added

Names below are registered under each component's runtime label, which is prepended as a prefix. For example, a journal labeled fixed_metrics exports fixed_metrics_size. Counters are listed with the _total suffix that is appended automatically when they are emitted; each histogram is emitted as _count, _sum, and _bucket series.
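
As a rough illustration of the naming rule, the sketch below composes the exported series names from a label and a registered name. `Kind` and `exported_names` are hypothetical helpers written for this example, and it assumes counters are registered without the suffix, as the note about the automatic _total suffix suggests.

```rust
/// Kind of metric, used only to pick the emitted suffixes.
pub enum Kind {
    Gauge,
    Counter,
    Histogram,
}

/// Hypothetical helper: series names exported for `name` under `label`.
pub fn exported_names(label: &str, name: &str, kind: Kind) -> Vec<String> {
    // The runtime label is prepended to the registered name.
    let base = format!("{label}_{name}");
    match kind {
        // Gauges are exported under the composed name as-is.
        Kind::Gauge => vec![base],
        // Counters gain an automatic `_total` suffix when emitted.
        Kind::Counter => vec![format!("{base}_total")],
        // Histograms are emitted as three series.
        Kind::Histogram => ["count", "sum", "bucket"]
            .iter()
            .map(|part| format!("{base}_{part}"))
            .collect(),
    }
}
```

For a journal labeled fixed_metrics, `exported_names("fixed_metrics", "size", Kind::Gauge)` yields fixed_metrics_size, matching the example above.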

Contiguous Journal Metrics

Shared by fixed-size and variable-size contiguous journals:

  • size: logical end position.
  • pruning_boundary: oldest readable item position.
  • retained: number of readable items retained.
  • tail_items: items in the section containing the newest retained item.
  • append_calls_total: single-item append calls.
  • append_many_calls_total: append-many calls.
  • read_calls_total: single-item async read calls.
  • read_many_calls_total: non-empty batch read calls.
  • try_read_sync_hits_total: synchronous probes (try_read_sync) that returned Some.
  • items_read_total: items returned by successful read paths (read, read_many, try_read_sync).
  • sync_calls_total: full-sync calls.
  • append_duration: duration of single-item append calls.
  • append_many_duration: duration of append-many calls.
  • read_duration: duration of single-item read calls.
  • read_many_duration: duration of non-empty batch read calls.
  • sync_duration: duration of full-sync calls.
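
One way to read the three state gauges together is sketched below. The relation `retained = size - pruning_boundary` is an assumption drawn from the descriptions above (size as an exclusive end, pruning_boundary as the oldest readable position); the PR description does not state it, and `JournalState` is a hypothetical type.

```rust
/// Hypothetical snapshot of the `size` and `pruning_boundary` gauges.
pub struct JournalState {
    pub size: u64,
    pub pruning_boundary: u64,
}

impl JournalState {
    /// Assumption: the `retained` gauge equals `size - pruning_boundary`.
    pub fn retained(&self) -> u64 {
        self.size.saturating_sub(self.pruning_boundary)
    }

    /// Assumption: a position is readable when it lies in
    /// the half-open range `pruning_boundary..size`.
    pub fn is_readable(&self, pos: u64) -> bool {
        (self.pruning_boundary..self.size).contains(&pos)
    }
}
```

Under that assumption, a dashboard could cross-check the exported retained gauge against the other two.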

Fixed-size contiguous journals also add:

  • cache_hits_total: fixed items read synchronously without async storage fallback. This includes items satisfied during read_many plus successful try_read_sync probes. Single-item async read is not counted.
  • cache_misses_total: fixed items not satisfied synchronously. This includes read_many misses that require async storage fallback, plus try_read_sync probes that returned None, including pruned or out-of-range probes. Single-item async read is not counted.
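
The hit/miss accounting for fixed journals can be sketched with a toy model. Everything here is hypothetical (`FixedJournal`, the two maps, and the synchronous `read` that stands in for the async path); it only illustrates which paths touch which counter.

```rust
use std::collections::HashMap;

#[derive(Default, Debug, PartialEq)]
pub struct CacheCounts {
    pub hits: u64,
    pub misses: u64,
}

/// Hypothetical fixed journal: `cache` holds items readable synchronously,
/// `storage` holds every retained item.
pub struct FixedJournal {
    pub cache: HashMap<u64, u32>,
    pub storage: HashMap<u64, u32>,
    pub counts: CacheCounts,
}

impl FixedJournal {
    /// Synchronous probe: `Some` is a hit; `None` (including pruned or
    /// out-of-range positions) is a miss.
    pub fn try_read_sync(&mut self, pos: u64) -> Option<u32> {
        let item = self.cache.get(&pos).copied();
        if item.is_some() {
            self.counts.hits += 1;
        } else {
            self.counts.misses += 1;
        }
        item
    }

    /// Stand-in for the single-item async read: touches neither counter.
    pub fn read(&self, pos: u64) -> Option<u32> {
        self.storage.get(&pos).copied()
    }

    /// Batch read: each item is counted as a hit or a miss.
    pub fn read_many(&mut self, positions: &[u64]) -> Vec<Option<u32>> {
        let mut out = Vec::with_capacity(positions.len());
        for pos in positions {
            if let Some(item) = self.cache.get(pos) {
                self.counts.hits += 1;
                out.push(Some(*item));
            } else {
                self.counts.misses += 1;
                // Stands in for the async storage fallback.
                out.push(self.storage.get(pos).copied());
            }
        }
        out
    }
}
```

Because single-item reads are excluded, the hit ratio derived from these two counters describes only batch reads and synchronous probes.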

Variable-size contiguous journals also add:

  • commit_calls_total: commit calls (durable persist that does not fully sync all indexes).
  • commit_duration: duration of commit calls.

Variable-size journals are backed by a data journal and an internal fixed-size offsets journal, which is registered under an offsets child label. All *_offsets_* metrics on a variable journal come from the internal fixed-size offsets journal. They include internal offset operations performed by variable-journal read, write, sync, prune, replay, rewind, and recovery paths. They are not the user-facing call counts at the variable layer.
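
When reading a scrape of a variable journal, it can help to separate the internal offsets series from the user-facing ones. The helper below is a hypothetical sketch that relies only on the `_offsets_` infix mentioned above; the label `var` in the usage note is made up, and a component label that itself contains `_offsets_` would be misclassified.

```rust
/// Hypothetical helper: split metric names into
/// (internal offsets-journal series, user-facing series).
pub fn split_offsets_metrics<'a>(names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    names
        .iter()
        .copied()
        // `partition` puts names matching the predicate in the first vector.
        .partition(|name| name.contains("_offsets_"))
}
```

For a journal labeled `var`, var_offsets_append_calls_total would land in the first vector and var_append_calls_total in the second.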

QMDB Operation-Log Metrics

Used by any, immutable, and keyless QMDB layers where applicable:

  • size: logical operation end.
  • pruning_boundary: oldest retained operation location.
  • retained: number of retained operations.
  • inactivity_floor: application-declared pruning floor location.
  • last_commit: most recent commit operation location.
  • apply_batch_calls_total: apply-batch calls.
  • operations_applied_total: operations written by successful batch applications.
  • commit_calls_total: durable commit calls.
  • sync_calls_total: full-sync calls.
  • prune_calls_total: prune calls.
  • apply_batch_duration: duration of apply-batch calls.
  • commit_duration: duration of commit calls.
  • sync_duration: duration of sync calls.
  • prune_duration: duration of prune calls.

Key-based QMDB reads (any, immutable) add:

  • get_calls_total: single-key get calls.
  • get_many_calls_total: non-empty get-many calls.
  • keys_requested_total: keys requested by attempted reads, whether or not found.
  • get_duration: duration of get calls.
  • get_many_duration: duration of non-empty get-many calls.

Location-based QMDB reads (keyless) add:

  • get_calls_total: single-location get calls.
  • get_many_calls_total: non-empty get-many calls. Input locations must be strictly increasing.
  • locations_requested_total: locations requested by attempted reads, whether or not found.
  • get_duration: duration of get calls.
  • get_many_duration: duration of non-empty get-many calls.
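
Callers of the location batch read can validate the strictly-increasing contract before issuing the call. The function below is a small sketch of such a check; `is_strictly_increasing` is a name chosen for this example and is not part of the PR.

```rust
/// Returns true when every location is strictly greater than the previous one.
/// Empty and single-element inputs trivially satisfy the contract.
pub fn is_strictly_increasing(locations: &[u64]) -> bool {
    locations.windows(2).all(|pair| pair[0] < pair[1])
}
```

Note that duplicates fail the check, so callers holding unsorted input would need to sort and deduplicate first.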

Current QMDB Metrics

These count Current-layer calls only. The underlying any counters (any_apply_batch_calls_total, any_commit_calls_total, etc.) are bumped when Current delegates to the wrapped any::Db. Current's prune uses the internal any::Db::prune_log helper, so any_prune_calls_total will stay at 0 in Current-only deployments even as current_prune_calls_total increases.

  • pruned_chunks: number of pruned bitmap chunks.
  • sync_boundary: most recent safe sync boundary location.
  • apply_batch_calls_total: Current-layer apply-batch calls.
  • sync_calls_total: Current-layer sync calls.
  • prune_calls_total: Current-layer prune calls.
  • apply_batch_duration: duration of Current-layer apply-batch calls.
  • sync_duration: duration of Current-layer sync calls.
  • prune_duration: duration of Current-layer prune calls.
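
The delegation behavior described for Current can be modeled with two toy structs. This is a sketch under assumptions: `AnyDb` and `Current` are hypothetical simplifications, plain integers stand in for registered counters, and having the public `prune` call `prune_log` internally is a guess about structure rather than something the description states.

```rust
/// Hypothetical stand-in for the wrapped any::Db.
#[derive(Default)]
pub struct AnyDb {
    pub prune_calls: u64,
    pub apply_batch_calls: u64,
    pub pruned_to: u64,
}

impl AnyDb {
    /// Public prune: instrumented.
    pub fn prune(&mut self, loc: u64) {
        self.prune_calls += 1;
        self.prune_log(loc);
    }

    /// Internal helper: does the work without touching the counter.
    pub fn prune_log(&mut self, loc: u64) {
        self.pruned_to = self.pruned_to.max(loc);
    }

    /// Public apply-batch: instrumented.
    pub fn apply_batch(&mut self, _ops: usize) {
        self.apply_batch_calls += 1;
    }
}

/// Hypothetical stand-in for the Current layer.
#[derive(Default)]
pub struct Current {
    pub any: AnyDb,
    pub prune_calls: u64,
    pub apply_batch_calls: u64,
}

impl Current {
    /// Counts at the Current layer, then delegates to the instrumented method.
    pub fn apply_batch(&mut self, ops: usize) {
        self.apply_batch_calls += 1;
        self.any.apply_batch(ops);
    }

    /// Counts at the Current layer but calls the helper directly,
    /// so the wrapped database's prune counter stays at zero.
    pub fn prune(&mut self, loc: u64) {
        self.prune_calls += 1;
        self.any.prune_log(loc);
    }
}
```

In this model, alerting on any_prune_calls_total alone would miss pruning in Current-only deployments, which is why the Current-layer counter is the one to watch there.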

QMDB Sync Progress Metrics

The sync engine now registers progress gauges under a sync child label to avoid collisions with journal metrics:

  • sync_journal_size: current sync journal size.
  • sync_target_end: exclusive target range end, equal to journal size when sync completes.
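
The two gauges are enough to derive remaining work and completion. The sketch below assumes a consumer that has already scraped both values; `SyncProgress` is a hypothetical type for illustration.

```rust
/// Hypothetical snapshot of the two sync progress gauges.
pub struct SyncProgress {
    pub journal_size: u64,
    pub target_end: u64,
}

impl SyncProgress {
    /// Operations still to fetch before the journal reaches the target end.
    pub fn remaining(&self) -> u64 {
        self.target_end.saturating_sub(self.journal_size)
    }

    /// Sync is complete when the journal size equals the exclusive target end.
    pub fn is_complete(&self) -> bool {
        self.journal_size == self.target_end
    }
}
```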

@danlaine danlaine self-assigned this May 5, 2026
@danlaine danlaine added this to Tracker May 5, 2026
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 5, 2026

Deploying with Cloudflare Workers

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | commonware-mcp | 1f0feae | May 08 2026, 01:49 PM |

@github-actions

github-actions Bot commented May 5, 2026

Benchmark results

Tip

PASSED: No benchmark exceeded the regression threshold.

Benchmark comparison table

| Benchmark | Baseline (main) | Current | Delta | Threshold | Status |
| --- | --- | --- | --- | --- | --- |
| qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false | 1.670 ms | 1.562 ms | -6.46% | 10.00% | ✅ PASS |
| qmdb::merkleize/variant=current::ordered::fixed::mmb chunk=256 keys=10000 ch=true sync=false | 2.921 ms | 2.842 ms | -2.70% | 10.00% | ✅ PASS |

Baseline commit(s): b155ef4dd5ee

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 5, 2026

Deploying monorepo with Cloudflare Pages

Latest commit: 1f0feae
Status: ✅ Deploy successful!
Preview URL: https://a77156fa.monorepo-eu0.pages.dev
Branch Preview URL: https://danlaine-metrics.monorepo-eu0.pages.dev

@danlaine danlaine moved this to In Progress in Tracker May 6, 2026
@danlaine danlaine changed the title [Storage] Add contiguous journal metrics [Storage] Add contiguous journal and QMDB metrics May 6, 2026
@danlaine danlaine force-pushed the danlaine/metrics branch from ab1f39f to ad7fb75 Compare May 7, 2026 15:34
@danlaine danlaine moved this from In Progress to Ready for Review in Tracker May 7, 2026
@danlaine danlaine force-pushed the danlaine/metrics branch from 5c484b4 to 8bbd312 Compare May 7, 2026 17:33
@danlaine danlaine marked this pull request as ready for review May 7, 2026 17:34
Comment thread storage/src/metrics.rs Outdated
@danlaine danlaine requested a review from roberto-bayardo May 7, 2026 22:24
@roberto-bayardo
Collaborator

lgtm except for the stack overflow test issues.

@danlaine
Collaborator Author

danlaine commented May 8, 2026

> lgtm except for the stack overflow test issues.

Should (hopefully!) be fixed now 🤞

@codecov

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 99.44812% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.83%. Comparing base (b155ef4) to head (1f0feae).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| runtime/src/telemetry/metrics/histogram.rs | 90.24% | 3 Missing and 1 partial ⚠️ |
| storage/src/qmdb/keyless/batch.rs | 50.00% | 1 Missing ⚠️ |

@@            Coverage Diff             @@
##             main    #3721      +/-   ##
==========================================
+ Coverage   95.81%   95.83%   +0.01%     
==========================================
  Files         466      468       +2     
  Lines      185899   186717     +818     
  Branches     4443     4445       +2     
==========================================
+ Hits       178119   178931     +812     
- Misses       6356     6360       +4     
- Partials     1424     1426       +2     

| Files with missing lines | Coverage Δ |
| --- | --- |
| storage/src/journal/contiguous/fixed.rs | 97.09% <100.00%> (+0.15%) ⬆️ |
| storage/src/journal/contiguous/metrics.rs | 100.00% <100.00%> (ø) |
| storage/src/journal/contiguous/mod.rs | 48.64% <ø> (ø) |
| storage/src/journal/contiguous/variable.rs | 98.83% <100.00%> (+0.06%) ⬆️ |
| storage/src/journal/segmented/fixed.rs | 97.53% <ø> (ø) |
| storage/src/qmdb/any/batch.rs | 97.04% <100.00%> (+0.01%) ⬆️ |
| storage/src/qmdb/any/db.rs | 93.27% <100.00%> (+0.76%) ⬆️ |
| storage/src/qmdb/any/mod.rs | 99.27% <100.00%> (-0.02%) ⬇️ |
| storage/src/qmdb/any/sync/mod.rs | 97.39% <100.00%> (+0.02%) ⬆️ |
| storage/src/qmdb/any/unordered/fixed.rs | 97.84% <100.00%> (+0.22%) ⬆️ |

... and 16 more

... and 5 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@roberto-bayardo roberto-bayardo enabled auto-merge May 8, 2026 14:45
@roberto-bayardo roberto-bayardo added this pull request to the merge queue May 8, 2026
Merged via the queue into main with commit 5a63d1f May 8, 2026
182 checks passed
@roberto-bayardo roberto-bayardo deleted the danlaine/metrics branch May 8, 2026 16:04
@github-project-automation github-project-automation Bot moved this from Ready for Review to Done in Tracker May 8, 2026