[runtime] Allow per-partition (or per-init) storage root override on the Storage trait #3734

@christopherwxyz

Problem

commonware-runtime's tokio::Config exposes a single, immutable-after-construction storage_directory, and every Storage backend resolves all paths underneath it. Concretely:

  • runtime/src/tokio/runtime.rs: Config { storage_directory: PathBuf, .. } exposes with_storage_directory(p) and storage_directory() accessors. The field is consumed when Storage is started (runtime.rs:386-392 passes self.cfg.storage_directory.clone() into IoUringConfig). After start(), the root is fixed for the lifetime of the runtime.
  • runtime/src/storage/tokio/mod.rs: Config { storage_directory: PathBuf, maximum_buffer_size: usize }. Every open_versioned / scan / remove resolves through self.cfg.storage_directory.join(partition).join(hex(name)) (mirrored in runtime/src/storage/iouring.rs at :115 for open_versioned and :185/:205 for remove/scan).
  • runtime/src/storage/mod.rs:181-194: validate_partition_name rejects any character that isn't alphanumeric, -, or _. So a subdirectory hop (/) cannot be smuggled inside a partition name to redirect a particular partition to a different root.
  • Runner::start builds and owns a tokio runtime. Calling Runner::new(cfg).start(..) twice in one process — once per intended root — is invalid inside an already-running runtime.

Net result: every Storage-backed partition the application opens (block journals, archives, marshal metadata, consensus journals, and DKG share material) MUST live under one filesystem root. There is no API to peel off a single high-trust partition onto a separate volume.
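For readers without the source open, the resolution rule described above can be sketched in isolation. This is a standalone approximation, not the crate's code: is_valid_partition_name, hex, and resolve are hypothetical names, blob names are assumed to be byte slices, and "alphanumeric" is assumed to mean ASCII alphanumeric.

```rust
use std::path::{Path, PathBuf};

/// Approximates the rule described for validate_partition_name:
/// only alphanumerics, '-' and '_' are accepted (ASCII is an assumption).
fn is_valid_partition_name(partition: &str) -> bool {
    partition
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Lowercase hex encoding of a blob name (hypothetical helper).
fn hex(name: &[u8]) -> String {
    name.iter().map(|b| format!("{b:02x}")).collect()
}

/// Resolves root/partition/hex(name), or None if the partition name
/// contains a rejected character such as '/' or '.'.
fn resolve(root: &Path, partition: &str, name: &[u8]) -> Option<PathBuf> {
    if !is_valid_partition_name(partition) {
        return None;
    }
    Some(root.join(partition).join(hex(name)))
}
```

Because the root is the only input that is not derived from the partition or blob name, and it is fixed at start(), every resolved path ends up under the same directory.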

Use case

Production consensus operators have different durability classes for different state:

  • Frequently rotated / rebuildable data (block journals, archives, sync state) on a large, recyclable volume that can be wiped and restored from peers.
  • Rare-write, irreplaceable secrets (BLS shares, DKG output, signing keys) on a small, snapshot-backed volume with a different ops policy.

Today these must share one filesystem root. A maintenance operation on the recyclable volume (a Kubernetes kubectl delete pvc, a snapshot roll, an ops-driven re-init) wipes the secret material as collateral damage. In a BLS threshold cluster with n validators and threshold t = 2f+1, two such collateral-damage events within f of each other cross the BFT bound and wedge consensus until a fresh DKG ceremony is run.
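The threshold arithmetic can be made concrete with a small sketch. It assumes the common sizing n = 3f + 1, which this issue does not state, and the function name can_reach_threshold is hypothetical.

```rust
/// Returns true while enough shares survive to reach t = 2f + 1.
/// Assumes n = 3f + 1 validators (an assumption, not stated above).
fn can_reach_threshold(f: u32, lost: u32) -> bool {
    let n = 3 * f + 1; // total validators (assumed sizing)
    let t = 2 * f + 1; // signing threshold from the text
    // Shares remaining after `lost` validators lose their material.
    n.saturating_sub(lost) >= t
}
```

Under that assumption the cluster tolerates at most f lost shares: with f = 1 (n = 4, t = 3), one wiped validator still leaves three shares, while a second wipe leaves two and signing stalls.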

The only mitigations available today are operationally fragile:

  1. Mount one PVC at the root and use subPath overlays to project a second PVC underneath it. The lifecycle of the subPath mount is tied to the parent mount, and a kubectl delete pvc on the parent silently wipes everything not actually backed by the second PVC unless the operator is very careful.
  2. Run secret-bearing storage out-of-process (separate node, separate runtime), losing all the in-process sharing the rest of the stack provides.
  3. Fork commonware-runtime's Storage impl.

Proposal

Three API shapes for maintainers to weigh in on. Any would work; preference is A.

Shape A — with_storage_root(path) on the runtime context

Add a per-context override that propagates into the next storage backend it constructs:

let context = Runner::new(cfg).start(|context| async move {
    // Default root from Config — used by everything not overridden.
    let qmdb = open_qmdb(context.with_label("qmdb")).await?;
    let journal = open_journal(context.with_label("journal")).await?;

    // Override for DKG share material onto a separate PVC.
    let dkg_ctx = context
        .with_storage_root("/var/lib/dkg-protected")
        .with_label("dkg");
    let dkg = open_dkg(dkg_ctx).await?;
});
  • Pros: mirrors existing with_label; minimal caller change; composable per call site.
  • Cons: requires TokioStorage to support multiple live roots; needs a clear story for what happens if two contexts with different roots open partitions of the same name (suggested: roots are independent, partition-name uniqueness is per-root).

Shape B — explicit per-partition root mapping in Config

let cfg = tokio::Config::new()
    .with_storage_directory("/var/lib/state")
    .with_partition_root_override("dkg_states", "/var/lib/dkg-protected/dkg_states")
    .with_partition_root_override("dkg_msgs",   "/var/lib/dkg-protected/dkg_msgs");
  • Pros: fully explicit; auditable in config; no new context machinery.
  • Cons: caller must enumerate every partition name; brittle when upstream renames or adds prefixes; doesn't compose with libraries that own their partition names internally.
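A minimal sketch of how the Shape B lookup might behave, assuming each override maps a partition name to that partition's full directory (as the example paths above suggest). PartitionRoots and partition_dir are hypothetical names, not existing runtime API.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical holder for the default root plus per-partition overrides.
struct PartitionRoots {
    default_root: PathBuf,
    overrides: HashMap<String, PathBuf>,
}

impl PartitionRoots {
    fn new(default_root: impl Into<PathBuf>) -> Self {
        Self {
            default_root: default_root.into(),
            overrides: HashMap::new(),
        }
    }

    /// Registers the directory to use for exactly this partition name.
    fn with_partition_root_override(
        mut self,
        partition: &str,
        dir: impl Into<PathBuf>,
    ) -> Self {
        self.overrides.insert(partition.to_string(), dir.into());
        self
    }

    /// Overridden directory if one is registered, else default_root/partition.
    fn partition_dir(&self, partition: &str) -> PathBuf {
        match self.overrides.get(partition) {
            Some(dir) => dir.clone(),
            None => self.default_root.join(partition),
        }
    }
}
```

The exact-match lookup also illustrates the brittleness listed under Cons: if a library renames dkg_states to, say, v2_dkg_states, the lookup misses and the partition silently falls back to the default root.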

Shape C — separate Storage impls under one runtime

Allow constructing a MeteredStorage<TokioStorage> (or equivalent) directly from a path, decoupled from Config::storage_directory, and pass it into individual subsystems via a per-init context wrapper.

  • Pros: maximally flexible; cleanest separation of concerns; opens the door to mixed backends (e.g. tokio fs for one root, iouring for another).
  • Cons: more invasive; touches the runtime/storage boundary.
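Shape C has no code above, so here is a rough sketch of the idea under heavy simplification. The Storage trait below is a stand-in that models only path resolution, DirStorage::from_path and Dkg are hypothetical, and hex encoding of blob names is omitted for brevity.

```rust
use std::path::PathBuf;

/// Stand-in for the runtime's Storage trait; only path resolution is modeled.
trait Storage {
    fn blob_path(&self, partition: &str, name: &str) -> PathBuf;
}

/// Hypothetical backend built directly from a path, with no dependency
/// on Config::storage_directory.
struct DirStorage {
    root: PathBuf,
}

impl DirStorage {
    fn from_path(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }
}

impl Storage for DirStorage {
    fn blob_path(&self, partition: &str, name: &str) -> PathBuf {
        self.root.join(partition).join(name)
    }
}

/// A subsystem that is handed its storage explicitly at init time.
struct Dkg<S: Storage> {
    storage: S,
}

impl<S: Storage> Dkg<S> {
    fn share_path(&self) -> PathBuf {
        self.storage.blob_path("dkg_states", "share")
    }
}
```

With this shape, two DirStorage values with different roots can coexist in one process, and because subsystems are generic over the trait, a different backend could be substituted per root.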

Recommended sketch (Shape A)

// runtime/src/tokio/context.rs (illustrative)
impl Context {
    /// Override the storage root for any Storage handle subsequently obtained
    /// from this context. Inherits from `Config::storage_directory` if unset.
    pub fn with_storage_root(mut self, root: impl Into<PathBuf>) -> Self {
        self.storage_root_override = Some(root.into());
        self
    }
}

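// runtime/src/storage/tokio/mod.rs
impl Storage {
    // The returned reference may borrow from either `self` or `ctx`, so both
    // inputs share one explicit lifetime (elision would tie it to `self` only
    // and reject the borrow from `ctx`).
    fn root_for<'a>(&'a self, ctx: &'a Context) -> &'a Path {
        ctx.storage_root_override
            .as_deref()
            .unwrap_or(&self.cfg.storage_directory)
    }
}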

Path resolution inside open_versioned/scan/remove becomes self.root_for(ctx).join(partition).join(hex(name)) instead of self.cfg.storage_directory.join(...). validate_partition_name is unchanged — the override path is provided by the operator/application, not embedded in a partition name.
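To check the suggested "roots are independent" semantics, the sketch can be reduced to a standalone model. Context and Storage here are simplified stand-ins with only the fields needed for resolution, and partition_dir is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for the runtime context.
#[derive(Clone, Default)]
struct Context {
    storage_root_override: Option<PathBuf>,
}

impl Context {
    fn with_storage_root(mut self, root: impl Into<PathBuf>) -> Self {
        self.storage_root_override = Some(root.into());
        self
    }
}

/// Simplified stand-in for the storage backend and its configured root.
struct Storage {
    storage_directory: PathBuf,
}

impl Storage {
    /// Override if present, otherwise the configured default root.
    fn root_for<'a>(&'a self, ctx: &'a Context) -> &'a Path {
        ctx.storage_root_override
            .as_deref()
            .unwrap_or(self.storage_directory.as_path())
    }

    /// Directory for a partition as seen from the given context.
    fn partition_dir(&self, ctx: &Context, partition: &str) -> PathBuf {
        self.root_for(ctx).join(partition)
    }
}
```

In this model the same partition name opened through two contexts with different roots resolves to two unrelated directories, which matches the per-root uniqueness suggested under Shape A.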

Backwards compatibility

  • Default behavior unchanged: with no with_storage_root call (Shape A) or no with_partition_root_override entries (Shape B), every partition resolves under Config::storage_directory exactly as today.
  • validate_partition_name keeps its current rules.
  • New methods are additive; existing call sites compile and behave identically.
  • No on-disk format change — partitions written under an overridden root are byte-identical to partitions written under the default root, just at a different absolute path.

Open questions

  1. Is per-context (A) or per-config (B) the better fit for the runtime's existing extension idioms? with_label precedent leans A; explicit-config precedent leans B.
  2. Should an overridden root be required to exist + be writable at override time, or lazily on first partition open?
  3. Should there be a storage_root_aliases list that surfaces in metrics / health endpoints so operators can see which roots are live?
  4. Is there appetite for Shape C as a longer-term direction even if A/B lands first?
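For question 2, an eager check at override time could look roughly like the following. This is only one possible policy, written against std::fs; check_root_writable and the probe file name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Fails fast if `root` is missing, is not a directory, or cannot be
/// written to. Writability is tested by creating and removing a probe file.
fn check_root_writable(root: &Path) -> io::Result<()> {
    // Errors with NotFound (or similar) if the root does not exist.
    let meta = fs::metadata(root)?;
    if !meta.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "storage root is not a directory",
        ));
    }
    // Hypothetical probe file name; any unused name would do.
    let probe = root.join(".storage_root_probe");
    fs::write(&probe, b"")?;
    fs::remove_file(&probe)
}
```

The lazy alternative would skip this and surface the same errors on the first partition open, which keeps with_storage_root infallible but delays the failure until a secret is about to be written.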

Happy to send a PR for whichever shape is preferred.

Files cited (v2026.4.0)

  • runtime/src/tokio/runtime.rs: Config definition and Runner::start plumbing
  • runtime/src/storage/tokio/mod.rs: Config { storage_directory, maximum_buffer_size } + path resolution
  • runtime/src/storage/iouring.rs:115,185,205: same path resolution mirrored for the iouring backend
  • runtime/src/storage/mod.rs:181-194: validate_partition_name (rejects /)

Prior art

Searched commonwarexyz/monorepo issues and PRs for storage_directory, partition root, multiple storage, secure storage, PVC. No existing thread proposes per-partition / per-init storage root override. Closest related work, none of which addresses this:
