Skip to content

feichai0017/NoKV

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2,044 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
NoKV

Metadata control plane for object-backed agent artifacts.

CI crates.io docs.rs Rust Version

License CNCF Landscape DBDB.io DeepWiki

Docs Β· Why Filesystems Β· Quick Start Β· Benchmarks Β· Discussions

Listed In The AI-Native Storage Ecosystem

Cloud Native Computing Foundation
Linux Foundation CNCF Landscape
Listed in AI Native Infra / Storage and Cloud Native Storage.
DBDB.io Database of Databases
DBDB.io Database of Databases
Historical database profile; current NoKV is the Rust filesystem product line.

What Is NoKV?

NoKV is a metadata control plane for object-backed agent artifacts: run outputs, log files, checkpoints, reports, and citable evidence in one filesystem-shaped namespace. For the longer interface argument, see Agents Want Filesystems.

It is not a trace database. Keep JSONL, SQLite, or Postgres as the source of truth for runtime events; use NoKV as the agent-facing namespace over the artifacts and evidence those systems produce.

NoKV keeps namespace metadata in its own path-native engine (Holt) and stores file bodies as immutable blocks in S3-compatible object storage such as RustFS, MinIO, Ceph RGW, or AWS S3.

FUSE / SDK / CLI
  -> NoKV metadata service     (self-contained; no separate metadata DB to run)
  -> Holt inode/dentry metadata
  -> S3-compatible object store for file bodies

NoKV owns namespace truth, metadata transactions, snapshots, watches, and object-reference GC. The object store owns byte durability and replication. The metadata engine is built in, so local deployments operate a filesystem rather than a filesystem plus a separate Redis, MySQL, or TiKV cluster.

Why NoKV

Agent workflows are artifact-heavy; their workspaces aren't. Every run leaves behind configs, metrics, logs, checkpoints β€” and that state scatters across folders, JSON files, object-store keys, and database rows. Agents pay a navigation tax in tokens every time they go looking. NoKV gives that state one address, with the metadata guarantees the workload actually needs:

  • Checkpoints publish atomically. Readers see the complete new checkpoint or the previous one β€” never a half-written file, even across a crash.
  • Snapshots are time travel. Pin a frozen view of any subtree and keep reading it while jobs write; GC never deletes what a snapshot still needs.
  • Changes are events, not polls. Every create, rename, and publish lands as a typed, replayable event with a cursor.
  • Artifacts carry body references and digests, with cleanup of failed staged uploads.
  • Bodies are immutable, versioned blocks. Replacement publishes a new generation, so node-local caches never invalidate object bytes after publish.

The primary write model is write-once publish, matching how datasets, checkpoints, and artifacts are commonly written.

πŸ€– The Agent Interface

ls Β· stat Β· catalog Β· find Β· aggregate Β· read Β· grep

Seven verbs, one progressive-disclosure surface: an agent discovers what exists, learns what is queryable, and pays to read only what it needs. Predicates, sort, and projection are pushed into the engine, so a "top-5 runs by val_loss" report costs two calls β€” one catalog, one find. grep sweeps a subtree and returns line-numbered matches with citable evidence URIs (nokv-native://path@generation:N#L3).

The verbs are defined in nokv-client: the tool definitions are LLM-ready JSON schemas, and execute_agent_tool routes calls over the same AgentNamespace trait whether the namespace is remote (metadata RPC) or embedded.

Today the agent verbs ship in the Rust SDK; filesystem operations ship in the nokv CLI and FUSE mount. An MCP server is in development β€” follow #354.

πŸ“Š Measured Evidence

Agent interface. We gave the same agent (gpt-5.4-mini) the same 875-run experiment corpus through two surfaces β€” raw SQL over SQLite, and the NoKV namespace β€” across five tasks, 10 repeats per arm and task (100 fully stateless runs), judged against deterministic gold facts neither arm can see:

Set mean (per 5-task pass) Raw SQLite NoKV namespace
Tasks solved correctly 4.40 / 5 4.50 / 5
Prompt tokens (incl. cached) 151,572 82,827 (βˆ’45%)
Cost (USD, list rates) $0.0708 $0.0433 (βˆ’39%)

In this 10-repeat sample, the token gap widens to ~2.4Γ— on the compound-exploration subset, and SQL won the single-shot analytics task β€” per-task results, wins and losses both, are in the report. Harness, tasks, judge, and the raw telemetry of all 100 runs are committed, so every published number is recomputable: see bench/agent-interface/.

Storage engine. Local engineering baselines, not official MLPerf results. Single-node service numbers are release builds through the NoKV server and Holt metadata path. FUSE comparison numbers depend on kernel/FUSE, object backend, cache settings, and workload shape.

Workload Result
Metadata create (mdtest, 65k records) ~127K ops/s (single-writer, batched service path)
Same, one directory of 65k entries Same order of throughput; path-native ART does not degrade on large directories
Checkpoint publish (1 MiB blocks, concurrency 16) ~1.1 GiB/s in the service/object benchmark
Dataset read (16 KiB samples, concurrency 16) ~3,000 samples/s in the service/object benchmark
Resident metadata ~1.5 KiB / file in the measured shape
Atomic checkpoint Object bytes land first; metadata publishes a new generation atomically

Same-machine FUSE-vs-FUSE smoke against one RustFS endpoint currently shows NoKV behind JuiceFS on the end-to-end mounted path. That gap is expected to come from FUSE/RPC fixed costs and data-plane cache/writeback maturity, not from the Holt metadata engine alone.

NoKV vs JuiceFS

NoKV follows the same high-level separation used by systems like JuiceFS and 3FS: metadata is separate from file body storage. The difference is that NoKV ships its metadata engine as part of the filesystem and optimizes for agent-workspace and artifact publish/read patterns first.

JuiceFS NoKV
Metadata engine External DB such as Redis, MySQL, or TiKV Built-in, path-native Holt engine
Atomic checkpoint publish POSIX rename/write semantics over the metadata engine First-class publish-by-generation primitive
Block model Slice/block model supporting broad POSIX behavior Immutable object blocks plus new-generation manifests
Workspace-native primitives Layered on top of the filesystem Snapshots, typed watch, body descriptors, and GC floors are core metadata concepts
Agent query surface None ls/stat/catalog/find/aggregate/read/grep with push-down and line-numbered evidence
POSIX completeness Mature production filesystem P0 subset implemented; still hardening compatibility gates
Maturity Production, large deployments Young Rust implementation; single-node local mode is usable, replication is roadmap

NoKV is an object-backed filesystem with a sharded Holt metadata plane: multiple metadata shards (one Holt engine each) behind long-running metadata servers, routed through an etcd control plane, with cross-shard grafts presenting a single FUSE namespace across shards. Metadata HA today is single-writer-per-shard with checkpoint-image + shared-log, epoch-fenced failover β€” not yet consensus replication β€” so it is not yet a JuiceFS/3FS class production-HA distributed filesystem.

πŸ—οΈ Architecture

crates/
  nokv-types     storage-neutral namespace model types
  nokv-protocol  framed metadata RPC DTOs and binary codec
  nokv-meta      schema, MetadataCommand, Holt store, service core
  nokv-control   shard ownership, epochs, and failover coordination
  nokv-object    S3-compatible object body storage helpers
  nokv-client    Rust SDK over metadata service and object backend
  nokv-fuse      low-level FUSE frontend
  nokv-server    long-running metad process and framed RPC service
  nokv           CLI binary

bench/             system workload benchmark harness
docs/              product, architecture, layout, RustFS, and benchmark docs

For artifact and checkpoint publish, object bytes are uploaded first, then the metadata commit publishes the dentry, inode projection, and body manifest atomically. A crash between the two leaves orphan objects for GC, never a corrupt namespace. See Architecture.

🚦 Quick Start

Build and test:

cargo test --workspace
cargo build --release -p nokv --bin nokv

Start a local RustFS-compatible S3 endpoint and create the default bucket:

mkdir -p /tmp/rustfs-data
RUSTFS_ACCESS_KEY=rustfsadmin \
RUSTFS_SECRET_KEY=rustfsadmin \
rustfs server --address 127.0.0.1:9000 /tmp/rustfs-data &

AWS_ACCESS_KEY_ID=rustfsadmin \
AWS_SECRET_ACCESS_KEY=rustfsadmin \
aws --endpoint-url http://127.0.0.1:9000 \
  s3api create-bucket --bucket nokv

By default NoKV expects bucket nokv at http://127.0.0.1:9000 with development credentials rustfsadmin / rustfsadmin. See docs/rustfs.md for other deployment modes.

Start the metadata server, then initialize the namespace. Every other command talks to the server on 127.0.0.1:7777, so keep it running:

cargo run --release -p nokv --bin nokv -- serve &

cargo run --release -p nokv --bin nokv -- init

Publish and read an artifact:

cargo run --release -p nokv --bin nokv -- \
  put-artifact /runs/1/checkpoint.bin ./checkpoint.bin

cargo run --release -p nokv --bin nokv -- \
  cat /runs/1/checkpoint.bin > restored.bin

Mount with FUSE:

mkdir -p /tmp/nokv-mount

cargo run --release -p nokv --bin nokv -- \
  mount /tmp/nokv-mount

On macOS this requires macFUSE. NoKV passes the noappledouble mount option to avoid Finder/resource-fork AppleDouble sidecars; user xattr roundtrip is covered by the FUSE smoke test.

🧩 Crates

Crate Role
nokv-types Storage-neutral namespace model
nokv-protocol Framed metadata RPC DTOs and binary codec
nokv-object S3-compatible object body storage
nokv-meta Schema, MetadataCommand, Holt store, service core
nokv-control Shard ownership, epochs, and failover coordination
nokv-client Rust SDK over the metadata service
nokv-fuse Low-level FUSE frontend
nokv-server Long-running metad process and framed RPC
nokv nokv CLI binary

βœ… Current Status

Implemented today:

  • low-level FUSE frontend for lookup, getattr, readdir, readdirplus, create, mkdir, symlink/readlink, rename, unlink, rmdir, read, write, flush, release, fsync, setattr/truncate, hardlink, xattr, advisory locks, special files, statfs, lseek, fallocate, and copy_file_range;
  • Holt-backed local metadata service with inode/dentry canonical metadata, dentry projection, command predicates, command dedupe, and history records;
  • chunked object data path where file bodies are split into immutable object blocks and published by metadata manifest;
  • S3-compatible object backend, with RustFS as the local development default;
  • Rust SDK and nokv CLI for namespace operations, artifact publish, metadata server access, and object range reads;
  • the seven-verb agent query surface (ls/stat/catalog/find/ aggregate/read/grep) in the Rust SDK, with LLM-ready tool definitions;
  • long-running nokv-server with health, readiness, stats, manual GC, and framed binary metadata RPC;
  • nokv-control shard ownership store (in-memory plus optional etcd-backed session leases behind the etcd feature) and a server shard-owner guard that installs and renews lease epochs into the metadata commit fence;
  • multi-shard distributed metadata: subtree/path-prefix sharding (one Holt engine per shard), high-bit shard-tagged global inodes, etcd control-plane routing with client re-resolve on owner handoff, and cross-shard grafts that present a single FUSE namespace across shards;
  • logical metadata log segment codec/archive/replay foundation, plus controlled server sync shared-log ACK mode that publishes LogRef before successful RPC ACKs, including grouped independent-batch log segments;
  • controlled metadata failover smoke that restores a checkpoint, replays the shared log, starts the bumped-epoch owner, and accepts a new metadata write;
  • local multi-process metadata HA + multi-shard fleet smoke scripts that exercise etcd ownership, RustFS-backed checkpoint/log archive, owner death, epoch failover, post-failover replay, and a SIGSTOP/SIGCONT stale-owner fence mode;
  • a Python SDK (PyO3) and fsspec filesystem with reads, writes, namespace ops, snapshots, atomic checkpoint publish/resolve, and a torch DataLoader + DCP backend;
  • read-only snapshot mounts, snapshot-version reads, typed watch replay, and FUSE cache invalidation from watch events;
  • pending-object GC and metadata history GC tied to snapshot retention.

Not implemented yet:

  • consensus-replicated metadata (Raft/Paxos) β€” HA today is single-writer-per-shard with checkpoint + shared-log failover, not replicated;
  • intra-subtree sharding (a single hot subtree is capped at one shard), learner read scaling, and chaos-tested failover timing;
  • an MCP server for the agent verbs β€” in development, tracked in #354;
  • Kubernetes CSI packages;
  • full POSIX hardening such as ACL enforcement, broad external compatibility gate coverage, and mature multi-client cache coherence.

Benchmarks

The root bench/ package contains all benchmark entry points. System workload runs use nokv-bench:

cargo run --release -p nokv-bench --bin nokv-bench -- \
  --profile smoke \
  --workload all

Key workloads:

  • mdtest-easy and mdtest-hard metadata smoke workloads;
  • metadata-negative-lookup, artifact-index-lookup, and metadata-concurrent-read Holt metadata read-path workloads;
  • metadata-durability-batch batch metadata create workload with comparable local-only and sync-shared-log ACK phases;
  • checkpoint-publish object-backed checkpoint publish/read;
  • training-read dataset-shaped object reads;
  • mlperf-dlio generated MLPerf Storage/DLIO-style training and checkpoint shape;
  • metadata HA smoke through scripts/run-metadata-ha-smoke.sh for owner leases, epoch fencing, checkpoint restore, shared-log replay, failover RTO timing, and stale-owner write rejection.

All workloads are single-node service runs; see docs/benchmarks.md for the full workload list, profiles, and gates.

The agent-interface benchmark β€” harness, tasks, judge, report, and the raw telemetry behind the numbers above β€” lives under bench/agent-interface/ and runs through the same package:

cargo run --release -p nokv-bench --bin yanex-agent-bench -- list-tasks

For the fast AI-training product gate, run:

scripts/run-ai-training-smoke.sh

The default gate covers Holt metadata read concurrency, checkpoint publish, and DLIO-style object reads/writes. Most benchmark workloads are still single-node service runs. Training-cluster claims need separate runs that report replication, cache, object-store, and durability settings.

Run scripts/run-ai-training-smoke.sh fuse-smoke when the local machine has a working FUSE installation and you want the mounted POSIX smoke in the same workflow.

For the local metadata HA gate, run:

scripts/run-metadata-ha-smoke.sh

It requires RustFS, AWS CLI, curl, and either a local etcd binary or NOKV_HA_ETCD_ENDPOINTS pointing at an external etcd cluster. Set NOKV_HA_METRICS_JSON=/tmp/nokv-ha.json to keep the emitted HA_SMOKE_METRICS JSON for CI or benchmark reports. Set NOKV_HA_STALE_OWNER_CHAOS=1 to run the local stale-owner fence mode; that mode uses NOKV_HA_OWNER_B_BIND for the replacement owner.

πŸ“š Documentation

πŸ“„ License

Apache-2.0. See LICENSE.

Packages

 
 
 

Contributors