A distributed, transactional SQL + KV database engine, written from scratch in Rust.
No RocksDB wrapper. No SQLite fork. Every byte is ours.
Most companies run 5+ separate services for their data layer:
| Service | What they deploy | What OmniKV gives you |
|---|---|---|
| Database | PostgreSQL / MySQL | ✅ Full SQL engine with JOINs, aggregates, window functions |
| KV Store | Redis / etcd | ✅ Sub-millisecond KV with TTL, range scans, MVCC |
| Consensus | etcd / ZooKeeper | ✅ Built-in Raft consensus (58 cluster tests) |
| API Server | Express / Flask | ✅ REST API + QUIC + TCP, built in |
| Auth + Metrics | Auth0 + Prometheus | ✅ JWT auth + Prometheus /metrics, built in |
OmniKV collapses all of this into a single `cargo build` binary.
```bash
# Start OmniKV (4 protocols start automatically)
cargo run --release

# Connect with psql (yes, your regular PostgreSQL client)
psql -h localhost -p 5433
```

```sql
CREATE TABLE users (id INT, name TEXT, email TEXT);
INSERT INTO users VALUES (1, 'Alice', 'alice@dev.io');
INSERT INTO users VALUES (2, 'Bob', 'bob@dev.io');

CREATE TABLE orders (id INT, user_id INT, amount FLOAT);
INSERT INTO orders VALUES (101, 1, 299.99);
INSERT INTO orders VALUES (102, 2, 149.50);

-- Cost-based optimizer picks Hash JOIN, smaller table as build side
EXPLAIN SELECT u.name, SUM(o.amount)
FROM users u INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.name;
```
```
┌────────────────────────────────────────────────────────────────────┐
│                            CLIENT LAYER                            │
│  ┌──────────┐   ┌──────────┐   ┌───────────┐   ┌────────────────┐  │
│  │  PgWire  │   │ REST API │   │   QUIC    │   │  TCP Command   │  │
│  │    v3    │   │  HTTP/2  │   │  HTTP/3   │   │   Interface    │  │
│  │          │   │  + TLS   │   │  (Quinn)  │   │                │  │
│  └────┬─────┘   └────┬─────┘   └─────┬─────┘   └───────┬────────┘  │
├───────┴──────────────┴───────────────┴─────────────────┴───────────┤
│                             SQL ENGINE                             │
│  ┌──────────────┐   ┌───────────────────┐   ┌─────────────────┐    │
│  │  SQL Parser  │ → │    Cost-Based     │ → │     Volcano     │    │
│  │  (recursive  │   │     Optimizer     │   │    Iterator     │    │
│  │   descent)   │   │   (histograms,    │   │    Executor     │    │
│  │              │   │  predicate push,  │   │  (O(1) filter,  │    │
│  │              │   │   JOIN reorder,   │   │   hash join,    │    │
│  │              │   │   index select)   │   │   streaming)    │    │
│  └──────────────┘   └───────────────────┘   └─────────────────┘    │
├────────────────────────────────────────────────────────────────────┤
│                         TRANSACTION ENGINE                         │
│   Serializable Snapshot Isolation (SSI) · Savepoints               │
│   Write-write conflict detection · 2PC distributed txn             │
│   Transaction timeouts · RW-dependency tracking                    │
├────────────────────────────────────────────────────────────────────┤
│                           STORAGE ENGINE                           │
│   WAL (CRC32) → 16-shard SkipMap Memtable → SSTable (sorted)       │
│   Heap Store (CRC32/entry) · Bloom Filters · Block Cache (LRU)     │
│   ArcSwap Topology (zero-stall swap) · LZ4 Compression             │
│   L0 → L1 → L2 Tiered Compaction · MVCC Snapshots                  │
├────────────────────────────────────────────────────────────────────┤
│                           RAFT CONSENSUS                           │
│   OpenRaft 0.9.24 · Leader Election · Log Replication              │
│   Atomic Snapshot Install · Membership Changes · Rolling Upgrades  │
│   Network Partitions · 2PC Cross-Shard Replication                 │
└────────────────────────────────────────────────────────────────────┘
```
The storage engine is a full LSM-tree implementation, not a wrapper around RocksDB or LevelDB.
| Component | How it works | Why it matters |
|---|---|---|
| 3-Phase Pipelined Write | Phase 1: LZ4 compress + CRC32 (no lock). Phase 2: sequence + offset reservation (µs mutex). Phase 3: heap pwrite + WAL append + memtable insert. | Multiple concurrent batches overlap the CPU-intensive compression; they only serialize for sequence numbering, for microseconds. |
| 16-Shard SkipMap Memtable | `crossbeam_skiplist::SkipMap` × 16 shards, FNV-hashed. Lock-free concurrent inserts. | Eliminates write contention: 16 threads write to 16 independent shards simultaneously. |
| MVCC via Atomic Sequence Numbers | Every write gets a monotonic seq. Reads specify a `read_seq` and only see versions ≤ that seq. | True snapshot isolation. Readers never block writers. No read locks anywhere. |
| ArcSwap Topology | All read-visible state (memtable, SSTables, bloom filters, manifest) lives in a single `Arc<StorageRoots>`, swapped atomically via `arc_swap::ArcSwap`. | Compaction and snapshot install publish new topology in one atomic pointer swap. Readers holding the old Arc keep reading stale-but-consistent data. Zero stalls. |
| CRC32 on Every Heap Entry | `crc32fast::Hasher` computed at write time, verified at read time. | Silent data corruption doesn't go unnoticed: bit rot, torn writes, and disk errors are detected before data reaches the application. |
| WAL with Commit Markers | Each batch writes its records plus a `__COMMIT_MARKER__` record. Recovery replays only complete batches. | Partial batches from crashes are silently discarded. Proven by `test_torn_wal_record_is_rejected`. |
| Bloom Filters (per-SSTable) | Optimal bit count: -n·ln(p) / ln²(2). FNV double-hashing: h1 + i·h2 (see the sketch after this table). | Avoids reading SSTables that definitely don't contain the key. False-positive rate: 1%. |
| Positional I/O | Unix: `pwrite`/`pread`. Windows: `seek_write`/`seek_read` with a short-read loop. | Concurrent batches write to non-overlapping heap regions without seeking. Cross-platform. |
| LZ4 Compression | Values ≥ 64 bytes are compressed with `lz4_flex::compress_prepend_size`; a flag bit in the length field marks them. | Transparent compression. Small values (< 64 bytes) are stored raw to avoid overhead. |
| Block Cache | `moka::sync::Cache` (concurrent LRU, 100K entries), keyed by heap offset. | Hot data is served from memory: no heap I/O for repeated reads of the same key. |
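The bloom-filter row compresses two formulas into one cell. Here is a minimal, self-contained sketch of how they fit together; the types and names are illustrative, not OmniKV's actual internals:

```rust
// Illustrative sketch of the per-SSTable Bloom filter math from the table:
// optimal sizing, -n·ln(p)/ln²(2), plus FNV double hashing, h1 + i·h2.

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over `data`, seeded so we can derive two independent hashes.
fn fnv1a(data: &[u8], seed: u64) -> u64 {
    data.iter().fold(seed, |h, &b| (h ^ b as u64).wrapping_mul(FNV_PRIME))
}

struct Bloom {
    bits: Vec<bool>, // a real implementation would pack these into u64 words
    k: usize,        // number of probes per key
}

impl Bloom {
    /// Size for `n` expected keys at false-positive rate `p`:
    /// m = -n·ln(p) / ln²(2) bits, k = (m/n)·ln(2) hash functions.
    fn new(n: usize, p: f64) -> Self {
        let ln2 = std::f64::consts::LN_2;
        let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as usize;
        let k = ((m as f64 / n as f64) * ln2).round().max(1.0) as usize;
        Bloom { bits: vec![false; m.max(1)], k }
    }

    /// Double hashing: the i-th probe index is (h1 + i·h2) mod m.
    fn probes(&self, key: &[u8]) -> impl Iterator<Item = usize> {
        let (m, k) = (self.bits.len() as u64, self.k as u64);
        let h1 = fnv1a(key, FNV_OFFSET);
        let h2 = fnv1a(key, FNV_OFFSET ^ m) | 1; // force odd so probes spread
        (0..k).map(move |i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
    }

    fn insert(&mut self, key: &[u8]) {
        let idx: Vec<usize> = self.probes(key).collect();
        idx.into_iter().for_each(|i| self.bits[i] = true);
    }

    /// `false` means "definitely absent": safe to skip the SSTable entirely.
    fn may_contain(&self, key: &[u8]) -> bool {
        self.probes(key).all(|i| self.bits[i])
    }
}
```

At p = 1% this works out to roughly 9.6 bits and 7 probes per key, which is why a 1% false-positive rate is the usual default.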
| Component | Implementation | Competitive comparison |
|---|---|---|
| Parser | Hand-written recursive descent. Handles SELECT/INSERT/UPDATE/DELETE/CREATE/DROP/EXPLAIN/JOIN/GROUP BY/ORDER BY/LIMIT/LIKE/IN/IS NULL/window functions. | Same approach as PostgreSQL's parser. No parser-generator dependency. |
| Cost Model | SEQ_SCAN=1.0/row, INDEX_SCAN=0.25/row, PK_LOOKUP=1.0, HASH_BUILD=2.0/row, HASH_PROBE=0.1/row, SORT=N·log₂N·2.0. | Real cost constants, not arbitrary. Comparable to PostgreSQL's cost model (but simpler). |
| Statistics | `gather_stats()` scans actual table data. Per-column histograms with NDV (number of distinct values) and null fraction. Selectivity: equality = 1/NDV, range = 1/3, AND = multiplicative, OR = inclusion-exclusion. | More detailed than SQLite's optional ANALYZE statistics. Simpler than PostgreSQL (which has multi-column stats). |
| Predicate Pushdown | `pushdown_join_predicates()` splits AND-conjuncts and routes single-table predicates to the correct join side. | Standard optimization. Reduces rows entering the join operator. |
| JOIN Reordering | Smaller table always becomes the hash-build side. Cost = build_cost + probe_cost + build_rows·2.0 + probe_rows·0.1 (see the sketch after this table). | Correct for 2-table joins. A multi-table DP planner would be needed for complex queries. |
| Volcano Model | `RowIterator` trait with `next_row()`. Operators: SeqScan, PkLookup, Filter (O(1)), Project (O(1)), Limit (O(1)), Sort (O(N log N)), HashJoin (O(build)), Aggregate (O(N)). | Same architecture as PostgreSQL, Oracle, SQL Server. Pull-based streaming. |
| EXPLAIN ANALYZE | Collects actual_rows vs estimated_rows plus wall-clock actual_time_ms per operator. | Same output format as PostgreSQL's EXPLAIN ANALYZE. |
| Plan Cache | LRU cache keyed by query string; `invalidate()` on DDL changes. | Avoids re-optimizing identical queries. Similar to PostgreSQL's plan cache. |
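As a worked example of the join-reordering rule, here is a hedged sketch of the 2-table build-side decision. The constants are quoted from the table above; the function names and surrounding structure are illustrative, not OmniKV's optimizer:

```rust
// Cost constants quoted from the cost-model row above.
const HASH_BUILD: f64 = 2.0; // per build-side row
const HASH_PROBE: f64 = 0.1; // per probe-side row

/// Cost = build_cost + probe_cost + build_rows·2.0 + probe_rows·0.1
fn hash_join_cost(build_cost: f64, build_rows: f64, probe_cost: f64, probe_rows: f64) -> f64 {
    build_cost + probe_cost + build_rows * HASH_BUILD + probe_rows * HASH_PROBE
}

fn main() {
    // 1,000 users joined with 1,000,000 orders, both seq-scanned at 1.0/row.
    let (users, orders) = (1_000.0, 1_000_000.0);
    let users_build = hash_join_cost(users, users, orders, orders);
    let orders_build = hash_join_cost(orders, orders, users, users);
    // 1,103,000 vs 3,001,100: building the hash table on the smaller input
    // wins by ~3x, which is why the planner always hashes the smaller table.
    assert!(users_build < orders_build);
    println!("users build: {users_build}, orders build: {orders_build}");
}
```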
| Feature | Implementation | Comparison |
|---|---|---|
| SSI (Serializable Snapshot Isolation) | Read snapshot at BEGIN. Write-set buffered until COMMIT. At commit: acquire global lock → check whether any concurrent transaction wrote to our write-set keys after our snapshot → abort on conflict → commit atomically via WriteBatch (see the sketch after this table). | Same isolation level as PostgreSQL's SERIALIZABLE. Stronger than MySQL's default (REPEATABLE READ). |
| Savepoints | A Savepoint struct captures a write_set + read_set snapshot. ROLLBACK TO restores to that point without aborting the entire transaction. | Same semantics as PostgreSQL's SAVEPOINT. |
| Conflict Detection | Write-write conflict: checks whether a key was modified between our read_seq and the current global_seq. Read-write dependency tracking with bounded memory. | Correct SSI implementation. Detects dangerous structures (rw-antidependency cycles). |
| Timeouts | Configurable per-transaction timeout. Long-running transactions are automatically aborted. | Prevents resource leaks from abandoned transactions. |
| 2PC (Distributed) | Coordinator WAL persistence. Prepare → Vote → Commit/Abort across shards. Cross-shard Raft replication. | Proven by `test_2pc_cross_shard_with_raft_replication`. |
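The commit-time check in the SSI row fits in a few lines. Below is a hedged sketch of it under one global lock; the types are illustrative, not OmniKV's actual engine, and the read-write dependency tracking mentioned above is omitted for brevity:

```rust
// Illustrative sketch of commit-time write-write conflict detection.
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

struct Txn {
    read_seq: u64,              // snapshot sequence captured at BEGIN
    write_set: HashSet<String>, // writes buffered until COMMIT
}

#[derive(Default)]
struct CommitIndex {
    // Global lock held only for the validate-and-publish step.
    inner: Mutex<(u64, HashMap<String, u64>)>, // (global_seq, key -> last commit seq)
}

impl CommitIndex {
    fn try_commit(&self, txn: &Txn) -> Result<u64, &'static str> {
        let mut g = self.inner.lock().unwrap();
        // Abort if any of our keys was committed after our snapshot...
        if txn.write_set.iter().any(|k| g.1.get(k).copied().unwrap_or(0) > txn.read_seq) {
            return Err("write-write conflict: key committed after our snapshot");
        }
        // ...otherwise publish all writes atomically under one new sequence.
        g.0 += 1;
        let commit_seq = g.0;
        for k in &txn.write_set {
            g.1.insert(k.clone(), commit_seq);
        }
        Ok(commit_seq)
    }
}
```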
| Test Category | Tests | What's proven |
|---|---|---|
| Core Replication | `test_3_node_log_replication`, `test_state_machine_apply` | All entries identical across 3 nodes. State machine produces the same result on all replicas. |
| Leader Election | `test_leader_election_simulation`, `test_leader_election_under_load` | Exactly one leader emerges. Election works under concurrent write load. |
| Crash Recovery | `test_crash_recovery_persistence`, `test_log_consistency_after_crash` | Committed data survives node restart. Log is consistent after a crash. |
| Network Partitions | `test_symmetric_partition_majority_progresses`, `test_asymmetric_partition_isolated_node`, `test_cascading_partitions_no_data_loss` | Majority side continues. An isolated node doesn't corrupt the cluster. No data loss across cascading partitions. |
| Membership | `test_membership_add_node_catches_up`, `test_membership_remove_node`, `test_membership_scale_out_3_to_5` | Live scaling from 3 to 5 nodes. An added node catches up. A removed node leaves cleanly. |
| Rolling Upgrades | `test_rolling_restart_no_data_loss`, `test_rolling_upgrade_continuous_writes`, `test_rolling_upgrade_read_availability` | Zero data loss during rolling restarts. Reads stay available throughout an upgrade. |
| Distributed Transactions | `test_2pc_happy_path_commit`, `test_2pc_cross_shard_with_raft_replication` | 2PC commit/abort works correctly. Cross-shard transactions are replicated via Raft. |
| SSI Integration | `test_ssi_write_write_conflict`, `test_ssi_read_write_conflict`, `test_ssi_dangerous_structure_chain` | Serializable isolation holds across replicated nodes. |
| MVCC Consistency | `test_mvcc_logical_clock_ordering`, `test_ttl_consistency_across_replicas` | Logical clocks maintain ordering across nodes. TTL is consistent across replicas. |
```
$ cargo test
test result: ok. 290 passed; 0 failed
```
| Suite | Tests | Focus |
|---|---|---|
| Storage Engine | 76 | WAL, crash recovery, CRC32, compaction |
| Raft Cluster | 58 | Replication, elections, partitions, upgrades |
| Operations | 25 | Health, metrics, backup, admin |
| Ops Maturity | 24 | Config, diagnostics, shutdown |
| Storage Perf | 21 | Throughput across storage paths |
| SQL Layer | 18 | Parser, JOINs, aggregates |
| Storage Correctness | 14 | Crash safety, MVCC, atomicity |
| Query + Optimizer | 16 | Planning, execution, cost estimation |
| Concurrent Stress | 6 | Multi-threaded contention |
| Anomaly Demos | 4 | Isolation level proofs |
| Other | 28 | Benchmarks, debugging |
```bash
git clone https://github.com/SBALAVIGNESH123/OmniKV.git
cd omni_engine
cargo build --release
cargo run --release
```

```
┌──────────────────────────────────────────────────────┐
│  ⚡ OmniKV v0.1.0                                     │
│  Embeddable · Distributed · Transactional KV         │
├──────────────────────────────────────────────────────┤
│  HTTP/1.1 + HTTP/2 (TLS)   →  0.0.0.0:8443           │
│  QUIC/HTTP3 (binary)       →  0.0.0.0:4433           │
│  PostgreSQL Wire Protocol  →  0.0.0.0:5433           │
│  TCP Command Interface     →  0.0.0.0:8080           │
└──────────────────────────────────────────────────────┘
```
```bash
psql -h localhost -p 5433
```

```sql
CREATE TABLE users (id INTEGER, name TEXT, email TEXT);
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
SELECT * FROM users WHERE id = 1;
```

```bash
curl -k https://localhost:8443/health
curl -k -X POST https://localhost:8443/kv -d '{"key":"hello","value":"world"}'
curl -k https://localhost:8443/kv/hello
```

```rust
use omni_engine::{OmniKV, WriteBatch};

let db = OmniKV::open("manifest.json", "data.wal").unwrap();

let mut batch = WriteBatch::new();
batch.set("user:1", r#"{"name":"Alice"}"#.into()).unwrap();
db.commit_batch(&batch).unwrap();

let val = db.find("user:1", db.get_seq()).unwrap();
```
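Because `find` takes an explicit sequence number, a snapshot read is just a remembered `get_seq()` value. A hedged continuation of the example above, using only the calls it already shows and assuming MVCC reads see versions ≤ the given sequence (as described in the storage table):

```rust
// Snapshot reads: a read at a captured sequence keeps seeing the old
// version even after a later commit (continues with `db` from above).
let snapshot = db.get_seq(); // remember the current sequence

let mut batch = WriteBatch::new();
batch.set("user:1", r#"{"name":"Alice v2"}"#.into()).unwrap();
db.commit_batch(&batch).unwrap();

let old = db.find("user:1", snapshot).unwrap();     // pre-update value
let new = db.find("user:1", db.get_seq()).unwrap(); // sees "Alice v2"
```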
```
src/                                               ~12,500 lines
├── lib.rs                 Storage engine core          2,038
├── sql.rs                 SQL parser                     869
├── optimizer.rs           Cost-based optimizer           840
├── sql_exec.rs            SQL execution                  729
├── prepared.rs            Prepared statements            662
├── transaction.rs         SSI transaction engine         648
├── dist_txn.rs            2PC distributed txn            592
├── volcano.rs             Volcano iterator executor      582
├── secondary_index.rs     Secondary index engine         580
├── raft_storage.rs        Raft storage trait             536
├── schema.rs              DDL engine                     471
├── chaos.rs               Chaos testing framework        468
├── plan_exec.rs           Plan-driven executor           463
├── pgwire.rs              PostgreSQL wire protocol       430
├── api.rs                 REST API (Axum)                300
└── ...                    hardening, QUIC, WAL, auth

tests/                                              ~7,800 lines
├── raft_cluster.rs        Raft cluster tests           3,660  (58 tests)
├── storage_tests.rs       Storage engine               1,547  (76 tests)
├── storage_perf.rs        Performance                    429  (21 tests)
├── operations.rs          Operational                    446  (25 tests)
├── storage_correctness.rs Crash safety                   389  (14 tests)
├── sql_layer.rs           SQL integration                282  (18 tests)
└── ...                    stress, benchmarks
```

Total: ~20,000 lines of Rust · 290 tests
| Stage | Status |
|---|---|
| ✅ Storage Correctness | ████████████ 100% |
| ✅ Internal Storage APIs | ████████████ 100% |
| ✅ Raft Hardening | ████████████ 100% |
| ✅ Transaction Engine | ████████████ 100% |
| ✅ Query Engine | ████████████ 100% |
| ✅ Operational Maturity | ████████████ 100% |
| 🚧 Ecosystem | ████████░░░░ 65% |
```bash
docker build -t omnikv .
docker run -p 8443:8443 -p 5433:5433 -p 4433:4433/udp omnikv
docker compose up   # 3-node cluster
```

License: MIT
Built from scratch in Rust. Every byte is ours.
By Balavignesh
