A high-performance, in-memory key-value store written in C. Uses epoll (Linux) / kqueue (macOS) for non-blocking I/O, a sharded hash map for concurrent access, and LRU eviction with TTL expiry to bound memory.
Throughput : 200K – 1.5M ops/sec (varies with pipelining depth)
Latency : sub-millisecond avg (16 – 270 µs typical)
Connections: 1000+ concurrent (tested 1024, zero errors)
Memory : bounded by LRU eviction within 200 bytes of configured limit
┌──────────────────────────────────┐
│ Main Thread │
│ │
Client TCP ──────────► │ poll() on listen socket │
connections │ accept() new connections │
│ round-robin to workers via pipe │
└──────┬───────┬───────┬───────────┘
│ pipe │ pipe │ pipe
┌────▼──┐ ┌──▼────┐ ┌▼──────┐
│Worker0│ │Worker1│ │Worker2│ ... (N threads)
│ │ │ │ │ │
│kqueue/│ │kqueue/│ │kqueue/│
│epoll │ │epoll │ │epoll │
│loop │ │loop │ │loop │
└───┬───┘ └───┬───┘ └───┬───┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────┐
│ Shared KV Store (16 shards) │
│ │
│ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Shard 0 │ │Shard 1 │ ... │Shard 15│ │
│ │rwlock │ │rwlock │ │rwlock │ │
│ │hashtab │ │hashtab │ │hashtab │ │
│ │LRU list│ │LRU list│ │LRU list│ │
│ └────────┘ └────────┘ └────────┘ │
│ │
│ + Background TTL expiry thread │
│ + Atomic memory counter for eviction │
└──────────────────────────────────────────┘
Client Main Thread Worker 2 Store (Shard 7)
│ │ │ │
│── TCP connect ────────────►│ │ │
│ │── write(pipe, fd) ────►│ │
│ │ │── ev_add_read(fd) │
│ │ │ │
│── "SET foo bar EX 60\r\n"─────────────────────────►│ │
│ │ │── read() into rbuf │
│ │ │── parse command │
│ │ │── hash("foo")→shard 7 │
│ │ │── wrlock(shard 7) ───►│
│ │ │ │── find bucket
│ │ │ │── insert entry
│ │ │ │── LRU push front
│ │ │── unlock(shard 7) ◄───│
│ │ │── write "+OK\r\n" │
│◄── "+OK\r\n" ──────────────────────────────────────│ │
Shard N:
rwlock ─── rdlock for GET (concurrent reads)
└── wrlock for SET/DEL/evict (exclusive)
Hash Table (chaining):
buckets[0] ──► [key:abc|val:123] ──► [key:xyz|val:789] ──► NULL
buckets[1] ──► NULL
buckets[2] ──► [key:foo|val:bar] ──► NULL
...
buckets[4095]
LRU Doubly-Linked List:
head (MRU) ◄──► [key:foo] ◄──► [key:abc] ◄──► [key:xyz] ──► tail (LRU)
▲
evict from here
used_memory > max_memory ?
│
┌──────┴──────┐
│ YES │ NO
▼ └──► insert normally
Evict LRU from
current shard
│
Still over limit?
│
┌──────┴──────┐
│ YES │ NO
▼ └──► insert normally
Evict LRU from
other shards
(round-robin)
│
▼
Insert new entry
| Platform | Requirement |
|---|---|
| macOS | Xcode Command Line Tools (xcode-select --install) |
| Ubuntu/Debian | sudo apt install build-essential |
| RHEL/Fedora | sudo dnf install gcc make |
No external libraries. Only libc and pthreads.
git clone <repo-url> && cd kvstore
make # builds kv-server and kv-benchVerify:
make smoke # starts server, sends PING, checks +PONG, stops serverRun full test (SET/GET/DEL + TTL expiry + benchmark):
make test./kv-server [options]| Flag | Default | Description |
|---|---|---|
-p <port> |
6380 | TCP listen port |
-w <N> |
4 | Number of worker threads |
-m <MB> |
256 | Max memory in megabytes (LRU evicts beyond this) |
-h |
— | Show help |
Example:
./kv-server -p 6380 -w 8 -m 512
# kvstore listening on :6380 workers=8 shards=16 max_mem=512MBStop with Ctrl-C (sends SIGINT, clean shutdown).
Any TCP client works. Using nc:
# SET a key
printf "SET mykey myvalue\r\n" | nc -w 1 127.0.0.1 6380
# +OK
# SET with TTL (expires in 60 seconds)
printf "SET session abc123 EX 60\r\n" | nc -w 1 127.0.0.1 6380
# +OK
# GET a key
printf "GET mykey\r\n" | nc -w 1 127.0.0.1 6380
# $7
# myvalue
# GET a missing key
printf "GET nosuchkey\r\n" | nc -w 1 127.0.0.1 6380
# $-1
# DELETE a key
printf "DEL mykey\r\n" | nc -w 1 127.0.0.1 6380
# :1
# Server stats
printf "INFO\r\n" | nc -w 1 127.0.0.1 6380
# +INFO keys=1234 mem=56789 max_mem=268435456 gets=... sets=... ...Text-based, one command per line, terminated by \r\n.
| Command | Response | Description |
|---|---|---|
SET <key> <value>\r\n |
+OK\r\n |
Store key-value pair |
SET <key> <value> EX <seconds>\r\n |
+OK\r\n |
Store with TTL |
GET <key>\r\n |
$<len>\r\n<value>\r\n |
Retrieve value |
GET <key>\r\n (miss) |
$-1\r\n |
Key not found or expired |
DEL <key>\r\n |
:<0|1>\r\n |
Delete key (1=deleted, 0=not found) |
PING\r\n |
+PONG\r\n |
Health check |
INFO\r\n |
+INFO key=val ...\r\n |
Server statistics |
QUIT\r\n |
+OK\r\n |
Close connection |
| Field | Meaning |
|---|---|
keys |
Total keys currently stored |
mem |
Bytes used by stored entries |
max_mem |
Configured memory limit |
gets |
Total GET operations |
sets |
Total SET operations |
dels |
Total DEL operations |
hits |
GETs that found a key |
misses |
GETs that missed |
evictions |
Keys evicted by LRU |
expired |
Keys expired by TTL |
- Keys and values cannot contain spaces or newlines (space-delimited protocol).
- Max key size: 256 bytes. Max value size: bounded by read buffer (4 KB).
- Protocol is not binary-safe (unlike Redis RESP).
./kv-bench [options]| Flag | Default | Description |
|---|---|---|
-h <host> |
127.0.0.1 | Server host |
-p <port> |
6380 | Server port |
-n <total> |
1000000 | Total number of requests |
-c <conns> |
50 | Concurrent connections (one thread each) |
-P <depth> |
16 | Pipeline depth (requests batched before reading) |
-d <bytes> |
64 | Value size in bytes |
-r <ratio> |
0.50 | SET ratio (0.0 = all GETs, 1.0 = all SETs) |
-H |
— | Show help |
# Quick smoke benchmark
make bench
# Realistic: no pipelining, 50 connections
./kv-bench -p 6380 -n 200000 -c 50 -P 1
# High throughput: pipeline 16, 50 connections
./kv-bench -p 6380 -n 1000000 -c 50 -P 16
# Stress test: 1000+ connections
./kv-bench -p 6380 -n 500000 -c 1024 -P 8
# Write-heavy workload
./kv-bench -p 6380 -n 500000 -c 50 -P 4 -r 0.9
# Large values
./kv-bench -p 6380 -n 100000 -c 20 -P 4 -d 512| Configuration | Throughput | Avg Latency | p99 Latency |
|---|---|---|---|
| 50 conns, pipeline=1 | ~180K ops/s | 270 µs | 1.2 ms |
| 50 conns, pipeline=4 | ~360K ops/s | 124 µs | 722 µs |
| 50 conns, pipeline=16 | ~1.5M ops/s | 31 µs | 77 µs |
| 1024 conns, pipeline=4 | ~600K ops/s | 1.5 ms | 3.3 ms |
Pipeline=1 is pure round-trip (one request, wait for response, repeat). Higher pipeline depths batch requests, amortizing syscall overhead.
For testing at scale on multiple machines (e.g., CloudLabs Ubuntu nodes).
# Start 3 server instances on ports 6380, 6381, 6382 and benchmark each
./scripts/run_cluster.sh local 31. Create a nodes file:
cat > nodes.txt << 'EOF'
user@10.0.0.1
user@10.0.0.2
user@10.0.0.3
EOF2. Deploy, build, and start on all nodes:
./scripts/run_cluster.sh deploy nodes.txtThis will:
scpthe source files to each nodemake clean allon each node (compiles natively with epoll on Linux)- Start
kv-serveron each node (port 6380, 4 workers, 512 MB)
3. Benchmark all nodes:
./scripts/run_cluster.sh bench nodes.txt 1000000 50
# ^^^^^^^ ^^
# requests connections per node4. Stop all servers:
./scripts/run_cluster.sh stop nodes.txtkvstore/
├── src/
│ └── server.c # Complete server (927 lines)
│ # - Event loop (epoll/kqueue abstraction)
│ # - Sharded hash map with per-shard rwlocks
│ # - LRU eviction + TTL expiry
│ # - Multi-threaded worker pool
│ # - Protocol parser
│ # - Connection management
├── tools/
│ └── benchmark.c # Load testing tool (377 lines)
│ # - Multi-threaded pipelined client
│ # - Latency histogram with percentiles
│ # - Configurable workload mix
├── scripts/
│ └── run_cluster.sh # Multi-node cluster deployment + benchmarking
├── Makefile # Build targets: all, clean, smoke, test, bench
└── README.md
A single global mutex means only one thread can access the store at a time.
With 16 shards, the probability of two threads contending on the same lock
is 1/16. Each shard has its own pthread_rwlock_t: GETs take a read lock
(multiple concurrent readers), SETs take a write lock (exclusive).
select() scans all file descriptors every call — O(n) per wakeup.
epoll/kqueue maintain a kernel-side interest set and only return
ready descriptors — O(1) per ready fd. This matters at 1000+ connections.
Each connection_t embeds a 4 KB read buffer and 64 KB write buffer directly
in the struct. No malloc/free during request handling. The connection
struct is allocated once on accept() and freed once on close. This avoids
heap fragmentation and allocator overhead on the hot path.
The main thread accept()s connections and writes the fd to a worker's pipe.
Each worker has its own event loop monitoring its own pipe. This avoids
thundering herd (multiple threads waking on the same listen socket) and gives
deterministic round-robin load distribution.
Redis uses approximate LRU (random sampling) to save memory. Our LRU is exact: every access moves the entry to the list head. This costs two extra pointers (16 bytes) per entry but guarantees the true least-recently-used entry is always evicted first. For a from-scratch implementation, exact LRU is more correct and simpler to reason about.
Fast (no multiplications heavier than 32-bit), good distribution, and widely used for hash tables. We hash once and split the 32-bit result: low 4 bits select the shard, remaining 28 bits select the bucket within the shard. This avoids hashing the key twice.
| Parameter | Effect | Recommendation |
|---|---|---|
-w workers |
More workers = more CPU cores used | Set to number of CPU cores |
-m memory |
LRU eviction threshold | Set to available RAM minus OS overhead |
NUM_SHARDS |
Lock contention reduction | 16 is good up to ~16 cores. For 32+ cores, recompile with 32 or 64. Must be power of 2. |
INIT_BUCKETS |
Initial hash table size per shard | 4096 is fine for most workloads. If you store millions of keys, increase to 65536 to reduce chain length. |
RBUF_SIZE |
Max command line length | 4096 handles keys+values up to ~4 KB. Increase if you need larger values. |
WBUF_SIZE |
Write buffer per connection | 64 KB handles batched pipelined responses. |
BACKLOG |
TCP accept queue depth | 1024 handles burst connection storms. |
All parameters except the flags are compile-time #defines in src/server.c (lines 57–70).
Change and rebuild with make clean all.