kvstore

A high-performance, in-memory key-value store written in C. Uses epoll (Linux) / kqueue (macOS) for non-blocking I/O, a sharded hash map for concurrent access, and LRU eviction with TTL expiry to bound memory.

Throughput :  200K – 1.5M ops/sec  (varies with pipelining depth)
Latency    :  sub-millisecond avg  (16 – 270 µs typical)
Connections:  1000+ concurrent     (tested 1024, zero errors)
Memory     :  bounded by LRU eviction within 200 bytes of configured limit

Architecture

                            ┌──────────────────────────────────┐
                            │           Main Thread            │
                            │                                  │
     Client TCP ──────────► │  poll() on listen socket         │
     connections             │  accept() new connections        │
                            │  round-robin to workers via pipe │
                            └──────┬───────┬───────┬───────────┘
                                   │ pipe  │ pipe  │ pipe
                              ┌────▼──┐ ┌──▼────┐ ┌▼──────┐
                              │Worker0│ │Worker1│ │Worker2│ ...  (N threads)
                              │       │ │       │ │       │
                              │kqueue/│ │kqueue/│ │kqueue/│
                              │epoll  │ │epoll  │ │epoll  │
                              │loop   │ │loop   │ │loop   │
                              └───┬───┘ └───┬───┘ └───┬───┘
                                  │         │         │
                                  ▼         ▼         ▼
                     ┌──────────────────────────────────────────┐
                     │         Shared KV Store (16 shards)      │
                     │                                          │
                     │  ┌────────┐ ┌────────┐     ┌────────┐   │
                     │  │Shard 0 │ │Shard 1 │ ... │Shard 15│   │
                     │  │rwlock  │ │rwlock  │     │rwlock  │   │
                     │  │hashtab │ │hashtab │     │hashtab │   │
                     │  │LRU list│ │LRU list│     │LRU list│   │
                     │  └────────┘ └────────┘     └────────┘   │
                     │                                          │
                     │  + Background TTL expiry thread          │
                     │  + Atomic memory counter for eviction    │
                     └──────────────────────────────────────────┘

Request Flow (SET example)

Client                     Main Thread              Worker 2              Store (Shard 7)
  │                            │                        │                       │
  │── TCP connect ────────────►│                        │                       │
  │                            │── write(pipe, fd) ────►│                       │
  │                            │                        │── ev_add_read(fd)     │
  │                            │                        │                       │
  │── "SET foo bar EX 60\r\n"─────────────────────────►│                       │
  │                            │                        │── read() into rbuf    │
  │                            │                        │── parse command       │
  │                            │                        │── hash("foo")→shard 7 │
  │                            │                        │── wrlock(shard 7) ───►│
  │                            │                        │                       │── find bucket
  │                            │                        │                       │── insert entry
  │                            │                        │                       │── LRU push front
  │                            │                        │── unlock(shard 7) ◄───│
  │                            │                        │── write "+OK\r\n"     │
  │◄── "+OK\r\n" ──────────────────────────────────────│                       │

Shard Internals

Shard N:
  rwlock ─── rdlock for GET (concurrent reads)
         └── wrlock for SET/DEL/evict (exclusive)

  Hash Table (chaining):
    buckets[0] ──► [key:abc|val:123] ──► [key:xyz|val:789] ──► NULL
    buckets[1] ──► NULL
    buckets[2] ──► [key:foo|val:bar] ──► NULL
    ...
    buckets[4095]

  LRU Doubly-Linked List:
    head (MRU) ◄──► [key:foo] ◄──► [key:abc] ◄──► [key:xyz] ──► tail (LRU)
                                                                    ▲
                                                            evict from here

Memory Eviction Strategy

                  used_memory > max_memory ?
                           │
                    ┌──────┴──────┐
                    │ YES         │ NO
                    ▼             └──► insert normally
              Evict LRU from
              current shard
                    │
              Still over limit?
                    │
             ┌──────┴──────┐
             │ YES         │ NO
             ▼             └──► insert normally
        Evict LRU from
        other shards
        (round-robin)
                    │
                    ▼
              Insert new entry

Prerequisites

Platform	Requirement
macOS	Xcode Command Line Tools (`xcode-select --install`)
Ubuntu/Debian	`sudo apt install build-essential`
RHEL/Fedora	`sudo dnf install gcc make`

No external libraries. Only libc and pthreads.

Build

git clone <repo-url> && cd kvstore
make            # builds kv-server and kv-bench

Verify:

make smoke      # starts server, sends PING, checks +PONG, stops server

Run full test (SET/GET/DEL + TTL expiry + benchmark):

make test

Usage

Start the server

./kv-server [options]

Flag	Default	Description
`-p <port>`	6380	TCP listen port
`-w <N>`	4	Number of worker threads
`-m <MB>`	256	Max memory in megabytes (LRU evicts beyond this)
`-h`	—	Show help

Example:

./kv-server -p 6380 -w 8 -m 512
# kvstore listening on :6380  workers=8  shards=16  max_mem=512MB

Stop with Ctrl-C (sends SIGINT, clean shutdown).

Talk to it

Any TCP client works. Using nc:

# SET a key
printf "SET mykey myvalue\r\n" | nc -w 1 127.0.0.1 6380
# +OK

# SET with TTL (expires in 60 seconds)
printf "SET session abc123 EX 60\r\n" | nc -w 1 127.0.0.1 6380
# +OK

# GET a key
printf "GET mykey\r\n" | nc -w 1 127.0.0.1 6380
# $7
# myvalue

# GET a missing key
printf "GET nosuchkey\r\n" | nc -w 1 127.0.0.1 6380
# $-1

# DELETE a key
printf "DEL mykey\r\n" | nc -w 1 127.0.0.1 6380
# :1

# Server stats
printf "INFO\r\n" | nc -w 1 127.0.0.1 6380
# +INFO keys=1234 mem=56789 max_mem=268435456 gets=... sets=... ...

Protocol Reference

Text-based, one command per line, terminated by \r\n.

Commands

Command	Response	Description
`SET <key> <value>\r\n`	`+OK\r\n`	Store key-value pair
`SET <key> <value> EX <seconds>\r\n`	`+OK\r\n`	Store with TTL
`GET <key>\r\n`	`$<len>\r\n<value>\r\n`	Retrieve value
`GET <key>\r\n` (miss)	`$-1\r\n`	Key not found or expired
`DEL <key>\r\n`	`:<0\|1>\r\n`	Delete key (1=deleted, 0=not found)
`PING\r\n`	`+PONG\r\n`	Health check
`INFO\r\n`	`+INFO key=val ...\r\n`	Server statistics
`QUIT\r\n`	`+OK\r\n`	Close connection

INFO fields

Field	Meaning
`keys`	Total keys currently stored
`mem`	Bytes used by stored entries
`max_mem`	Configured memory limit
`gets`	Total GET operations
`sets`	Total SET operations
`dels`	Total DEL operations
`hits`	GETs that found a key
`misses`	GETs that missed
`evictions`	Keys evicted by LRU
`expired`	Keys expired by TTL

Limitations

Keys and values cannot contain spaces or newlines (space-delimited protocol).
Max key size: 256 bytes. Max value size: bounded by read buffer (4 KB).
Protocol is not binary-safe (unlike Redis RESP).

Benchmarking

Built-in benchmark tool

./kv-bench [options]

Flag	Default	Description
`-h <host>`	127.0.0.1	Server host
`-p <port>`	6380	Server port
`-n <total>`	1000000	Total number of requests
`-c <conns>`	50	Concurrent connections (one thread each)
`-P <depth>`	16	Pipeline depth (requests batched before reading)
`-d <bytes>`	64	Value size in bytes
`-r <ratio>`	0.50	SET ratio (0.0 = all GETs, 1.0 = all SETs)
`-H`	—	Show help

Example runs

# Quick smoke benchmark
make bench

# Realistic: no pipelining, 50 connections
./kv-bench -p 6380 -n 200000 -c 50 -P 1

# High throughput: pipeline 16, 50 connections
./kv-bench -p 6380 -n 1000000 -c 50 -P 16

# Stress test: 1000+ connections
./kv-bench -p 6380 -n 500000 -c 1024 -P 8

# Write-heavy workload
./kv-bench -p 6380 -n 500000 -c 50 -P 4 -r 0.9

# Large values
./kv-bench -p 6380 -n 100000 -c 20 -P 4 -d 512

Reference numbers (MacBook, local loopback)

Configuration	Throughput	Avg Latency	p99 Latency
50 conns, pipeline=1	~180K ops/s	270 µs	1.2 ms
50 conns, pipeline=4	~360K ops/s	124 µs	722 µs
50 conns, pipeline=16	~1.5M ops/s	31 µs	77 µs
1024 conns, pipeline=4	~600K ops/s	1.5 ms	3.3 ms

Pipeline=1 is pure round-trip (one request, wait for response, repeat). Higher pipeline depths batch requests, amortizing syscall overhead.

Multi-Node Cluster Testing

For testing at scale on multiple machines (e.g., CloudLabs Ubuntu nodes).

Local multi-instance (test on one machine)

# Start 3 server instances on ports 6380, 6381, 6382 and benchmark each
./scripts/run_cluster.sh local 3

Remote deployment (CloudLabs / any SSH-accessible Ubuntu machines)

1. Create a nodes file:

cat > nodes.txt << 'EOF'
user@10.0.0.1
user@10.0.0.2
user@10.0.0.3
EOF

2. Deploy, build, and start on all nodes:

./scripts/run_cluster.sh deploy nodes.txt

This will:

scp the source files to each node
make clean all on each node (compiles natively with epoll on Linux)
Start kv-server on each node (port 6380, 4 workers, 512 MB)

3. Benchmark all nodes:

./scripts/run_cluster.sh bench nodes.txt 1000000 50
#                                        ^^^^^^^ ^^
#                                        requests connections per node

4. Stop all servers:

./scripts/run_cluster.sh stop nodes.txt

Project Structure

kvstore/
├── src/
│   └── server.c            # Complete server (927 lines)
│                            #   - Event loop (epoll/kqueue abstraction)
│                            #   - Sharded hash map with per-shard rwlocks
│                            #   - LRU eviction + TTL expiry
│                            #   - Multi-threaded worker pool
│                            #   - Protocol parser
│                            #   - Connection management
├── tools/
│   └── benchmark.c          # Load testing tool (377 lines)
│                            #   - Multi-threaded pipelined client
│                            #   - Latency histogram with percentiles
│                            #   - Configurable workload mix
├── scripts/
│   └── run_cluster.sh       # Multi-node cluster deployment + benchmarking
├── Makefile                 # Build targets: all, clean, smoke, test, bench
└── README.md

Design Decisions

Why sharded hash map (not a single lock)?

A single global mutex means only one thread can access the store at a time. With 16 shards, the probability of two threads contending on the same lock is 1/16. Each shard has its own pthread_rwlock_t: GETs take a read lock (multiple concurrent readers), SETs take a write lock (exclusive).

Why epoll/kqueue (not poll/select)?

select() scans all file descriptors every call — O(n) per wakeup. epoll/kqueue maintain a kernel-side interest set and only return ready descriptors — O(1) per ready fd. This matters at 1000+ connections.

Why per-connection fixed buffers (not dynamic)?

Each connection_t embeds a 4 KB read buffer and 64 KB write buffer directly in the struct. No malloc/free during request handling. The connection struct is allocated once on accept() and freed once on close. This avoids heap fragmentation and allocator overhead on the hot path.

Why pipe-based fd distribution (not shared accept)?

The main thread accept()s connections and writes the fd to a worker's pipe. Each worker has its own event loop monitoring its own pipe. This avoids thundering herd (multiple threads waking on the same listen socket) and gives deterministic round-robin load distribution.

Why doubly-linked list for LRU (not approximate)?

Redis uses approximate LRU (random sampling) to save memory. Our LRU is exact: every access moves the entry to the list head. This costs two extra pointers (16 bytes) per entry but guarantees the true least-recently-used entry is always evicted first. For a from-scratch implementation, exact LRU is more correct and simpler to reason about.

Why FNV-1a hash?

Fast (no multiplications heavier than 32-bit), good distribution, and widely used for hash tables. We hash once and split the 32-bit result: low 4 bits select the shard, remaining 28 bits select the bucket within the shard. This avoids hashing the key twice.

Tuning Guide

Parameter	Effect	Recommendation
`-w` workers	More workers = more CPU cores used	Set to number of CPU cores
`-m` memory	LRU eviction threshold	Set to available RAM minus OS overhead
`NUM_SHARDS`	Lock contention reduction	16 is good up to ~16 cores. For 32+ cores, recompile with 32 or 64. Must be power of 2.
`INIT_BUCKETS`	Initial hash table size per shard	4096 is fine for most workloads. If you store millions of keys, increase to 65536 to reduce chain length.
`RBUF_SIZE`	Max command line length	4096 handles keys+values up to ~4 KB. Increase if you need larger values.
`WBUF_SIZE`	Write buffer per connection	64 KB handles batched pipelined responses.
`BACKLOG`	TCP accept queue depth	1024 handles burst connection storms.

All parameters except the flags are compile-time #defines in src/server.c (lines 57–70). Change and rebuild with make clean all.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
scripts		scripts
src		src
tools		tools
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

kvstore

Architecture

Request Flow (SET example)

Shard Internals

Memory Eviction Strategy

Prerequisites

Build

Usage

Start the server

Talk to it

Protocol Reference

Commands

INFO fields

Limitations

Benchmarking

Built-in benchmark tool

Example runs

Reference numbers (MacBook, local loopback)

Multi-Node Cluster Testing

Local multi-instance (test on one machine)

Remote deployment (CloudLabs / any SSH-accessible Ubuntu machines)

Project Structure

Design Decisions

Why sharded hash map (not a single lock)?

Why epoll/kqueue (not poll/select)?

Why per-connection fixed buffers (not dynamic)?

Why pipe-based fd distribution (not shared accept)?

Why doubly-linked list for LRU (not approximate)?

Why FNV-1a hash?

Tuning Guide

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages