Skip to content

Devesh-Maheshwari/KV-store

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kvstore

A high-performance, in-memory key-value store written in C. Uses epoll (Linux) / kqueue (macOS) for non-blocking I/O, a sharded hash map for concurrent access, and LRU eviction with TTL expiry to bound memory.

Throughput :  200K – 1.5M ops/sec  (varies with pipelining depth)
Latency    :  sub-millisecond avg  (16 – 270 µs typical)
Connections:  1000+ concurrent     (tested 1024, zero errors)
Memory     :  bounded by LRU eviction within 200 bytes of configured limit

Architecture

                            ┌──────────────────────────────────┐
                            │           Main Thread            │
                            │                                  │
     Client TCP ──────────► │  poll() on listen socket         │
     connections             │  accept() new connections        │
                            │  round-robin to workers via pipe │
                            └──────┬───────┬───────┬───────────┘
                                   │ pipe  │ pipe  │ pipe
                              ┌────▼──┐ ┌──▼────┐ ┌▼──────┐
                              │Worker0│ │Worker1│ │Worker2│ ...  (N threads)
                              │       │ │       │ │       │
                              │kqueue/│ │kqueue/│ │kqueue/│
                              │epoll  │ │epoll  │ │epoll  │
                              │loop   │ │loop   │ │loop   │
                              └───┬───┘ └───┬───┘ └───┬───┘
                                  │         │         │
                                  ▼         ▼         ▼
                     ┌──────────────────────────────────────────┐
                     │         Shared KV Store (16 shards)      │
                     │                                          │
                     │  ┌────────┐ ┌────────┐     ┌────────┐   │
                     │  │Shard 0 │ │Shard 1 │ ... │Shard 15│   │
                     │  │rwlock  │ │rwlock  │     │rwlock  │   │
                     │  │hashtab │ │hashtab │     │hashtab │   │
                     │  │LRU list│ │LRU list│     │LRU list│   │
                     │  └────────┘ └────────┘     └────────┘   │
                     │                                          │
                     │  + Background TTL expiry thread          │
                     │  + Atomic memory counter for eviction    │
                     └──────────────────────────────────────────┘

Request Flow (SET example)

Client                     Main Thread              Worker 2              Store (Shard 7)
  │                            │                        │                       │
  │── TCP connect ────────────►│                        │                       │
  │                            │── write(pipe, fd) ────►│                       │
  │                            │                        │── ev_add_read(fd)     │
  │                            │                        │                       │
  │── "SET foo bar EX 60\r\n"─────────────────────────►│                       │
  │                            │                        │── read() into rbuf    │
  │                            │                        │── parse command       │
  │                            │                        │── hash("foo")→shard 7 │
  │                            │                        │── wrlock(shard 7) ───►│
  │                            │                        │                       │── find bucket
  │                            │                        │                       │── insert entry
  │                            │                        │                       │── LRU push front
  │                            │                        │── unlock(shard 7) ◄───│
  │                            │                        │── write "+OK\r\n"     │
  │◄── "+OK\r\n" ──────────────────────────────────────│                       │

Shard Internals

Shard N:
  rwlock ─── rdlock for GET (concurrent reads)
         └── wrlock for SET/DEL/evict (exclusive)

  Hash Table (chaining):
    buckets[0] ──► [key:abc|val:123] ──► [key:xyz|val:789] ──► NULL
    buckets[1] ──► NULL
    buckets[2] ──► [key:foo|val:bar] ──► NULL
    ...
    buckets[4095]

  LRU Doubly-Linked List:
    head (MRU) ◄──► [key:foo] ◄──► [key:abc] ◄──► [key:xyz] ──► tail (LRU)
                                                                    ▲
                                                            evict from here

Memory Eviction Strategy

                  used_memory > max_memory ?
                           │
                    ┌──────┴──────┐
                    │ YES         │ NO
                    ▼             └──► insert normally
              Evict LRU from
              current shard
                    │
              Still over limit?
                    │
             ┌──────┴──────┐
             │ YES         │ NO
             ▼             └──► insert normally
        Evict LRU from
        other shards
        (round-robin)
                    │
                    ▼
              Insert new entry

Prerequisites

Platform Requirement
macOS Xcode Command Line Tools (xcode-select --install)
Ubuntu/Debian sudo apt install build-essential
RHEL/Fedora sudo dnf install gcc make

No external libraries. Only libc and pthreads.


Build

git clone <repo-url> && cd kvstore
make            # builds kv-server and kv-bench

Verify:

make smoke      # starts server, sends PING, checks +PONG, stops server

Run full test (SET/GET/DEL + TTL expiry + benchmark):

make test

Usage

Start the server

./kv-server [options]
Flag Default Description
-p <port> 6380 TCP listen port
-w <N> 4 Number of worker threads
-m <MB> 256 Max memory in megabytes (LRU evicts beyond this)
-h Show help

Example:

./kv-server -p 6380 -w 8 -m 512
# kvstore listening on :6380  workers=8  shards=16  max_mem=512MB

Stop with Ctrl-C (sends SIGINT, clean shutdown).

Talk to it

Any TCP client works. Using nc:

# SET a key
printf "SET mykey myvalue\r\n" | nc -w 1 127.0.0.1 6380
# +OK

# SET with TTL (expires in 60 seconds)
printf "SET session abc123 EX 60\r\n" | nc -w 1 127.0.0.1 6380
# +OK

# GET a key
printf "GET mykey\r\n" | nc -w 1 127.0.0.1 6380
# $7
# myvalue

# GET a missing key
printf "GET nosuchkey\r\n" | nc -w 1 127.0.0.1 6380
# $-1

# DELETE a key
printf "DEL mykey\r\n" | nc -w 1 127.0.0.1 6380
# :1

# Server stats
printf "INFO\r\n" | nc -w 1 127.0.0.1 6380
# +INFO keys=1234 mem=56789 max_mem=268435456 gets=... sets=... ...

Protocol Reference

Text-based, one command per line, terminated by \r\n.

Commands

Command Response Description
SET <key> <value>\r\n +OK\r\n Store key-value pair
SET <key> <value> EX <seconds>\r\n +OK\r\n Store with TTL
GET <key>\r\n $<len>\r\n<value>\r\n Retrieve value
GET <key>\r\n (miss) $-1\r\n Key not found or expired
DEL <key>\r\n :<0|1>\r\n Delete key (1=deleted, 0=not found)
PING\r\n +PONG\r\n Health check
INFO\r\n +INFO key=val ...\r\n Server statistics
QUIT\r\n +OK\r\n Close connection

INFO fields

Field Meaning
keys Total keys currently stored
mem Bytes used by stored entries
max_mem Configured memory limit
gets Total GET operations
sets Total SET operations
dels Total DEL operations
hits GETs that found a key
misses GETs that missed
evictions Keys evicted by LRU
expired Keys expired by TTL

Limitations

  • Keys and values cannot contain spaces or newlines (space-delimited protocol).
  • Max key size: 256 bytes. Max value size: bounded by read buffer (4 KB).
  • Protocol is not binary-safe (unlike Redis RESP).

Benchmarking

Built-in benchmark tool

./kv-bench [options]
Flag Default Description
-h <host> 127.0.0.1 Server host
-p <port> 6380 Server port
-n <total> 1000000 Total number of requests
-c <conns> 50 Concurrent connections (one thread each)
-P <depth> 16 Pipeline depth (requests batched before reading)
-d <bytes> 64 Value size in bytes
-r <ratio> 0.50 SET ratio (0.0 = all GETs, 1.0 = all SETs)
-H Show help

Example runs

# Quick smoke benchmark
make bench

# Realistic: no pipelining, 50 connections
./kv-bench -p 6380 -n 200000 -c 50 -P 1

# High throughput: pipeline 16, 50 connections
./kv-bench -p 6380 -n 1000000 -c 50 -P 16

# Stress test: 1000+ connections
./kv-bench -p 6380 -n 500000 -c 1024 -P 8

# Write-heavy workload
./kv-bench -p 6380 -n 500000 -c 50 -P 4 -r 0.9

# Large values
./kv-bench -p 6380 -n 100000 -c 20 -P 4 -d 512

Reference numbers (MacBook, local loopback)

Configuration Throughput Avg Latency p99 Latency
50 conns, pipeline=1 ~180K ops/s 270 µs 1.2 ms
50 conns, pipeline=4 ~360K ops/s 124 µs 722 µs
50 conns, pipeline=16 ~1.5M ops/s 31 µs 77 µs
1024 conns, pipeline=4 ~600K ops/s 1.5 ms 3.3 ms

Pipeline=1 is pure round-trip (one request, wait for response, repeat). Higher pipeline depths batch requests, amortizing syscall overhead.


Multi-Node Cluster Testing

For testing at scale on multiple machines (e.g., CloudLabs Ubuntu nodes).

Local multi-instance (test on one machine)

# Start 3 server instances on ports 6380, 6381, 6382 and benchmark each
./scripts/run_cluster.sh local 3

Remote deployment (CloudLabs / any SSH-accessible Ubuntu machines)

1. Create a nodes file:

cat > nodes.txt << 'EOF'
user@10.0.0.1
user@10.0.0.2
user@10.0.0.3
EOF

2. Deploy, build, and start on all nodes:

./scripts/run_cluster.sh deploy nodes.txt

This will:

  • scp the source files to each node
  • make clean all on each node (compiles natively with epoll on Linux)
  • Start kv-server on each node (port 6380, 4 workers, 512 MB)

3. Benchmark all nodes:

./scripts/run_cluster.sh bench nodes.txt 1000000 50
#                                        ^^^^^^^ ^^
#                                        requests connections per node

4. Stop all servers:

./scripts/run_cluster.sh stop nodes.txt

Project Structure

kvstore/
├── src/
│   └── server.c            # Complete server (927 lines)
│                            #   - Event loop (epoll/kqueue abstraction)
│                            #   - Sharded hash map with per-shard rwlocks
│                            #   - LRU eviction + TTL expiry
│                            #   - Multi-threaded worker pool
│                            #   - Protocol parser
│                            #   - Connection management
├── tools/
│   └── benchmark.c          # Load testing tool (377 lines)
│                            #   - Multi-threaded pipelined client
│                            #   - Latency histogram with percentiles
│                            #   - Configurable workload mix
├── scripts/
│   └── run_cluster.sh       # Multi-node cluster deployment + benchmarking
├── Makefile                 # Build targets: all, clean, smoke, test, bench
└── README.md

Design Decisions

Why sharded hash map (not a single lock)?

A single global mutex means only one thread can access the store at a time. With 16 shards, the probability of two threads contending on the same lock is 1/16. Each shard has its own pthread_rwlock_t: GETs take a read lock (multiple concurrent readers), SETs take a write lock (exclusive).

Why epoll/kqueue (not poll/select)?

select() scans all file descriptors every call — O(n) per wakeup. epoll/kqueue maintain a kernel-side interest set and only return ready descriptors — O(1) per ready fd. This matters at 1000+ connections.

Why per-connection fixed buffers (not dynamic)?

Each connection_t embeds a 4 KB read buffer and 64 KB write buffer directly in the struct. No malloc/free during request handling. The connection struct is allocated once on accept() and freed once on close. This avoids heap fragmentation and allocator overhead on the hot path.

Why pipe-based fd distribution (not shared accept)?

The main thread accept()s connections and writes the fd to a worker's pipe. Each worker has its own event loop monitoring its own pipe. This avoids thundering herd (multiple threads waking on the same listen socket) and gives deterministic round-robin load distribution.

Why doubly-linked list for LRU (not approximate)?

Redis uses approximate LRU (random sampling) to save memory. Our LRU is exact: every access moves the entry to the list head. This costs two extra pointers (16 bytes) per entry but guarantees the true least-recently-used entry is always evicted first. For a from-scratch implementation, exact LRU is more correct and simpler to reason about.

Why FNV-1a hash?

Fast (no multiplications heavier than 32-bit), good distribution, and widely used for hash tables. We hash once and split the 32-bit result: low 4 bits select the shard, remaining 28 bits select the bucket within the shard. This avoids hashing the key twice.


Tuning Guide

Parameter Effect Recommendation
-w workers More workers = more CPU cores used Set to number of CPU cores
-m memory LRU eviction threshold Set to available RAM minus OS overhead
NUM_SHARDS Lock contention reduction 16 is good up to ~16 cores. For 32+ cores, recompile with 32 or 64. Must be power of 2.
INIT_BUCKETS Initial hash table size per shard 4096 is fine for most workloads. If you store millions of keys, increase to 65536 to reduce chain length.
RBUF_SIZE Max command line length 4096 handles keys+values up to ~4 KB. Increase if you need larger values.
WBUF_SIZE Write buffer per connection 64 KB handles batched pipelined responses.
BACKLOG TCP accept queue depth 1024 handles burst connection storms.

All parameters except the flags are compile-time #defines in src/server.c (lines 57–70). Change and rebuild with make clean all.

About

High-performance in-memory key-value store in C with epoll/kqueue, sharded concurrency, LRU eviction, and TTL expiry

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors