A high-performance, persistent, and observable in-memory key-value store built with Go. This project is a deep dive into the internal mechanisms of modern databases, demonstrating a practical exploration of concepts like data persistence, concurrency control, and system observability from the ground up.
- Dual HTTP & gRPC APIs: Interact via a simple RESTful interface or a high-performance gRPC API for `PUT`, `GET`, and `DELETE` operations.
- Concurrent & Performant: Utilizes a sharded map to minimize lock contention, allowing it to handle high-throughput workloads efficiently.
- Durable Persistence: Implements a Write-Ahead Log (WAL) encoded with Protocol Buffers, persisting every write in a compact, efficient binary format so that no data is lost.
- Fast Recovery: Periodically creates snapshots of the data to compact the WAL, ensuring quick restarts. Snapshots also use Protocol Buffers for efficient storage.
- Built-in Observability: Comes with a pre-configured Grafana dashboard for monitoring key performance metrics via Prometheus.
- Automatic Memory Reclamation: Periodically rebuilds storage shards to reclaim memory from deleted items, preventing memory bloat in write-heavy workloads.
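Because Go maps do not release bucket memory when entries are deleted, the memory-reclamation feature above depends on periodically rebuilding each shard. The sketch below shows the general idea; the `Shard` type and its field names are illustrative, not the project's actual ones.

```go
// Sketch of periodic shard rebuilding under a per-shard mutex. Copying the
// live entries into a fresh map lets the garbage collector reclaim the
// memory still held by the old map's deleted buckets.
package store

import "sync"

type Shard struct {
	mu   sync.RWMutex
	data map[string][]byte
}

// Rebuild copies live entries into a new map so the old one can be GC'd.
func (s *Shard) Rebuild() {
	s.mu.Lock()
	defer s.mu.Unlock()

	fresh := make(map[string][]byte, len(s.data))
	for k, v := range s.data {
		fresh[k] = v
	}
	s.data = fresh
}
```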
This project was built with a focus on exploring the core principles behind modern data systems. The following are key architectural decisions made to address common challenges in database design:
- Problem: A single, global lock on a central data map creates a bottleneck under concurrent loads.
- Solution: A sharded map partitions the key space across many maps, each with its own lock. This distributes write contention, significantly improving parallelism and throughput.
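A minimal sketch of the sharded-map idea, assuming FNV-1a hashing to choose a shard; the shard count and type names are illustrative rather than taken from this codebase:

```go
// Sharded map: the key space is partitioned across many small maps, each
// protected by its own RWMutex, so writes to different keys rarely contend.
package store

import (
	"hash/fnv"
	"sync"
)

const shardCount = 256

type shard struct {
	mu   sync.RWMutex
	data map[string][]byte
}

type ShardedMap struct {
	shards [shardCount]*shard
}

func NewShardedMap() *ShardedMap {
	m := &ShardedMap{}
	for i := range m.shards {
		m.shards[i] = &shard{data: make(map[string][]byte)}
	}
	return m
}

// shardFor hashes the key to pick the shard that owns it.
func (m *ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return m.shards[h.Sum32()%shardCount]
}

func (m *ShardedMap) Put(key string, value []byte) {
	s := m.shardFor(key)
	s.mu.Lock()
	s.data[key] = value
	s.mu.Unlock()
}

func (m *ShardedMap) Get(key string) ([]byte, bool) {
	s := m.shardFor(key)
	s.mu.RLock()
	v, ok := s.data[key]
	s.mu.RUnlock()
	return v, ok
}
```

Because each `Put` locks only one of the 256 shards, concurrent writes to different keys almost never block each other.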
- Problem: In-memory data is lost on server crash or restart.
- Solution: A Write-Ahead Log (WAL) persists all write operations to disk before they are applied in memory. This ensures that the store can be fully recovered by replaying the log after a crash.
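The sketch below illustrates the append-before-apply contract of a WAL. The real log encodes entries with Protocol Buffers; this version uses a bare length-prefixed payload so the example stays self-contained.

```go
// Simplified write-ahead log: each operation is appended and fsynced to disk
// before the in-memory map is updated.
package store

import (
	"encoding/binary"
	"os"
)

type WAL struct {
	f *os.File
}

func OpenWAL(path string) (*WAL, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &WAL{f: f}, nil
}

// Append writes a length-prefixed record and forces it to stable storage.
func (w *WAL) Append(record []byte) error {
	var lenBuf [4]byte
	binary.BigEndian.PutUint32(lenBuf[:], uint32(len(record)))
	if _, err := w.f.Write(lenBuf[:]); err != nil {
		return err
	}
	if _, err := w.f.Write(record); err != nil {
		return err
	}
	// Durability point: only after Sync returns may the write be applied in memory.
	return w.f.Sync()
}
```

Since the in-memory shard is mutated only after `Append` returns, any value a reader can observe is already on stable storage.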
- Problem: Replaying a large WAL file on startup can lead to slow recovery times.
- Solution: The system periodically creates snapshots of the in-memory state. On restart, the service loads the latest snapshot and replays only the WAL entries created since, dramatically reducing startup time.
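Recovery then becomes a two-step process: decode the latest snapshot into the in-memory map, then replay the WAL records written after it. The replay half is sketched below against the length-prefixed format from the WAL sketch above; the real project decodes Protocol Buffers entries, and the `apply` callback here is an illustrative stand-in for re-executing each logged operation.

```go
// Startup replay: read every record appended since the last snapshot and
// hand it to apply, which re-executes the operation against the store.
package store

import (
	"encoding/binary"
	"errors"
	"io"
	"os"
)

func ReplayWAL(path string, apply func(record []byte) error) error {
	f, err := os.Open(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil // fresh start: nothing to replay
	}
	if err != nil {
		return err
	}
	defer f.Close()

	var lenBuf [4]byte
	for {
		if _, err := io.ReadFull(f, lenBuf[:]); err != nil {
			if errors.Is(err, io.EOF) {
				return nil // clean end of log
			}
			return err
		}
		record := make([]byte, binary.BigEndian.Uint32(lenBuf[:]))
		if _, err := io.ReadFull(f, record); err != nil {
			return err // truncated tail, e.g. crash mid-write
		}
		if err := apply(record); err != nil {
			return err
		}
	}
}
```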
Before running the application, you need to create a `config.yml` file. A commented example is provided in `config/config.example.yml`; copy and modify it to get started:
cp config/config.example.yml config/config.yml
You can run the application using either Docker Compose or native Go commands.
The easiest way to get started is with Docker Compose, which runs the KV store, Prometheus, and Grafana.
# Build and start all services in detached mode
docker-compose up -d --build
- The KV store will be available at http://localhost:16700.
- Prometheus will be available at http://localhost:9090.
- Grafana will be available at http://localhost:3000.
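As a quick smoke test once the containers are up, you could exercise the HTTP API from a small Go program. The route shape used below (`/v1/keys/<key>`) is an assumption made for illustration; check the repository's handler definitions for the actual paths.

```go
// Hypothetical client calls against the HTTP API exposed on port 16700.
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	base := "http://localhost:16700"

	// PUT a value (the route is assumed, not confirmed by this README).
	req, err := http.NewRequest(http.MethodPut, base+"/v1/keys/greeting", strings.NewReader("hello"))
	if err != nil {
		panic(err)
	}
	if _, err := http.DefaultClient.Do(req); err != nil {
		panic(err)
	}

	// GET it back and print the status and body.
	resp, err := http.Get(base + "/v1/keys/greeting")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(body))
}
```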
You must have Go installed on your system.
# Build the binary
make build
# Run the application
# (Requires a config file, one is provided in the `config` directory)
make run
# Run all unit tests
make test
# Run unit tests with coverage
make test-cover
# Run linters
make lint
The project includes a pre-configured Grafana dashboard for visualizing performance metrics:
- HTTP Performance: Request rates, latency distributions (heatmap), and latency percentiles (P50, P90, P99).
- Key-Value Operations: P99 latency for PUT, GET, and DELETE operations.
- Error Rates: HTTP 4xx and 5xx error rates.
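For context, latency panels like these are typically driven by Prometheus histograms recorded inside the server. The sketch below shows one way that instrumentation might look using the official Go client; the metric name and label are assumptions, not the project's actual ones.

```go
// Illustrative Prometheus instrumentation for per-operation latency.
package metrics

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var opLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "kv_operation_duration_seconds", // assumed name for illustration
		Help:    "Latency of key-value operations.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"op"},
)

func init() {
	prometheus.MustRegister(opLatency)
}

// Observe records one operation's duration, e.g. Observe("put", time.Since(start)).
func Observe(op string, d time.Duration) {
	opLatency.WithLabelValues(op).Observe(d.Seconds())
}

// Handler exposes /metrics for Prometheus to scrape.
func Handler() http.Handler {
	return promhttp.Handler()
}
```

Grafana can then derive the P50/P90/P99 panels with `histogram_quantile` over the exported buckets.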
The store was benchmarked using k6 with an open model (arrival-rate) test to determine its limits under heavy, real-world-style load.
| Metric | Result |
|---|---|
| Achieved RPS | ~40,000 req/s |
| Target RPS | 50,000 req/s |
| Failure Rate | 0.00% |
| p95 Latency (PUT) | 15.55 ms |
| p95 Latency (DELETE) | 7.38 ms |
- Performance Tuning: Profile the application under load to identify the causes of the current throughput ceiling and the observed "long-tail" latency (the significant gap between p95 and max response times), then improve performance consistency.
- Replication: To improve availability, a replication mechanism could be added to synchronize data to one or more follower nodes.
- Clustering: For horizontal scalability, the store could be extended into a distributed system where data is sharded across multiple nodes in a cluster.
This project is licensed under the MIT License. See the LICENSE file for details.