Create a key-value store that runs across a cluster. Data is replicated with Raft in a strongly consistent fashion as long as a quorum is established.
Suppose there are `n` nodes in the cluster; a Raft quorum is a majority of them, i.e. `(n/2) + 1` nodes (2 out of 3 in a 3-node cluster).
As long as the quorum stays at this level, the cluster can be considered available and fault-tolerant.
Nodes can be removed and added back to the cluster and will be eventually consistent thanks to the snapshotting and restore mechanism of the Raft protocol.
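As a quick sanity check on the quorum arithmetic, the snippet below prints the quorum size and the number of node failures tolerated for a few cluster sizes (plain majority math, not code from this repo):

```go
package main

import "fmt"

func main() {
	// For n voting nodes, a Raft quorum is a majority: n/2 + 1.
	// The cluster tolerates n - quorum simultaneous node failures.
	for _, n := range []int{3, 5, 7} {
		quorum := n/2 + 1
		fmt.Printf("nodes=%d quorum=%d tolerated failures=%d\n", n, quorum, n-quorum)
	}
}
```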
This repo uses `.env*` files to initialize the cluster nodes.
To run a cluster, the following steps need to be done:
- Install Go v1.23
- Run `make run-test` to verify that the repo works locally.
- Copy and edit the 3 env files (`.env1`, `.env2` and `.env3`) to create 3 separate initialization configurations.
- Optional: you can create more `.env` files if you want to add more nodes to the cluster, and edit the `Makefile` to add more rules. The current `Makefile` only has rules for 3 nodes.
- Run `make run1` to kickstart the first node. This automatically becomes the `leader` node.
- Run `make run2` to add the second node. Any node started after the leader joins as a follower, and the FSM state is replicated to it.
- Run `make run3` to add the third node.
- Call `POST leaderaddr/key` with a body to store a key-value pair.
- Call `GET nodeaddr/key/{key}` to fetch the pair from any node in the cluster (see the client sketch below).
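To make the last two steps concrete, here is a minimal Go client sketch. The node addresses and the JSON body shape are assumptions for illustration, not the repo's documented schema; check the handlers for the exact request format.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	leader := "http://localhost:8888" // leader's SERVER_ADDRESS (assumed)
	node := "http://localhost:8889"   // any node's SERVER_ADDRESS (assumed)

	// Store a key-value pair via the leader.
	// NOTE: the body format here is an assumption, not the repo's documented schema.
	body := bytes.NewBufferString(`{"key": "greeting", "value": "hello"}`)
	resp, err := http.Post(leader+"/key", "application/json", body)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("POST status:", resp.Status) // expect 201 Created

	// Fetch the pair back from any node in the cluster.
	getResp, err := http.Get(node + "/key/greeting")
	if err != nil {
		panic(err)
	}
	defer getResp.Body.Close()
	b, _ := io.ReadAll(getResp.Body)
	fmt.Println("GET status:", getResp.Status, "body:", string(b))
}
```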
This section explains the configs found in the env files:
```
# server configs
SERVER_ADDRESS=localhost:8889 ---------------> address used to make the GET/POST/DEL requests with keys

# service kv configs
SERVICE_KEY_MAX_LEN=100 ---------------------> limits the size of keys; anything larger gets a 400
SERVICE_VAL_MAX_LEN=200 ---------------------> limits the size of values the same way
SERVICE_MAX_MAP_SIZE=1000 -------------------> upper limit on the size of the map; a 400 is returned if this is exceeded as well

# service raft configs
SERVICE_RAFT_LEADER=false -------------------> indicates whether the node (at setup time) is a leader or follower
SERVICE_RAFT_STORE_DIR="./node2" ------------> log store location for Raft
SERVICE_RAFT_ADDR=localhost:21002 -----------> Raft address
SERVICE_RAFT_NODE_ID=node2 ------------------> Raft node ID
SERVICE_RAFT_JOIN_ADDR=localhost:8888 -------> if this is a follower node, we need to register with the leader, and this address is used
```
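As a rough illustration of how the SERVICE_RAFT_* variables above might be consumed at startup, here is a minimal sketch using `os.Getenv`. The struct and function names are hypothetical; the repo's actual config loading may look quite different.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// raftConfig mirrors the SERVICE_RAFT_* variables above (hypothetical struct).
type raftConfig struct {
	Leader   bool
	StoreDir string
	Addr     string
	NodeID   string
	JoinAddr string
}

func loadRaftConfig() raftConfig {
	// Errors are ignored here for brevity; a missing/invalid value defaults to false.
	leader, _ := strconv.ParseBool(os.Getenv("SERVICE_RAFT_LEADER"))
	return raftConfig{
		Leader:   leader,
		StoreDir: os.Getenv("SERVICE_RAFT_STORE_DIR"),
		Addr:     os.Getenv("SERVICE_RAFT_ADDR"),
		NodeID:   os.Getenv("SERVICE_RAFT_NODE_ID"),
		JoinAddr: os.Getenv("SERVICE_RAFT_JOIN_ADDR"),
	}
}

func main() {
	fmt.Printf("%+v\n", loadRaftConfig())
}
```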
PS: The service also has a `debug` config which is used in tests to run without Raft.

Feel free to kill any node in the cluster; as long as the quorum condition is met, the cluster should still be available.
Killing off the leader would trigger a leader election, and any new POSTs would need to be made against the new leader. Request forwarding to the new leader is not covered in this example.
Adding a node back into the mix should trigger Raft to kick in and re-populate the internal store.
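To make that snapshot/restore path concrete, below is a minimal sketch of a map-backed FSM, assuming the hashicorp/raft library (which the SERVICE_RAFT_* configs suggest, though this README does not spell it out). The repo's real FSM will differ; the sketch only shows where `Snapshot` and `Restore` fit when a node rejoins and needs its store re-populated.

```go
package kvstore

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// kvFSM is an illustrative finite-state machine holding the key-value map.
type kvFSM struct {
	mu   sync.Mutex
	data map[string]string
}

// command is the assumed shape of a replicated log entry.
type command struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

var _ raft.FSM = (*kvFSM)(nil) // compile-time check against the raft.FSM interface

// Apply replays a committed log entry into the in-memory map.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
	var c command
	if err := json.Unmarshal(l.Data, &c); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.data[c.Key] = c.Value
	return nil
}

// Snapshot captures a point-in-time copy of the map.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	clone := make(map[string]string, len(f.data))
	for k, v := range f.data {
		clone[k] = v
	}
	return &kvSnapshot{data: clone}, nil
}

// Restore rebuilds the map from a snapshot when a node (re)joins the cluster.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	restored := make(map[string]string)
	if err := json.NewDecoder(rc).Decode(&restored); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.data = restored
	return nil
}

type kvSnapshot struct {
	data map[string]string
}

// Persist writes the snapshot to the sink provided by raft.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(s.data); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

func (s *kvSnapshot) Release() {}
```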
Without the Raft cluster setup and with a 100k map size, we get:

```
goos: darwin
goarch: amd64
pkg: github.com/tomkaith13/dist-kv-store/internal/service
cpu: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
BenchmarkNoRaftWithOneNodesSetAndGet-8 1 4510991898 ns/op 728895336 B/op 9701978 allocs/op
PASS
```

With the Raft cluster setup, using k6:
- Install k6
- Kick-start all nodes in this order: `make run1`, `make run2` and finally `make run3`
- Run `k6 run perf.js`
The results are:
```
➜ dist-key-val git:(main) ✗ k6 run perf.js
execution: local
script: perf.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 40s max duration (incl. graceful stop):
* default: 1 looping VUs for 10s (gracefulStop: 30s)
✓ post status was 201
✓ get status was 200
checks.........................: 100.00% ✓ 20 ✗ 0
data_received..................: 2.8 kB 276 B/s
data_sent......................: 2.8 kB 275 B/s
dkv_get_key....................: avg=538.69µs min=396µs med=518µs max=868µs p(90)=688µs p(95)=777.99µs
dkv_set_key....................: avg=1.27ms min=664µs med=782.5µs max=4.22ms p(90)=2.48ms p(95)=3.35ms
http_req_blocked...............: avg=124.4µs min=3µs med=5µs max=2.03ms p(90)=46.6µs p(95)=450.2µs
http_req_connecting............: avg=46.8µs min=0s med=0s max=683µs p(90)=25.3µs p(95)=274.5µs
http_req_duration..............: avg=908.45µs min=396µs med=686µs max=4.22ms p(90)=1.15ms p(95)=2.38ms
{ expected_response:true }...: avg=908.45µs min=396µs med=686µs max=4.22ms p(90)=1.15ms p(95)=2.38ms
http_req_failed................: 0.00% ✓ 0 ✗ 20
http_req_receiving.............: avg=77.09µs min=41µs med=55µs max=204µs p(90)=150.3µs p(95)=172.65µs
http_req_sending...............: avg=199.9µs min=17µs med=32µs max=3.08ms p(90)=127µs p(95)=368.95µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=631.44µs min=330µs med=601.49µs max=1.86ms p(90)=909.7µs p(95)=1.01ms
http_reqs......................: 20 1.992157/s
iteration_duration.............: avg=1s min=1s med=1s max=1s p(90)=1s p(95)=1s
iterations.....................: 10 0.996079/s
vus............................: 1 min=1 max=1
vus_max........................: 1 min=1 max=1
running (10.0s), 0/1 VUs, 10 complete and 0 interrupted iterations
default ✓ [======================================] 1 VUs  10s
```

The results can be found in the trends named `dkv_get_key` and `dkv_set_key`.
In a 3-node cluster:
- p95 of GETs is 777µs
- p95 of SETs is 3.35ms
- As an improvement, we could add support for leader forwarding. Say an old leader from a previous term gets a `SET /key` request; a better UX would be to forward that request over to the new leader. Currently that functionality is absent (see the sketch after this list).
- Limit the key and value size to ensure the snapshotting process is quick, and the same goes for restore.
- Figure out how to set up the cluster and test without using tools like k6.
- Right now, I keep getting connection refused when I try to add a second node in tests.
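For the leader-forwarding improvement in the first bullet, a server-side sketch could look like this. It assumes hashicorp/raft and a hypothetical `raftToHTTP` lookup that maps a node's Raft address to its HTTP address; none of this exists in the repo today.

```go
package kvstore

import (
	"net/http"
	"net/http/httputil"
	"net/url"

	"github.com/hashicorp/raft"
)

// Server ties the HTTP layer to the Raft node (illustrative only).
type Server struct {
	raft *raft.Raft
	// raftToHTTP maps a node's Raft address to its HTTP address.
	// The repo would need to track this mapping itself; it is hypothetical here.
	raftToHTTP map[string]string
}

// handleSet forwards writes to the current leader instead of failing them.
func (s *Server) handleSet(w http.ResponseWriter, r *http.Request) {
	if s.raft.State() != raft.Leader {
		leaderRaftAddr, _ := s.raft.LeaderWithID()
		httpAddr := s.raftToHTTP[string(leaderRaftAddr)]
		if httpAddr == "" {
			http.Error(w, "no leader available", http.StatusServiceUnavailable)
			return
		}
		// Proxy the original request to the leader and relay its response.
		proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: "http", Host: httpAddr})
		proxy.ServeHTTP(w, r)
		return
	}
	// Leader path: apply the command via s.raft.Apply(...) and respond 201
	// (omitted; this sketch only covers the forwarding branch).
}
```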