# Replication and Sharding

Source:

- <https://altinity.com/wp-content/uploads/2024/05/Deep-Dive-on-ClickHouse-Sharding-and-Replication-2024-1-1.pdf>
- <https://chistadata.com/how-to-setup-6-nodes-clickhouse-horizontal-scaling/>

## Sharding

Sharding splits a large table horizontally (row-wise) and stores the pieces on multiple servers. ClickHouse uses the Distributed table engine to query sharded tables. Shards can be internally replicated or non-replicated. Sharding makes it possible to store data sets that would otherwise not fit on a single server.

Pros:

- High availability
- Faster query response time
- Increased write bandwidth
- Easy to scale out

Cons:

- Added complexity
- Possibility of unbalanced data in a shard

ClickHouse uses hash-based sharding: a column of the table is chosen as the sharding key and its values are hashed. The hash value determines the shard on which each row is stored. For example, with 3 shards, the remainder of the hash value divided by the number of shards (a modulo operation) selects the shard: a hash of 1234 gives 1234 % 3 = 1, so the row is stored on shard 2, while a hash of 12345 gives 12345 % 3 = 0, so the row is stored on shard 1.
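The modulo routing described above can be sketched in a few lines of plain Python (the hash step is omitted here; ClickHouse applies its own hash functions, such as cityHash64, to the sharding key first):

```python
def select_shard(sharding_key_hash: int, num_shards: int) -> int:
    """Return the 1-based shard number for an already-hashed sharding key."""
    return sharding_key_hash % num_shards + 1

# The examples from the text, with 3 shards:
print(select_shard(1234, 3))   # 1234 % 3 = 1  -> shard 2
print(select_shard(12345, 3))  # 12345 % 3 = 0 -> shard 1
```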

The Distributed table engine performs parallel, distributed query processing across a ClickHouse cluster. It cannot store data on its own and relies on other table engines (the MergeTree family) to store the underlying data. Data can be inserted either through the distributed table, in which case ClickHouse routes each row to a shard based on the sharding key, or manually into the underlying storage table on every node. Reads can go directly through the distributed table, which queries all shards.
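As a toy model of this behaviour (plain Python, not the ClickHouse API), the distributed front end stores nothing itself: inserts are routed to a local table by the sharding key, and reads fan out to every shard and merge the results:

```python
class LocalTable:
    """Stands in for a MergeTree table on one shard."""
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

class DistributedTable:
    """Stands in for the Distributed engine: holds no data of its own."""
    def __init__(self, shards, sharding_key):
        self.shards = shards              # list of LocalTable, one per shard
        self.sharding_key = sharding_key  # function: row -> int

    def insert(self, row):
        # Route the row by the sharding key modulo the shard count.
        shard = self.sharding_key(row) % len(self.shards)
        self.shards[shard].insert(row)

    def select_all(self):
        # A read fans out to every shard and merges the results.
        return [row for shard in self.shards for row in shard.rows]

shards = [LocalTable() for _ in range(3)]
dist = DistributedTable(shards, sharding_key=lambda row: row["user_id"])
for uid in range(6):
    dist.insert({"user_id": uid})
print([len(s.rows) for s in shards])  # -> [2, 2, 2]
```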

```
- Replicas improve read QPS and concurrency.
- Shards add throughput and IOPS.
```

## Parallel replicas with dynamic sharding

Source:

- <https://clickhouse.com/docs/deployment-guides/parallel-replicas>
- <https://chistadata.com/parallel-replicas-with-dynamic-shards-horizontal-scaling-clickhouse/>

> Released in ClickHouse version 23.3.

Imagine you build a ClickHouse cluster with 10 replicas. Normally, when you run a SELECT query, the request goes to a single server, which executes the whole query on its own copy of the data and returns the result to you.

With parallel replicas, a SELECT query is automatically distributed among all replicas and the data is read from all of them, so the result reaches you faster. It is as if the data were being read from a sharded cluster, even though it is actually just a replicated cluster.
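A toy sketch of the idea (plain Python, not ClickHouse internals): every replica holds the same full copy of the data, but at query time the coordinator splits the scan into dynamic "shards", each replica processes one slice concurrently, and the partial results are merged:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data set: every replica holds the same full copy.
data = list(range(1_000))
NUM_REPLICAS = 4

def replica_scan(replica_id: int) -> int:
    """Each replica scans only its dynamically assigned slice of the data."""
    chunk = data[replica_id::NUM_REPLICAS]  # dynamic sharding by stride
    return sum(chunk)                       # partial aggregate

# The coordinator fans the query out and merges the partial results.
with ThreadPoolExecutor(max_workers=NUM_REPLICAS) as pool:
    partials = list(pool.map(replica_scan, range(NUM_REPLICAS)))

print(sum(partials) == sum(data))  # True: same answer, work split 4 ways
```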
