
Commit 346c007

Merge pull request #134 from CaptTofu/useful-info-from-misha
Added some useful info from Misha
2 parents: 06b4062 + 9733902

2 files changed: +48 -0 lines changed

@@ -0,0 +1,38 @@

---
title: "Notes on Various Errors with respect to replication and distributed connections"
linkTitle: "Notes on Various Errors with respect to replication and distributed connections"
description: >
  Notes on errors related to replication and distributed connections
keywords:
- replication
- distributed connections
---

# Notes on Various Errors with respect to replication and distributed connections

## `ClickHouseDistributedConnectionExceptions`

This alert usually indicates that one of the nodes isn’t responding or that there’s an interconnectivity issue. Debug steps:

### 1. Check Cluster Connectivity

Verify connectivity inside the cluster by running:

```
SELECT count() FROM clusterAllReplicas('{cluster}', cluster('{cluster}', system.one))
```

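If that query fails or returns fewer rows than expected, a per-host breakdown can show which replica the initiator cannot reach. A minimal sketch, assuming the same `{cluster}` macro (note it checks initiator-to-replica connectivity only, not all-to-all as the nested form above does):

```
SELECT hostName() AS host, count() AS ok
FROM clusterAllReplicas('{cluster}', system.one)
GROUP BY host
ORDER BY host
```

Each reachable replica should contribute exactly one row; hosts missing from the output are the ones to investigate.
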
### 2. Check for Errors

Run the following queries to see if any nodes report errors:

```
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.clusters) WHERE errors_count > 0;
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.errors) WHERE last_error_time > now() - 3600 ORDER BY value;
```

Depending on the results, ensure that the affected node is up and responding to queries. Also, verify that network connectivity (DNS resolution, routes, latency) is healthy.

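When digging into `system.errors`, selecting just the relevant columns makes the output easier to scan. A minimal sketch, assuming the standard `system.errors` columns:

```
SELECT hostName() AS host, name, code, value, last_error_time, last_error_message
FROM clusterAllReplicas('{cluster}', system.errors)
WHERE last_error_time > now() - 3600
ORDER BY value DESC
```
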
## `ClickHouseReplicatedPartChecksFailed` & `ClickHouseReplicatedPartFailedFetches`

Unless you’re seeing huge numbers, these alerts can generally be ignored. They’re often a sign of temporary replication issues that ClickHouse resolves on its own. However, if the issue persists or grows rapidly, follow these steps to debug replication:

* Check the replication status using tables such as `system.replicas` and `system.replication_queue` (see the sketch after this list).
* Examine server logs, `system.errors`, and system load for any clues.
* Try restarting the replica (`SYSTEM RESTART REPLICA db_name.table_name`) and, if necessary, contact Altinity support.

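A minimal sketch of that status check, assuming the standard columns of `system.replicas` and `system.replication_queue` (the filter thresholds are illustrative):

```
-- Replicas that are read-only, lagging, or have work piled up
SELECT database, table, is_readonly, is_session_expired,
       absolute_delay, queue_size, parts_to_check
FROM system.replicas
WHERE is_readonly OR is_session_expired OR absolute_delay > 0 OR queue_size > 0;

-- Queue entries that keep failing or being postponed
SELECT database, table, type, create_time, num_tries,
       last_exception, postpone_reason
FROM system.replication_queue
WHERE num_tries > 1 OR postpone_reason != ''
ORDER BY create_time;
```
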

content/en/altinity-kb-useful-queries/detached-parts.md

+10
@@ -73,3 +73,13 @@ covered-by-broken - that means that ClickHouse during initialization of replica
```

The list of DETACH_REASONS: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreePartInfo.h#L163

## More notes on `ClickHouseDetachedParts`

Detached parts act like the “Recycle Bin” in Windows. When ClickHouse deems some data unneeded (often during internal reconciliations at server startup), it moves the data to the detached area instead of deleting it immediately.

Recovery: If you’re missing data due to misconfiguration or an error (such as connecting to the wrong ZooKeeper), check the detached parts. The missing data might be recoverable through manual intervention.

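A minimal sketch of that inspection and recovery, using the standard `system.detached_parts` table; `db.table` and the part name below are placeholders:

```
-- See what is sitting in the detached area, and why
SELECT database, table, name, reason, disk
FROM system.detached_parts;

-- Re-attach a part that still looks intact (placeholder table and part name)
ALTER TABLE db.table ATTACH PART 'all_1_1_0';
```
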
Cleanup: Otherwise, clean up the detached parts periodically to free disk space.

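One way to do that cleanup from SQL, once you have confirmed the parts are no longer needed; the `allow_drop_detached` setting must be enabled, and the table and part name are placeholders:

```
SET allow_drop_detached = 1;
ALTER TABLE db.table DROP DETACHED PART 'all_1_1_0';
```
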
The absence of an automatic cleanup feature for detached parts was a deliberate decision: data may land there because of a bug in ClickHouse’s code, a hardware error (such as a memory error or disk failure), and so on. In such cases, automatic cleanup is not desirable.
