
Commit 346c007

Merge pull request #134 from CaptTofu/useful-info-from-misha
Added some useful info from Misha
2 parents: 06b4062 + 9733902

2 files changed: +48 -0 lines changed

@@ -0,0 +1,38 @@

---
title: "Notes on Various Errors with respect to replication and distributed connections"
linkTitle: "Notes on Various Errors with respect to replication and distributed connections"
description: >
  Notes on errors related to replication and distributed connections
keywords:
- replication
- distributed connections
---

# Notes on Various Errors with respect to replication and distributed connections

## `ClickHouseDistributedConnectionExceptions`

This alert usually indicates that one of the nodes isn’t responding or that there’s an interconnectivity issue. Debug steps:

### 1. Check Cluster Connectivity

Verify connectivity inside the cluster by running:

```
SELECT count() FROM clusterAllReplicas('{cluster}', cluster('{cluster}', system.one))
```

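If that query fails or returns fewer rows than expected, a per-host breakdown can show which replica the initiator cannot reach. A minimal sketch, assuming the same `{cluster}` macro (note it checks initiator-to-replica connectivity only, not all-to-all as the nested form above does):

```
SELECT hostName() AS host, count() AS ok
FROM clusterAllReplicas('{cluster}', system.one)
GROUP BY host
ORDER BY host
```

Each reachable replica should contribute exactly one row; hosts missing from the output are the ones to investigate.
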
### 2. Check for Errors

Run the following queries to see if any nodes report errors:

```
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.clusters) WHERE errors_count > 0;
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.errors) WHERE last_error_time > now() - 3600 ORDER BY value;
```

Depending on the results, ensure that the affected node is up and responding to queries. Also, verify that network connectivity (DNS resolution, routes, latency) is healthy.

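When digging into `system.errors`, selecting just the relevant columns makes the output easier to scan. A minimal sketch, assuming the standard `system.errors` columns:

```
SELECT hostName() AS host, name, code, value, last_error_time, last_error_message
FROM clusterAllReplicas('{cluster}', system.errors)
WHERE last_error_time > now() - 3600
ORDER BY value DESC
```
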
## `ClickHouseReplicatedPartChecksFailed` & `ClickHouseReplicatedPartFailedFetches`

Unless you’re seeing huge numbers, these alerts can generally be ignored. They’re often a sign of temporary replication issues that ClickHouse resolves on its own. However, if the issue persists or grows rapidly, follow these steps to debug replication:

* Check the replication status using tables such as `system.replicas` and `system.replication_queue` (see the sketch after this list).
* Examine server logs, `system.errors`, and system load for any clues.
* Try restarting the replica (`SYSTEM RESTART REPLICA db_name.table_name`) and, if necessary, contact Altinity support.

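A minimal sketch of that status check, assuming the standard columns of `system.replicas` and `system.replication_queue` (the filter thresholds are illustrative):

```
-- Replicas that are read-only, lagging, or have work piled up
SELECT database, table, is_readonly, is_session_expired,
       absolute_delay, queue_size, parts_to_check
FROM system.replicas
WHERE is_readonly OR is_session_expired OR absolute_delay > 0 OR queue_size > 0;

-- Queue entries that keep failing or being postponed
SELECT database, table, type, create_time, num_tries,
       last_exception, postpone_reason
FROM system.replication_queue
WHERE num_tries > 1 OR postpone_reason != ''
ORDER BY create_time;
```
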

content/en/altinity-kb-useful-queries/detached-parts.md

+10
@@ -73,3 +73,13 @@ covered-by-broken - that means that ClickHouse during initialization of replica
```

The list of DETACH_REASONS: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreePartInfo.h#L163

## More notes on `ClickHouseDetachedParts`

Detached parts act like the “Recycle Bin” in Windows. When ClickHouse deems some data unneeded (often during internal reconciliations at server startup), it moves the data to the detached area instead of deleting it immediately.

Recovery: If you’re missing data due to misconfiguration or an error (such as connecting to the wrong ZooKeeper), check the detached parts. The missing data might be recoverable through manual intervention.

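A minimal sketch of that inspection and recovery, using the standard `system.detached_parts` table; `db.table` and the part name below are placeholders:

```
-- See what is sitting in the detached area, and why
SELECT database, table, name, reason, disk
FROM system.detached_parts;

-- Re-attach a part that still looks intact (placeholder table and part name)
ALTER TABLE db.table ATTACH PART 'all_1_1_0';
```
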
Cleanup: Otherwise, clean up the detached parts periodically to free disk space.

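One way to do that cleanup from SQL, once you have confirmed the parts are no longer needed; the `allow_drop_detached` setting must be enabled, and the table and part name are placeholders:

```
SET allow_drop_detached = 1;
ALTER TABLE db.table DROP DETACHED PART 'all_1_1_0';
```
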
The absence of an automatic cleanup feature for detached parts was a deliberate decision: data may land there because of a bug in ClickHouse’s code, a hardware error (such as a memory error or disk failure), and so on. In such cases, automatic cleanup is not desirable.
