You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs: remove unverified timeout scenarios and NLB troubleshooting from Spark connector
Remove the Common timeout scenarios section and the Connection resets on
AWS PrivateLink or NLB troubleshooting entry. The NLB broken pipe scenario
could not be reproduced from EMR against the staging endpoint (public HTTPS
path, no NLB in the TCP path), so this content is not verified and should
not be published.
| spark.clickhouse.read.settings ||Comma-separated list of ClickHouse server session settings to apply on read, e.g. `max_execution_time=300,max_memory_usage=10000000000`. These are passed as HTTP query parameters on every read request. Must be set via `spark.conf.set(...)` at the session level — passing via DataFrame `.option()` has no effect.| 0.9.0 |
1493
+
| spark.clickhouse.read.settings ||ClickHouse settings appended as a `SETTINGS` clause to every generated read query, e.g. `final=1, max_execution_time=300`. The value is lowercased and inserted verbatim after `SETTINGS`, so it must be valid SQL. Must be set via `spark.conf.set(...)` at the session level — passing via DataFrame `.option()` has no effect. | 0.9.0 |
1494
1494
| spark.clickhouse.read.splitByPartitionId | true | If `true`, construct input partition filter by virtual column `_partition_id`, instead of partition value. There are known issues with assembling SQL predicates by partition value. This feature requires ClickHouse Server v21.6+ | 0.4.0 |
1495
1495
| spark.clickhouse.useNullableQuerySchema | false | If `true`, mark all the fields of the query schema as nullable when executing `CREATE/REPLACE TABLE ... AS SELECT ...` on creating the table. Note, this configuration requires SPARK-43390(available in Spark 3.5), w/o this patch, it always acts as `true`. | 0.8.0 |
1496
1496
| spark.clickhouse.write.batchSize | 10000 | The number of records per batch on writing to ClickHouse. | 0.1.0 |
@@ -1682,8 +1682,8 @@ With these settings, all reads and writes go through the single coordinator node
1682
1682
1683
1683
The connector uses ClickHouse's HTTP interface via the [ClickHouse Java client](https://github.com/ClickHouse/clickhouse-java). There are two independent ways to pass settings:
1684
1684
1685
-
-**`option.clickhouse_setting_<name>`** — the first-class mechanism for passing ClickHouse server session settings through the Java client. Applies to all operations for the lifetime of the catalog.
1686
-
-**`spark.clickhouse.read.settings`** — per-read override: a comma-separated list of `key=value` server settings applied only to read requests.
1685
+
-**`option.clickhouse_setting_<name>`** — the first-class mechanism for passing ClickHouse server settings through the Java client. Applied to every client operation (reads and writes) for the lifetime of the catalog.
1686
+
-**`option.http_header_<name>`** — adds a custom HTTP header to every request the Java client sends to ClickHouse. Useful for setting ClickHouse-recognised headers (e.g. `X-ClickHouse-Quota`) or for passing headers expected by a proxy or gateway in front of the server. Applied to every operation for the lifetime of the catalog.
To enable TLS, set the `protocol` direct catalog property instead of an `option.*` key:
1698
+
To add custom HTTP headers to every request, use `option.http_header_<name>`. Headers recognised by ClickHouse itself (e.g. `X-ClickHouse-Quota`) will be honoured by the server; any other header is only useful if a proxy or gateway in front of ClickHouse is configured to read it.
To enable TLS, set `option.ssl=true` and point `http_port` at the HTTPS endpoint:
1705
+
1706
+
```text
1707
+
spark.sql.catalog.clickhouse.option.ssl true
1708
+
spark.sql.catalog.clickhouse.http_port 8443
1703
1709
```
1704
1710
1705
1711
For the full list of available Java client option keys, see the [Java client configuration documentation](https://clickhouse.com/docs/integrations/language-clients/java/client#configuration). For available ClickHouse server session settings, see the [settings documentation](https://clickhouse.com/docs/operations/settings/settings).
1706
1712
1707
1713
### Via TableProvider API {#query-settings-tableprovider-api}
1708
1714
1709
-
Use `spark.clickhouse.read.settings` to apply ClickHouse server settings to reads. This setting is read from the Spark session config — passing it as a DataFrame `.option()` has no effect. It applies to all reads in the session until unset.
1715
+
Use `spark.clickhouse.read.settings` to apply ClickHouse settings to reads. The value is appended as a `SETTINGS` clause to every generated read query, so it must be valid SQL (e.g. `final=1, max_execution_time=300`). This setting is read from the Spark session config — passing it as a DataFrame `.option()` has no effect. It applies to all reads in the session until unset.
1710
1716
1711
1717
To scope the settings to a single read, set the config, force the action, then unset:
1712
1718
@@ -1847,7 +1853,7 @@ The connector enforces its own timeouts that are independent of any Spark or Cli
1847
1853
1848
1854
The 60-second query cap is enforced by the connector regardless of `max_execution_time` or any other server setting. If a read query takes longer than 60 seconds end-to-end, the connector will abort it. There is no `spark.clickhouse.*` setting to override this value.
1849
1855
1850
-
Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error (see [Connection resets on AWS PrivateLink or NLB](#troubleshooting-connection-resets) below).
1856
+
Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error.
@@ -1856,7 +1862,7 @@ These are passed through to the underlying ClickHouse Java client using the `opt
1856
1862
| Catalog property | Default | Unit | What it controls |
1857
1863
|---|---|---|---|
1858
1864
|`option.socket_timeout`|`0` (unlimited) | ms | Deadline for each TCP read/write operation. `0` means no deadline — the call blocks indefinitely if the server stops responding. |
1859
-
|`option.connection_timeout`| not set | ms | Time allowed to establish a new TCP connection to ClickHouse. |
1865
+
|`option.connection_timeout`| not set (Java client default) | ms | Time allowed to establish a new TCP connection to ClickHouse. |
1860
1866
|`option.connection_ttl`|`-1` (unlimited) | ms | Maximum lifetime of a pooled connection. After this time, the connection is retired and not reused. **Set this below the idle timeout of any network appliance between Spark and ClickHouse** (e.g., `300000` for AWS NLB). |
1861
1867
|`option.http_keep_alive_timeout`| server default | ms | Overrides the HTTP keep-alive timeout. Set to `0` to disable keep-alive on network paths with aggressive idle-connection cutoffs. |
1862
1868
@@ -1869,27 +1875,6 @@ These are ClickHouse query settings sent with each request. They instruct the Cl
1869
1875
|`option.clickhouse_setting_max_execution_time`|`0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources, but **does not override the connector's 60-second query timeout**. |
1870
1876
|`option.clickhouse_setting_session_timeout`|`60`| seconds | HTTP session lifetime on the server. |
1871
1877
1872
-
### Common timeout scenarios {#timeout-scenarios}
1873
-
1874
-
**Long-running reads (>60 seconds)**
1875
-
1876
-
The connector's hard-coded 60-second query timeout will abort reads that take longer, even if the server could complete them. There is no configuration knob to extend or disable this cap. The only workaround is to reduce the amount of data returned by a single query so each read completes within 60 seconds:
1877
-
1878
-
- Add `WHERE` predicates to filter rows before they reach Spark.
1879
-
- Use partition pruning so the connector issues one query per partition rather than one large scan.
1880
-
1881
-
**Insert "Broken pipe" or "Connection reset" errors**
1882
-
1883
-
Typically caused by a network appliance (AWS NLB, PrivateLink, corporate proxy) silently dropping idle connections. The connector's connection pool holds connections open between operations; if the appliance's idle timeout (350 seconds for AWS NLB) expires, the connection is dropped at the network level without notification. The connector then tries to reuse a dead connection and gets a broken pipe.
Setting `connection_ttl` below the appliance's idle timeout forces the connector to retire connections proactively. See [Connection resets on AWS PrivateLink or NLB](#troubleshooting-connection-resets) for full details.
1892
-
1893
1878
## Performance tuning {#performance-tuning}
1894
1879
1895
1880
### Read performance {#read-performance}
@@ -1908,10 +1893,9 @@ Setting `connection_ttl` below the appliance's idle timeout forces the connector
|**Arrow write format**|`spark.clickhouse.write.format=arrow` (default) | Arrow is faster than JSON for most data types. Use `json`only for Variant/JSON column types. |
1896
+
|**Arrow write format**|`spark.clickhouse.write.format=arrow` (default) | Arrow is faster than JSON for most data types. On Spark 4.0 the Arrow writer handles `VariantType` columns via an internal JSON-string conversion; on Spark 3.x, fall back to `json`if your table has Variant or JSON columns. |
1912
1897
|**Compression**|`spark.clickhouse.write.compression.codec=lz4` (default) | Reduces network transfer during writes. |
1913
1898
|**Repartition by partition**|`spark.clickhouse.write.repartitionByPartition=true` (default) | Groups rows by partition before writing, reducing the number of parts created. |
1914
-
|**Explicit partition count**|`spark.clickhouse.write.repartitionNum=N`| Forces Spark to repartition to exactly N partitions before writing (via `requiredNumPartitions()`). Default `0` means no requirement. Alternatively, use `df.repartition(N)` before the write call. |
1915
1899
1916
1900
### Recommended starting configuration for bulk loads {#bulk-load-config}
If you also have `spark.clickhouse.write.distributed.convertLocal=true`, ignoring unsupported sharding keys can cause incorrect data distribution. In that case, either use a supported sharding key or set `spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding=true` only if you have verified your data distribution — rows may be written to the wrong shard, causing data skew or incorrect query results when querying shards directly.
1953
1937
:::
1954
1938
1955
-
---
1956
-
1957
-
### Table not found or stale schema after DDL {#troubleshooting-stale-schema}
1958
-
1959
-
**Symptom**: After creating or altering a ClickHouse table, Spark still sees the old schema or raises a "table not found" error.
1960
-
1961
-
**Cause**: Spark caches catalog metadata. For ClickHouse Cloud, replication to all nodes may also take a moment.
### Connection resets on AWS PrivateLink or NLB {#troubleshooting-connection-resets}
1971
-
1972
-
**Symptom**: Reads or writes intermittently fail with `Broken pipe`, `Connection reset by peer`, or `java.io.IOException: Connection closed` — particularly on longer-running jobs or after a period of inactivity.
1973
-
1974
-
**Cause**: AWS Network Load Balancers (NLB) and PrivateLink silently drop TCP connections that have been idle for **350 seconds**. The connector's connection pool keeps HTTP connections alive between operations. When the NLB drops a pooled connection, neither the connector nor ClickHouse is notified. The next operation that picks up that connection fails immediately.
1975
-
1976
-
Inserts are especially vulnerable: there is no connector-level insert timeout, so a slow or large insert can stall mid-flight when the underlying connection is cut.
1977
-
1978
-
**Fix**: Retire pooled connections before the NLB does by setting `connection_ttl` below 350 seconds. Also set `socket_timeout` to bound individual operations and disable HTTP keep-alive to prevent stale connections from persisting in the pool:
For very large inserts that are genuinely expected to take several minutes, switch to async inserts so the server acknowledges receipt quickly and the connector is not waiting on a long-lived open connection:
### KEEPER_EXCEPTION during writes {#troubleshooting-keeper-exception}
1996
-
1997
-
**Symptom**: Writes fail with `Coordination::Exception: Can't get data for node ... node doesn't exist (KEEPER_EXCEPTION)`. Retries may produce duplicate data.
1998
-
1999
-
**Cause**: Too many concurrent Spark tasks overwhelm the ClickHouse replica's coordination layer (ClickHouse Keeper / ZooKeeper). This is common when the replica has limited CPU or memory relative to the write concurrency.
2000
-
2001
-
**Fix**:
2002
-
- Reduce the number of Spark write tasks by repartitioning the DataFrame before writing:
0 commit comments