Reflect spark-clickhouse-connector PR #542, which makes the previously
hard-coded 60s ClickHouse client query/ping timeout configurable via
spark.clickhouse.client.queryTimeout (since 0.10.1).
- Add row to the Configurations table
- Rewrite "Connector-internal timeouts" to drop the "hard-coded" claim
- Cross-reference the new setting from the max_execution_time entries
| spark.clickhouse.client.queryTimeout | 60s | The maximum time the ClickHouse client will wait for a single query or ping operation to complete on a `NodeClient`. Applied as a future-handle timeout on every `client.query(...)` and `client.ping(...)` call. Used on both read and write paths (the write path issues metadata queries before each batch). Inserts themselves are unaffected — they have no connector-level timeout. | 0.10.1 |
| spark.clickhouse.ignoreUnsupportedTransform | true | ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`, ignore the unsupported expressions and log a warning, otherwise fail fast w/ an exception. **Warning**: When `spark.clickhouse.write.distributed.convertLocal=true`, ignoring unsupported sharding keys may corrupt the data. The connector validates this and throws an error by default. To allow it, explicitly set `spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding=true`. | 0.4.0 |
| spark.clickhouse.read.compression.codec | lz4 | The codec used to decompress data for reading. Supported codecs: none, lz4. | 0.5.0 |
| spark.clickhouse.read.distributed.convertLocal | true | When reading Distributed table, read local table instead of itself. If `true`, ignore `spark.clickhouse.read.distributed.useClusterNodes`. | 0.1.0 |
@@ -1830,7 +1831,7 @@ df.write()
|`option.clickhouse_setting_wait_for_async_insert`|`1`| Wait for async insert acknowledgement before returning |
|`option.clickhouse_setting_insert_deduplicate`|`0`| Disable deduplication for idempotent write pipelines |
|`option.clickhouse_setting_max_insert_block_size`|`1048576`| Control max block size for inserts |
-|`option.clickhouse_setting_max_execution_time`|`300`| Extend query timeout (seconds) for large reads |
+|`option.clickhouse_setting_max_execution_time`|`300`| Extend server-side query timeout (seconds) for large reads. To actually allow reads longer than the connector default, also raise `spark.clickhouse.client.queryTimeout` (default `60s`).|
| Insert timeout |**None**|Not configurable — inserts run until the network drops the connection |
-The 60-second query cap is enforced by the connector regardless of `max_execution_time` or any other server setting. If a read query takes longer than 60 seconds end-to-end, the connector will abort it. There is no `spark.clickhouse.*` setting to override this value.
+The query timeout applies to every `client.query(...)` and `client.ping(...)` call made by the connector — this covers all reads, all DDL/metadata operations, and the metadata queries issued by the write path before each batch. If a query takes longer than the configured value end-to-end, the connector aborts it, regardless of `max_execution_time` or any other server setting.
+
+Before 0.10.1 this value was hard-coded at 60 seconds. From 0.10.1 onwards, override it via the Spark session config (e.g. `spark.conf.set("spark.clickhouse.client.queryTimeout", "300s")`). The value uses Spark's `TimeUnit` syntax (`ms`, `s`, `m`, …) and must be positive.
Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error.
@@ -1871,7 +1874,7 @@ These are ClickHouse query settings sent with each request. They instruct the Cl
| Catalog property | Default | Unit | What it controls |
|---|---|---|---|
-|`option.clickhouse_setting_max_execution_time`|`0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources, but **does not override the connector's 60-second query timeout**. |
+|`option.clickhouse_setting_max_execution_time`|`0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources. **Does not override the connector's client query timeout** — if you raise this for long reads, also raise `spark.clickhouse.client.queryTimeout` (default `60s`). |
|`option.clickhouse_setting_session_timeout`|`60`| seconds | HTTP session lifetime on the server. |
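
The interaction between the two limits described in this change can be sketched in plain Python. This is only an illustration, not connector code: `duration_to_seconds` and `effective_read_limit_seconds` are hypothetical helpers, the suffix table is an assumed subset of the duration syntax, and the model simply takes the smaller of the client timeout and the server-side cap (treating `max_execution_time = 0` as "no server cap").

```python
import re

# Milliseconds per unit suffix. Illustrative subset only; this is NOT the
# connector's or Spark's actual duration parser.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def duration_to_seconds(value: str) -> float:
    """Convert a duration string such as '60s' or '5m' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    millis = int(match.group(1)) * _UNIT_MS[match.group(2)]
    if millis <= 0:
        # The docs state the timeout must be positive.
        raise ValueError("duration must be positive")
    return millis / 1000


def effective_read_limit_seconds(query_timeout: str, max_execution_time: int) -> float:
    """Return whichever limit would end a long read first, in seconds.

    query_timeout      -- value of spark.clickhouse.client.queryTimeout
    max_execution_time -- server-side cap in seconds; 0 means unlimited
    """
    client_limit = duration_to_seconds(query_timeout)
    if max_execution_time == 0:
        return client_limit
    return min(client_limit, float(max_execution_time))


# Raising only max_execution_time leaves the 60s client timeout in charge.
print(effective_read_limit_seconds("60s", 300))   # 60.0
# Raising both lets the read run for the full 300 seconds.
print(effective_read_limit_seconds("300s", 300))  # 300.0
```

Under this model, a read that needs five minutes requires both `option.clickhouse_setting_max_execution_time` and `spark.clickhouse.client.queryTimeout` to be raised; changing only one leaves the other as the binding limit.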