Commit 4f053a2

docs(spark): document configurable client query timeout
Reflect spark-clickhouse-connector PR #542, which makes the previously hard-coded 60s ClickHouse client query/ping timeout configurable via spark.clickhouse.client.queryTimeout (since 0.10.1).

- Add row to the Configurations table
- Rewrite "Connector-internal timeouts" to drop the "hard-coded" claim
- Cross-reference the new setting from the max_execution_time entries
1 parent f1fdf13 commit 4f053a2

1 file changed

Lines changed: 10 additions & 7 deletions

docs/integrations/data-ingestion/apache-spark/spark-native-connector.md

@@ -1484,6 +1484,7 @@ Alternatively, set them in `spark-defaults.conf` or when creating the Spark sess

| Key | Default | Description | Since |
|----------------------------------------------------|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
+| spark.clickhouse.client.queryTimeout | 60s | The maximum time the ClickHouse client will wait for a single query or ping operation to complete on a `NodeClient`. Applied as a future-handle timeout on every `client.query(...)` and `client.ping(...)` call. Used on both read and write paths (the write path issues metadata queries before each batch). Inserts themselves are unaffected — they have no connector-level timeout. | 0.10.1 |
| spark.clickhouse.ignoreUnsupportedTransform | true | ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`, ignore the unsupported expressions and log a warning, otherwise fail fast w/ an exception. **Warning**: When `spark.clickhouse.write.distributed.convertLocal=true`, ignoring unsupported sharding keys may corrupt the data. The connector validates this and throws an error by default. To allow it, explicitly set `spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding=true`. | 0.4.0 |
| spark.clickhouse.read.compression.codec | lz4 | The codec used to decompress data for reading. Supported codecs: none, lz4. | 0.5.0 |
| spark.clickhouse.read.distributed.convertLocal | true | When reading Distributed table, read local table instead of itself. If `true`, ignore `spark.clickhouse.read.distributed.useClusterNodes`. | 0.1.0 |
@@ -1830,7 +1831,7 @@ df.write()
| `option.clickhouse_setting_wait_for_async_insert` | `1` | Wait for async insert acknowledgement before returning |
| `option.clickhouse_setting_insert_deduplicate` | `0` | Disable deduplication for idempotent write pipelines |
| `option.clickhouse_setting_max_insert_block_size` | `1048576` | Control max block size for inserts |
-| `option.clickhouse_setting_max_execution_time` | `300` | Extend query timeout (seconds) for large reads |
+| `option.clickhouse_setting_max_execution_time` | `300` | Extend server-side query timeout (seconds) for large reads. To actually allow reads longer than the connector default, also raise `spark.clickhouse.client.queryTimeout` (default `60s`). |
| `option.clickhouse_setting_session_timeout` | `60` | Extend HTTP session timeout (seconds) |

:::note
@@ -1843,14 +1844,16 @@ Timeouts in the Spark connector operate at three independent layers. Misdiagnosi

### Connector-internal timeouts {#timeout-connector}

-The connector enforces its own timeouts that are independent of any Spark or ClickHouse setting:
+The connector enforces its own timeouts on each Java client `query` and `ping` operation, independent of any ClickHouse server setting:

-| Behavior | Value | Configurable |
+| Behavior | Default | Configurable via |
|---|---|---|
-| Query timeout | **60 seconds** | No — hard-coded in the connector |
-| Insert timeout | **None** | No — inserts run until the network drops the connection |
+| Query / ping timeout | **60 seconds** | `spark.clickhouse.client.queryTimeout` (since 0.10.1) |
+| Insert timeout | **None** | Not configurable — inserts run until the network drops the connection |

-The 60-second query cap is enforced by the connector regardless of `max_execution_time` or any other server setting. If a read query takes longer than 60 seconds end-to-end, the connector will abort it. There is no `spark.clickhouse.*` setting to override this value.
+The query timeout applies to every `client.query(...)` and `client.ping(...)` call made by the connector — this covers all reads, all DDL/metadata operations, and the metadata queries issued by the write path before each batch. If a query takes longer than the configured value end-to-end, the connector aborts it, regardless of `max_execution_time` or any other server setting.
+
+Before 0.10.1 this value was hard-coded at 60 seconds. From 0.10.1 onwards, override it via the Spark session config (e.g. `spark.conf.set("spark.clickhouse.client.queryTimeout", "300s")`). The value uses Spark's `TimeUnit` syntax (`ms`, `s`, `m`, …) and must be positive.

Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error.

@@ -1871,7 +1874,7 @@ These are ClickHouse query settings sent with each request. They instruct the Cl

| Catalog property | Default | Unit | What it controls |
|---|---|---|---|
-| `option.clickhouse_setting_max_execution_time` | `0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources, but **does not override the connector's 60-second query timeout**. |
+| `option.clickhouse_setting_max_execution_time` | `0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources. **Does not override the connector's client query timeout** — if you raise this for long reads, also raise `spark.clickhouse.client.queryTimeout` (default `60s`). |
| `option.clickhouse_setting_session_timeout` | `60` | seconds | HTTP session lifetime on the server. |

## Performance tuning {#performance-tuning}
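
The changed rows describe two independent limits on a read: the connector's client query timeout (`spark.clickhouse.client.queryTimeout`) and the server-side `max_execution_time`. One reading of the diff is that a read is bounded by whichever limit is shorter. The sketch below illustrates that reading in plain Python. Both helper functions are hypothetical and are not part of the connector or of Spark, and the duration parser handles only a simplified subset of Spark's time-string suffixes (`ms`, `s`, `m`, `h`).

```python
import re

# Hypothetical helpers for illustration only; not connector or Spark APIs.
# Simplified subset of Spark-style duration suffixes, expressed in milliseconds.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def parse_duration_ms(value: str) -> int:
    """Convert a duration string such as "300s" or "5m" to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", value.lower())
    if match is None or match.group(2) not in _UNIT_MS:
        raise ValueError(f"unsupported duration: {value!r}")
    millis = int(match.group(1)) * _UNIT_MS[match.group(2)]
    if millis <= 0:
        # The documented setting must be positive.
        raise ValueError("duration must be positive")
    return millis


def effective_read_limit_ms(client_query_timeout: str, max_execution_time: int) -> int:
    """Return the shorter of the client-side and server-side limits.

    max_execution_time is in seconds; 0 means no server-side limit.
    """
    client_ms = parse_duration_ms(client_query_timeout)
    if max_execution_time == 0:
        return client_ms
    return min(client_ms, max_execution_time * 1_000)


# Raising only the server-side limit leaves the 60s client default in charge.
print(effective_read_limit_ms("60s", 300))
# Raising both lets a read run for up to 300 seconds.
print(effective_read_limit_ms("300s", 300))
```

Under this reading, setting `option.clickhouse_setting_max_execution_time` to `300` while leaving the client timeout at its `60s` default still caps reads at 60 seconds, which is why the diff cross-references the two settings.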
