docs: remove unverified timeout scenarios and NLB troubleshooting from Spark connector

ShimonSte · ShimonSte · commit 8d6bfbc1528c · 2026-04-21T13:30:52.000+03:00
Remove the Common timeout scenarios section and the Connection resets on
AWS PrivateLink or NLB troubleshooting entry. The NLB broken pipe scenario
could not be reproduced from EMR against the staging endpoint (public HTTPS
path, no NLB in the TCP path), so this content is not verified and should
not be published.
diff --git a/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md b/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md
@@ -1490,7 +1490,7 @@ Alternatively, set them in `spark-defaults.conf` or when creating the Spark sess
 | spark.clickhouse.read.fixedStringAs                | binary                                                 | Read ClickHouse FixedString type as the specified Spark data type. Supported types: binary, string                                                                                                                                                                                                                                                                                                              | 0.8.0 |
 | spark.clickhouse.read.format                       | json                                                   | Serialize format for reading. Supported formats: json, binary                                                                                                                                                                                                                                                                                                                                                   | 0.6.0 |
 | spark.clickhouse.read.runtimeFilter.enabled        | false                                                  | Enable runtime filter for reading.                                                                                                                                                                                                                                                                                                                                                                              | 0.8.0 |
-| spark.clickhouse.read.settings                     |                                                        | Comma-separated list of ClickHouse server session settings to apply on read, e.g. `max_execution_time=300,max_memory_usage=10000000000`. These are passed as HTTP query parameters on every read request. Must be set via `spark.conf.set(...)` at the session level — passing via DataFrame `.option()` has no effect.                                                                                                                                                    | 0.9.0 |
+| spark.clickhouse.read.settings                     |                                                        | ClickHouse settings appended as a `SETTINGS` clause to every generated read query, e.g. `final=1, max_execution_time=300`. The value is lowercased and inserted verbatim after `SETTINGS`, so it must be valid SQL. Must be set via `spark.conf.set(...)` at the session level — passing via DataFrame `.option()` has no effect.                                                                                                                                          | 0.9.0 |
 | spark.clickhouse.read.splitByPartitionId           | true                                                   | If `true`, construct input partition filter by virtual column `_partition_id`, instead of partition value. There are known issues with assembling SQL predicates by partition value. This feature requires ClickHouse Server v21.6+                                                                                                                                                                             | 0.4.0 |
 | spark.clickhouse.useNullableQuerySchema            | false                                                  | If `true`, mark all the fields of the query schema as nullable when executing `CREATE/REPLACE TABLE ... AS SELECT ...` on creating the table. Note, this configuration requires SPARK-43390(available in Spark 3.5), w/o this patch, it always acts as `true`.                                                                                                                                                  | 0.8.0 |
 | spark.clickhouse.write.batchSize                   | 10000                                                  | The number of records per batch on writing to ClickHouse.                                                                                                                                                                                                                                                                                                                                                       | 0.1.0 |
@@ -1682,8 +1682,8 @@ With these settings, all reads and writes go through the single coordinator node
 
 The connector uses ClickHouse's HTTP interface via the [ClickHouse Java client](https://github.com/ClickHouse/clickhouse-java). There are two independent ways to pass settings:
 
-- **`option.clickhouse_setting_<name>`** — the first-class mechanism for passing ClickHouse server session settings through the Java client. Applies to all operations for the lifetime of the catalog.
-- **`spark.clickhouse.read.settings`** — per-read override: a comma-separated list of `key=value` server settings applied only to read requests.
+- **`option.clickhouse_setting_<name>`** — the first-class mechanism for passing ClickHouse server settings through the Java client. Applied to every client operation (reads and writes) for the lifetime of the catalog.
+- **`option.http_header_<name>`** — adds a custom HTTP header to every request the Java client sends to ClickHouse. Useful for setting ClickHouse-recognised headers (e.g. `X-ClickHouse-Quota`) or for passing headers expected by a proxy or gateway in front of the server. Applied to every operation for the lifetime of the catalog.
 
 ### Via Catalog API {#query-settings-catalog-api}
 
@@ -1695,18 +1695,24 @@ spark.sql.catalog.clickhouse.option.clickhouse_setting_wait_for_async_insert   1
 spark.sql.catalog.clickhouse.option.clickhouse_setting_max_execution_time      300
 ```
 
-To enable TLS, set the `protocol` direct catalog property instead of an `option.*` key:
+To add custom HTTP headers to every request, use `option.http_header_<name>`. Headers recognised by ClickHouse itself (e.g. `X-ClickHouse-Quota`) will be honoured by the server; any other header is only useful if a proxy or gateway in front of ClickHouse is configured to read it.
 
 ```text
-spark.sql.catalog.clickhouse.protocol   https
-spark.sql.catalog.clickhouse.http_port  8443
+spark.sql.catalog.clickhouse.option.http_header_x_clickhouse_quota   spark_etl_quota
+```
+
+To enable TLS, set `option.ssl=true` and point `http_port` at the HTTPS endpoint:
+
+```text
+spark.sql.catalog.clickhouse.option.ssl   true
+spark.sql.catalog.clickhouse.http_port    8443
 ```
 
 For the full list of available Java client option keys, see the [Java client configuration documentation](https://clickhouse.com/docs/integrations/language-clients/java/client#configuration). For available ClickHouse server session settings, see the [settings documentation](https://clickhouse.com/docs/operations/settings/settings).
 
 ### Via TableProvider API {#query-settings-tableprovider-api}
 
-Use `spark.clickhouse.read.settings` to apply ClickHouse server settings to reads. This setting is read from the Spark session config — passing it as a DataFrame `.option()` has no effect. It applies to all reads in the session until unset.
+Use `spark.clickhouse.read.settings` to apply ClickHouse settings to reads. The value is appended as a `SETTINGS` clause to every generated read query, so it must be valid SQL (e.g. `final=1, max_execution_time=300`). This setting is read from the Spark session config — passing it as a DataFrame `.option()` has no effect. It applies to all reads in the session until unset.
 
 To scope the settings to a single read, set the config, force the action, then unset:
 
@@ -1847,7 +1853,7 @@ The connector enforces its own timeouts that are independent of any Spark or Cli
 
 The 60-second query cap is enforced by the connector regardless of `max_execution_time` or any other server setting. If a read query takes longer than 60 seconds end-to-end, the connector will abort it. There is no `spark.clickhouse.*` setting to override this value.
 
-Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error (see [Connection resets on AWS PrivateLink or NLB](#troubleshooting-connection-resets) below).
+Inserts have no connector-level timeout. This means a stalled or very slow insert will hang until a network device terminates the connection — which can produce a **"Broken pipe"** error.
 
 ### Java client connection timeouts {#timeout-java-client}
 
@@ -1856,7 +1862,7 @@ These are passed through to the underlying ClickHouse Java client using the `opt
 | Catalog property | Default | Unit | What it controls |
 |---|---|---|---|
 | `option.socket_timeout` | `0` (unlimited) | ms | Deadline for each TCP read/write operation. `0` means no deadline — the call blocks indefinitely if the server stops responding. |
-| `option.connection_timeout` | not set | ms | Time allowed to establish a new TCP connection to ClickHouse. |
+| `option.connection_timeout` | not set (Java client default) | ms | Time allowed to establish a new TCP connection to ClickHouse. |
 | `option.connection_ttl` | `-1` (unlimited) | ms | Maximum lifetime of a pooled connection. After this time, the connection is retired and not reused. **Set this below the idle timeout of any network appliance between Spark and ClickHouse** (e.g., `300000` for AWS NLB). |
 | `option.http_keep_alive_timeout` | server default | ms | Overrides the HTTP keep-alive timeout. Set to `0` to disable keep-alive on network paths with aggressive idle-connection cutoffs. |
 
@@ -1869,27 +1875,6 @@ These are ClickHouse query settings sent with each request. They instruct the Cl
 | `option.clickhouse_setting_max_execution_time` | `0` (unlimited) | seconds | Server-side hard cap on query execution time. Useful for preventing runaway reads from consuming server resources, but **does not override the connector's 60-second query timeout**. |
 | `option.clickhouse_setting_session_timeout` | `60` | seconds | HTTP session lifetime on the server. |
 
-### Common timeout scenarios {#timeout-scenarios}
-
-**Long-running reads (>60 seconds)**
-
-The connector's hard-coded 60-second query timeout will abort reads that take longer, even if the server could complete them. There is no configuration knob to extend or disable this cap. The only workaround is to reduce the amount of data returned by a single query so each read completes within 60 seconds:
-
-- Add `WHERE` predicates to filter rows before they reach Spark.
-- Use partition pruning so the connector issues one query per partition rather than one large scan.
-
-**Insert "Broken pipe" or "Connection reset" errors**
-
-Typically caused by a network appliance (AWS NLB, PrivateLink, corporate proxy) silently dropping idle connections. The connector's connection pool holds connections open between operations; if the appliance's idle timeout (350 seconds for AWS NLB) expires, the connection is dropped at the network level without notification. The connector then tries to reuse a dead connection and gets a broken pipe.
-
-```text
-spark.sql.catalog.clickhouse.option.connection_ttl         300000
-spark.sql.catalog.clickhouse.option.socket_timeout         300000
-spark.sql.catalog.clickhouse.option.http_keep_alive_timeout  0
-```
-
-Setting `connection_ttl` below the appliance's idle timeout forces the connector to retire connections proactively. See [Connection resets on AWS PrivateLink or NLB](#troubleshooting-connection-resets) for full details.
-
 ## Performance tuning {#performance-tuning}
 
 ### Read performance {#read-performance}
@@ -1908,10 +1893,9 @@ Setting `connection_ttl` below the appliance's idle timeout forces the connector
 | Tuning | Configuration | Notes |
 |---|---|---|
 | **Increase batch size** | `spark.clickhouse.write.batchSize=50000` | Default is 10,000. Larger batches reduce round-trips. |
-| **Arrow write format** | `spark.clickhouse.write.format=arrow` (default) | Arrow is faster than JSON for most data types. Use `json` only for Variant/JSON column types. |
+| **Arrow write format** | `spark.clickhouse.write.format=arrow` (default) | Arrow is faster than JSON for most data types. On Spark 4.0 the Arrow writer handles `VariantType` columns via an internal JSON-string conversion; on Spark 3.x, fall back to `json` if your table has Variant or JSON columns. |
 | **Compression** | `spark.clickhouse.write.compression.codec=lz4` (default) | Reduces network transfer during writes. |
 | **Repartition by partition** | `spark.clickhouse.write.repartitionByPartition=true` (default) | Groups rows by partition before writing, reducing the number of parts created. |
-| **Explicit partition count** | `spark.clickhouse.write.repartitionNum=N` | Forces Spark to repartition to exactly N partitions before writing (via `requiredNumPartitions()`). Default `0` means no requirement. Alternatively, use `df.repartition(N)` before the write call. |
 
 ### Recommended starting configuration for bulk loads {#bulk-load-config}
 
@@ -1952,60 +1936,6 @@ spark.conf.set("spark.clickhouse.write.repartitionByPartition", "true")
 If you also have `spark.clickhouse.write.distributed.convertLocal=true`, ignoring unsupported sharding keys can cause incorrect data distribution. In that case, either use a supported sharding key or set `spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding=true` only if you have verified your data distribution — rows may be written to the wrong shard, causing data skew or incorrect query results when querying shards directly.
 :::
 
----
-
-### Table not found or stale schema after DDL {#troubleshooting-stale-schema}
-
-**Symptom**: After creating or altering a ClickHouse table, Spark still sees the old schema or raises a "table not found" error.
-
-**Cause**: Spark caches catalog metadata. For ClickHouse Cloud, replication to all nodes may also take a moment.
-
-**Fix**:
-```python
-spark.catalog.refreshTable("clickhouse.database.table_name")
-```
-
----
-
-### Connection resets on AWS PrivateLink or NLB {#troubleshooting-connection-resets}
-
-**Symptom**: Reads or writes intermittently fail with `Broken pipe`, `Connection reset by peer`, or `java.io.IOException: Connection closed` — particularly on longer-running jobs or after a period of inactivity.
-
-**Cause**: AWS Network Load Balancers (NLB) and PrivateLink silently drop TCP connections that have been idle for **350 seconds**. The connector's connection pool keeps HTTP connections alive between operations. When the NLB drops a pooled connection, neither the connector nor ClickHouse is notified. The next operation that picks up that connection fails immediately.
-
-Inserts are especially vulnerable: there is no connector-level insert timeout, so a slow or large insert can stall mid-flight when the underlying connection is cut.
-
-**Fix**: Retire pooled connections before the NLB does by setting `connection_ttl` below 350 seconds. Also set `socket_timeout` to bound individual operations and disable HTTP keep-alive to prevent stale connections from persisting in the pool:
-
-```text
-spark.sql.catalog.clickhouse.option.connection_ttl           300000
-spark.sql.catalog.clickhouse.option.socket_timeout           300000
-spark.sql.catalog.clickhouse.option.http_keep_alive_timeout  0
-```
-
-For very large inserts that are genuinely expected to take several minutes, switch to async inserts so the server acknowledges receipt quickly and the connector is not waiting on a long-lived open connection:
-
-```text
-spark.sql.catalog.clickhouse.option.clickhouse_setting_async_insert             1
-spark.sql.catalog.clickhouse.option.clickhouse_setting_wait_for_async_insert    1
-```
-
----
-
-### KEEPER_EXCEPTION during writes {#troubleshooting-keeper-exception}
-
-**Symptom**: Writes fail with `Coordination::Exception: Can't get data for node ... node doesn't exist (KEEPER_EXCEPTION)`. Retries may produce duplicate data.
-
-**Cause**: Too many concurrent Spark tasks overwhelm the ClickHouse replica's coordination layer (ClickHouse Keeper / ZooKeeper). This is common when the replica has limited CPU or memory relative to the write concurrency.
-
-**Fix**:
-- Reduce the number of Spark write tasks by repartitioning the DataFrame before writing:
-  ```python
-  df.repartition(N).writeTo("clickhouse.db.table").append()
-  ```
-- Increase the ClickHouse replica's compute resources (CPU / memory).
-- If using `ReplicatedMergeTree`, consider writing through a `Distributed` table to spread load across shards.
-
 ## Contributing and support {#contributing-and-support}
 
 If you'd like to contribute to the project or report any issues, we welcome your input!