From 56a5879076b579be9fccc41292f62214fd4246ae Mon Sep 17 00:00:00 2001
From: sahilkumarsingh
Date: Sun, 11 Jan 2026 03:52:02 +0530
Subject: [PATCH 1/4] Updated the readme for better understanding of overwrite behaviour

---
 README-template.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/README-template.md b/README-template.md
index 38073fccf..5f3272d37 100644
--- a/README-template.md
+++ b/README-template.md
@@ -373,6 +373,28 @@ df.writeStream \
 
 **Important:** The connector does not configure the GCS connector, in order to avoid conflict with another GCS connector, if exists. In order to use the write capabilities of the connector, please configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
 
+#### Schema Behavior on Overwrite
+
+When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**. 
+The data is truncated, but column types, descriptions, and policy tags are retained.
+
+```
+df.write \
+  .format("bigquery") \
+  .mode("overwrite") \
+  .option("temporaryGcsBucket","some-bucket") \
+  .save("dataset.table")
+```
+
+**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from 
+`INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
+- Drop the table before overwriting
+- Use BigQuery DDL to alter the table schema first
+
+This behavior was introduced between version 0.22.0 and 0.41.0 to prevent accidental schema drift.
+
+**Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
+
 ### Running SQL on BigQuery
 
 The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)

From b1f8f848f5b90e652907cb6c14c63aebecee5804 Mon Sep 17 00:00:00 2001
From: Sahil Kumar Singh <38356279+sahilkumarsingh@users.noreply.github.com>
Date: Sun, 11 Jan 2026 04:35:33 +0530
Subject: [PATCH 2/4] Update README-template.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README-template.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README-template.md b/README-template.md
index 5f3272d37..0f9f37b69 100644
--- a/README-template.md
+++ b/README-template.md
@@ -391,7 +391,7 @@ df.write \
 - Drop the table before overwriting
 - Use BigQuery DDL to alter the table schema first
 
-This behavior was introduced between version 0.22.0 and 0.41.0 to prevent accidental schema drift.
+This behavior was introduced between version 0.22.2 and 0.41.0 to prevent accidental schema drift.
 
 **Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
 

From a6a8f69332369618850ddd87d56777195bac1842 Mon Sep 17 00:00:00 2001
From: sahilkumarsingh
Date: Fri, 23 Jan 2026 22:50:22 +0530
Subject: [PATCH 3/4] added more info on schema changes

---
 README-template.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README-template.md b/README-template.md
index 0f9f37b69..a7c0025e1 100644
--- a/README-template.md
+++ b/README-template.md
@@ -391,7 +391,10 @@ df.write \
 - Drop the table before overwriting
 - Use BigQuery DDL to alter the table schema first
 
-This behavior was introduced between version 0.22.2 and 0.41.0 to prevent accidental schema drift.
+For some schema differences, the following options can be used with overwrite:
+Programmatic Relaxation: Set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.
+
+This behavior was introduced between version 0.22.0 and 0.41.0 to prevent accidental schema drift.
 
 **Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
 

From 2c2c5814913be3a629298f281aa98c3923ca6aca Mon Sep 17 00:00:00 2001
From: sahilkumarsingh
Date: Fri, 23 Jan 2026 23:14:36 +0530
Subject: [PATCH 4/4] fix: Remove trailing whitespace from README-template.md

---
 README-template.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README-template.md b/README-template.md
index a7c0025e1..edfd8ece0 100644
--- a/README-template.md
+++ b/README-template.md
@@ -375,7 +375,7 @@ df.writeStream \
 
 #### Schema Behavior on Overwrite
 
-When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**. 
+When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**.
 The data is truncated, but column types, descriptions, and policy tags are retained.
 
 ```
@@ -386,7 +386,7 @@ df.write \
   .save("dataset.table")
 ```
 
-**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from 
+**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from
 `INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
 - Drop the table before overwriting
 - Use BigQuery DDL to alter the table schema first
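
Reviewer note: the rules this series documents (a type change fails, a new column needs `allowFieldAddition`, a nullability change needs `allowFieldRelaxation`) can be sketched as a small pre-flight check. This is only an illustration of the documented behavior, not the connector's own logic. The helper `plan_overwrite` and the `(type, mode)` field representation are hypothetical, and reading "nullability changes" as `REQUIRED` to `NULLABLE` is an assumption; only the two option names come from the patch text.

```python
from typing import Dict, List, Tuple

# A field is modelled as (type, mode), e.g. ("INTEGER", "REQUIRED").
# This representation is an assumption made for the sketch.
Field = Tuple[str, str]


def plan_overwrite(
    table: Dict[str, Field], frame: Dict[str, Field]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (writer options to set, columns whose type change blocks the write)."""
    options: Dict[str, str] = {}
    blocking: List[str] = []
    for name, (new_type, new_mode) in frame.items():
        if name not in table:
            # New column: the README text points to allowFieldAddition.
            options["allowFieldAddition"] = "true"
            continue
        old_type, old_mode = table[name]
        if new_type != old_type:
            # Type change: per the README text, drop the table or use DDL first.
            blocking.append(name)
        elif old_mode == "REQUIRED" and new_mode == "NULLABLE":
            # Nullability relaxation: the README text points to allowFieldRelaxation.
            options["allowFieldRelaxation"] = "true"
    return options, blocking


# Hypothetical schemas, for illustration only.
table = {"id": ("INTEGER", "REQUIRED"), "score": ("INTEGER", "NULLABLE")}
frame = {
    "id": ("INTEGER", "NULLABLE"),
    "score": ("DOUBLE", "NULLABLE"),
    "note": ("STRING", "NULLABLE"),
}
options, blocking = plan_overwrite(table, frame)
print(options)
print(blocking)
```

If `blocking` comes back empty, the returned dictionary could be passed to the writer with `df.write.options(**options)`. Whether these options cover a given difference should still be checked against the connector version in use.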