docs: Document schema preservation behavior on SaveMode.Overwrite (#1456) #1457

Merged

yalimu-g merged 4 commits into GoogleCloudDataproc:master from sahilkumarsingh:#1456-doc-update-for-overwrite

Jan 23, 2026

README-template.md

@@ -373,6 +373,31 @@ df.writeStream \
     **Important:** The connector does not configure the GCS connector, in order to avoid conflict with another GCS connector, if exists. In order to use the write capabilities of the connector, please configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
+    #### Schema Behavior on Overwrite
+    When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**.
+    The data is truncated, but column types, descriptions, and policy tags are retained.
+    ```
+    df.write \
+      .format("bigquery") \
+      .mode("overwrite") \
+      .option("temporaryGcsBucket","some-bucket") \
+      .save("dataset.table")
+    ```
+    **Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from
+    `INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
+    - Drop the table before overwriting
+    - Use BigQuery DDL to alter the table schema first
+    Some schema differences can be handled by options that work with overwrite:
+    set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.
+    This behavior was introduced between versions 0.22.0 and 0.41.0 to prevent accidental schema drift.
+    **Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
     ### Running SQL on BigQuery
     The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)
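
The rules in the "Schema Behavior on Overwrite" section can be sketched as a small pre-flight check. This is a minimal illustration, not connector code: `overwrite_options` and `overwrite` are hypothetical helpers, and the schema representation (column name mapped to a type and mode pair) is an assumption made for the example. Only the `allowFieldAddition` and `allowFieldRelaxation` option names and the `format`/`mode`/`option`/`save` calls come from the documented text. Columns missing from the DataFrame and `NULLABLE` to `REQUIRED` changes are not handled here.

```python
from typing import Dict, Tuple

# Assumed representation: column name -> (BigQuery type, mode),
# e.g. {"id": ("INTEGER", "REQUIRED")}.
Schema = Dict[str, Tuple[str, str]]


def overwrite_options(existing: Schema, incoming: Schema) -> Dict[str, str]:
    """Hypothetical helper: pick the write options an overwrite would need.

    Raises ValueError when a column type changes, since neither option
    covers that case.
    """
    options: Dict[str, str] = {}
    for name, (new_type, new_mode) in incoming.items():
        if name not in existing:
            # New column: the documented option is allowFieldAddition.
            options["allowFieldAddition"] = "true"
            continue
        old_type, old_mode = existing[name]
        if new_type != old_type:
            raise ValueError(
                f"column {name!r}: {old_type} -> {new_type}; "
                "drop the table or alter it with BigQuery DDL first"
            )
        if old_mode == "REQUIRED" and new_mode == "NULLABLE":
            # Nullability change: the documented option is allowFieldRelaxation.
            options["allowFieldRelaxation"] = "true"
    return options


def overwrite(writer, table: str, options: Dict[str, str]) -> None:
    """Hypothetical helper: apply options to a DataFrameWriter and save."""
    writer = writer.format("bigquery").mode("overwrite")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save(table)
```

With a real DataFrame this would be called as `overwrite(df.write, "dataset.table", opts)`, where `opts` also carries `temporaryGcsBucket` for the indirect write method. How the existing table's schema is obtained is left out of the sketch.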