
Commit c66234f

docs: Document schema preservation behavior on SaveMode.Overwrite (#1456) (#1457)
* Updated the readme for better understanding of overwrite behaviour
* Update README-template.md
* added more info on schema changes
* fix: Remove trailing whitespace from README-template.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 7bdc10f commit c66234f

File tree

1 file changed (+25 −0)


README-template.md

Lines changed: 25 additions & 0 deletions
@@ -373,6 +373,31 @@ df.writeStream \

**Important:** The connector does not configure the GCS connector, in order to avoid conflicts with another GCS connector if one exists. To use the write capabilities of the connector, please configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
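As one illustration only (the linked guide is authoritative), a minimal sketch of passing the GCS connector's Hadoop settings through Spark's `spark.hadoop.*` prefix, assuming the connector jar is already installed on the cluster; the key file path is a placeholder:

```
# Hedged sketch, not the official setup: Hadoop-level GCS connector settings
# passed via the spark.hadoop.* prefix. The connector jar must already be on
# the cluster, and the key file path below is a placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bigquery-writes")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/key.json")
    .getOrCreate()
)
```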

#### Schema Behavior on Overwrite
When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**.
The data is truncated, but column types, descriptions, and policy tags are retained.
```
df.write \
.format("bigquery") \
.mode("overwrite") \
.option("temporaryGcsBucket","some-bucket") \
.save("dataset.table")
```
**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from
`INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
- Drop the table before overwriting
- Use BigQuery DDL to alter the table schema first (see the sketch after this list)
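For illustration only, a sketch of both approaches using the `google-cloud-bigquery` client, which is an assumption here and not part of the connector; the project, dataset, table, and column names are placeholders:

```
# Hedged sketch using the google-cloud-bigquery client (assumed to be installed,
# not part of the connector). Project, dataset, table, and column names are
# placeholders; pick one of the two approaches below.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.dataset.table"  # placeholder

# Option 1: drop the table so the next overwrite recreates it with the
# DataFrame's schema.
client.delete_table(table_id, not_found_ok=True)

# Option 2 (instead of dropping): widen the column in place, e.g.
# INTEGER -> FLOAT64 (the BigQuery type behind Spark's DOUBLE), then overwrite.
# client.query(
#     f"ALTER TABLE `{table_id}` ALTER COLUMN some_column SET DATA TYPE FLOAT64"
# ).result()
```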
For some schema differences, the following options allow the overwrite to succeed:
Programmatic relaxation: set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.
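For example, a sketch combining these options with an overwrite (the bucket and table names are the same placeholders used in the example above):

```
# Schema-update options from the text above; bucket and table are placeholders.
df.write \
  .format("bigquery") \
  .mode("overwrite") \
  .option("temporaryGcsBucket", "some-bucket") \
  .option("allowFieldAddition", "true") \
  .option("allowFieldRelaxation", "true") \
  .save("dataset.table")
```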
This behavior was introduced between versions 0.22.0 and 0.41.0 to prevent accidental schema drift.

**Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
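As a point of comparison, a sketch of the `direct` write path, assuming a connector version that exposes the `writeMethod` option; direct writes go through the BigQuery Storage Write API and do not need a temporary GCS bucket:

```
# Hedged sketch: direct write method (assumes a connector version with the
# writeMethod option); no temporaryGcsBucket is required in this mode.
df.write \
  .format("bigquery") \
  .mode("overwrite") \
  .option("writeMethod", "direct") \
  .save("dataset.table")
```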
### Running SQL on BigQuery

The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)
