Merged
25 changes: 25 additions & 0 deletions README-template.md
@@ -373,6 +373,31 @@ df.writeStream \

**Important:** The connector does not configure the GCS connector, to avoid conflicts with another GCS connector if one exists. To use the connector's write capabilities, configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).

#### Schema Behavior on Overwrite

When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**.
The data is truncated, but column types, descriptions, and policy tags are retained.

```
df.write \
  .format("bigquery") \
  .mode("overwrite") \
  .option("temporaryGcsBucket", "some-bucket") \
  .save("dataset.table")
```

**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from
`INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
- Drop the table before overwriting
- Use BigQuery DDL to alter the table schema first
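
The type mismatch described above can be caught before the write is attempted. The following is a minimal sketch, not part of the connector: `find_type_mismatches` is a hypothetical helper, and it assumes both schemas have already been collected into plain `{column: type}` mappings that use the same type names (how those mappings are obtained is left out).

```python
def find_type_mismatches(df_types, table_types):
    """Return {column: (df_type, table_type)} for shared columns whose types differ.

    Hypothetical helper: both arguments are plain dicts mapping a column name
    to a type name, and both are assumed to use the same type vocabulary.
    """
    return {
        name: (df_type, table_types[name])
        for name, df_type in df_types.items()
        if name in table_types and table_types[name] != df_type
    }


# Example from the text: the table stores INTEGER, the DataFrame now carries DOUBLE.
df_types = {"id": "INTEGER", "amount": "DOUBLE"}
table_types = {"id": "INTEGER", "amount": "INTEGER"}

mismatches = find_type_mismatches(df_types, table_types)
if mismatches:
    print("Overwrite would fail for:", mismatches)
```

Columns that exist on only one side are ignored here; they concern field addition rather than type changes.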

Contributor

For some schema differences, the following options can work with overwrite:
Programmatic relaxation: set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.

Contributor Author

Yes, we can add this detail. Shall I add it?

Contributor

yes please

Contributor Author

Done, please check.

For some schema differences, the following options can be used with overwrite instead:
- `.option("allowFieldRelaxation", "true")` for nullability changes
- `.option("allowFieldAddition", "true")` for new columns

The schema-preserving overwrite behavior was introduced between versions 0.22.0 and 0.41.0 to prevent accidental schema drift.

**Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
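
The `allowFieldRelaxation` and `allowFieldAddition` options mentioned above can be combined with an overwrite. The sketch below is illustrative only: `overwrite_with_relaxation` is a hypothetical wrapper, and the bucket and table names are placeholders. It assumes `df` is a Spark DataFrame and that the GCS connector is already configured on the cluster.

```python
def overwrite_with_relaxation(df, table, bucket,
                              allow_addition=True, allow_relaxation=True):
    """Overwrite `table` while permitting selected schema differences.

    Hypothetical wrapper around the DataFrameWriter calls shown in this
    section; only the option names come from the connector.
    """
    writer = (
        df.write.format("bigquery")
        .mode("overwrite")
        .option("temporaryGcsBucket", bucket)
    )
    if allow_addition:
        # Permit columns that are new relative to the existing table.
        writer = writer.option("allowFieldAddition", "true")
    if allow_relaxation:
        # Permit nullability changes on existing columns.
        writer = writer.option("allowFieldRelaxation", "true")
    writer.save(table)
```

A call such as `overwrite_with_relaxation(df, "dataset.table", "some-bucket")` would then replace the table's data. Type changes such as `INTEGER` to `DOUBLE` are not covered by these options and still require dropping or altering the table first.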

### Running SQL on BigQuery

The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)