fix(firestore-bigquery-change-tracker): validation and partitioning fixes #2435

Draft
wants to merge 7 commits into
base: next
10 changes: 3 additions & 7 deletions firestore-bigquery-export/README.md
@@ -269,15 +269,11 @@ To install an extension, your project must be on the [Blaze (pay as you go) plan

 * BigQuery SQL table Time Partitioning option type: This parameter will allow you to partition the BigQuery table and BigQuery view created by the extension based on data ingestion time. You may select the granularity of partitioning based upon one of: HOUR, DAY, MONTH, YEAR. This will generate one partition per day, hour, month or year, respectively.

-* BigQuery Time Partitioning column name: BigQuery table column/schema field name for TimePartitioning. You can choose schema available as `timestamp` OR a new custom defined column that will be assigned to the selected Firestore Document field below. Defaults to pseudo column _PARTITIONTIME if unspecified. Cannot be changed if Table is already partitioned.
+* BigQuery Time Partitioning column name: The name of the column in the BigQuery table that will be used for time partitioning. This field should correspond to the timestamp or date field in your data that will be used for partitioning the table. By default, the extension will use the _PARTITIONTIME pseudo column.

-* Firestore Document field name for BigQuery SQL Time Partitioning field option: This parameter will allow you to partition the BigQuery table created by the extension based on selected. The Firestore Document field value must be a top-level TIMESTAMP, DATETIME, DATE field BigQuery string format or Firestore timestamp(will be converted to BigQuery TIMESTAMP). Cannot be changed if Table is already partitioned.
-example: `postDate`(Ensure that the Firestore-BigQuery export extension
-creates the dataset and table before initiating any backfill scripts.
-This step is crucial for the partitioning to function correctly. It is
-essential for the script to insert data into an already partitioned table.)
+* BigQuery SQL Time Partitioning table schema field(column) type: This parameter defines the schema field type in BigQuery for the selected Time Partitioning field option. It cannot be modified if the table is already partitioned.

-* BigQuery SQL Time Partitioning table schema field(column) type: Parameter for BigQuery SQL schema field type for the selected Time Partitioning Firestore Document field option. Cannot be changed if Table is already partitioned.
+* Firestore Document field name for BigQuery SQL Time Partitioning field option: This parameter enables partitioning of the BigQuery table created by the extension based on the selected field in the Firestore Document. The field value must be a top-level TIMESTAMP, DATETIME, or DATE field in BigQuery string format. This setting cannot be modified if the table is already partitioned. Example: Use `postDate` as the field for partitioning. Ensure that the Firestore-BigQuery export extension has created the dataset and table before running any backfill scripts. This step is crucial for correct partitioning functionality and for inserting data into an already partitioned table.

 * BigQuery SQL table clustering: This parameter allows you to set up clustering for the BigQuery table created by the extension. Specify up to 4 comma-separated fields (for example: `data,document_id,timestamp` - no whitespaces). The order of the specified columns determines the sort order of the data.
 Note: Cluster columns must be top-level, non-repeated columns of one of the following types: BIGNUMERIC, BOOL, DATE, DATETIME, GEOGRAPHY, INT64, NUMERIC, RANGE, STRING, TIMESTAMP. Clustering will not be added if a field with an invalid type is present in this parameter.
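
For reviewers, the partitioning and clustering rules described in this README hunk can be sketched roughly as follows. This is an illustration only, not the change-tracker's actual code: the names `buildTimePartitioning` and `parseClustering` are hypothetical, and throwing on invalid clustering input is an assumption made here for brevity (the README says only that clustering is not added when a field has an invalid type). The `type`/`field` shape mirrors BigQuery's `timePartitioning` table setting, where leaving `field` unset means ingestion-time partitioning on `_PARTITIONTIME`.

```typescript
// Hypothetical helpers illustrating the documented parameter rules.
type Granularity = "HOUR" | "DAY" | "MONTH" | "YEAR";

interface TimePartitioning {
  type: Granularity;
  // Omitted => ingestion-time partitioning (the _PARTITIONTIME pseudo column).
  field?: string;
}

// Build a partitioning config from the granularity and optional column name.
function buildTimePartitioning(
  type: Granularity,
  columnName?: string
): TimePartitioning {
  const field = columnName?.trim();
  return field ? { type, field } : { type };
}

// Parse the clustering parameter: up to 4 comma-separated fields, no whitespace.
// Order is preserved because it determines the sort order of the data.
function parseClustering(raw: string): string[] {
  if (/\s/.test(raw)) {
    throw new Error("clustering fields must not contain whitespace");
  }
  const fields = raw.split(",").filter((f) => f.length > 0);
  if (fields.length > 4) {
    throw new Error("at most 4 clustering fields are allowed");
  }
  return fields;
}
```

Under these assumptions, `buildTimePartitioning("DAY", "postDate")` describes a table partitioned by day on a `postDate` column, and `parseClustering("data,document_id,timestamp")` returns the three field names in the given order. Column type checks (for example, rejecting a clustering column that is not one of the listed types) are left out because they need the table schema.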
51 changes: 25 additions & 26 deletions firestore-bigquery-export/extension.yaml
@@ -431,38 +431,20 @@ params:
 - param: TIME_PARTITIONING_FIELD
 label: BigQuery Time Partitioning column name
 description: >-
-BigQuery table column/schema field name for TimePartitioning. You can
-choose schema available as `timestamp` OR a new custom defined column that
-will be assigned to the selected Firestore Document field below. Defaults
-to pseudo column _PARTITIONTIME if unspecified. Cannot be changed if Table
-is already partitioned.
-type: string
-required: false
-
-- param: TIME_PARTITIONING_FIRESTORE_FIELD
-label:
-Firestore Document field name for BigQuery SQL Time Partitioning field
-option
-description: >-
-This parameter will allow you to partition the BigQuery table created by
-the extension based on selected. The Firestore Document field value must
-be a top-level TIMESTAMP, DATETIME, DATE field BigQuery string format or
-Firestore timestamp(will be converted to BigQuery TIMESTAMP). Cannot be
-changed if Table is already partitioned.
-example: `postDate`(Ensure that the Firestore-BigQuery export extension
-creates the dataset and table before initiating any backfill scripts.
-This step is crucial for the partitioning to function correctly. It is
-essential for the script to insert data into an already partitioned
-table.)
+BigQuery Time Partitioning column name: The name of the column in the
+BigQuery table that will be used for time partitioning. This field should
+correspond to the timestamp or date field in your data that will be used
+for partitioning the table. By default, the extension will use the
+_PARTITIONTIME pseudo column.
 type: string
 required: false

 - param: TIME_PARTITIONING_FIELD_TYPE
 label: BigQuery SQL Time Partitioning table schema field(column) type
 description: >-
-Parameter for BigQuery SQL schema field type for the selected Time
-Partitioning Firestore Document field option. Cannot be changed if Table
-is already partitioned.
+This parameter defines the schema field type in BigQuery for the selected
+Time Partitioning field option. It cannot be modified if the table is
+already partitioned.
 type: select
 options:
 - label: TIMESTAMP
@@ -476,6 +458,23 @@ params:
 default: omit
 required: false

+- param: TIME_PARTITIONING_FIRESTORE_FIELD
+label:
+Firestore Document field name for BigQuery SQL Time Partitioning field
+option
+description: >-
+This parameter enables partitioning of the BigQuery table created by the
+extension based on the selected field in the Firestore Document. The field
+value must be a top-level TIMESTAMP, DATETIME, or DATE field in BigQuery
+string format. This setting cannot be modified if the table is already
+partitioned. Example: Use `postDate` as the field for partitioning. Ensure
+that the Firestore-BigQuery export extension has created the dataset and
+table before running any backfill scripts. This step is crucial for
+correct partitioning functionality and for inserting data into an already
+partitioned table.
+type: string
+required: false
+
 - param: CLUSTERING
 label: BigQuery SQL table clustering
 description: >-
@@ -5,7 +5,7 @@
 "url": "github.com/firebase/extensions.git",
 "directory": "firestore-bigquery-export/firestore-bigquery-change-tracker"
 },
-"version": "1.1.42",
+"version": "1.1.43",
 "description": "Core change-tracker library for Cloud Firestore Collection BigQuery Exports",
 "main": "./lib/index.js",
 "scripts": {