-
Notifications
You must be signed in to change notification settings - Fork 124
Update schemas and add schema.allow_field_addition metadata #8682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c6692a6 to
d916e20
Compare
This comment has been minimized.
This comment has been minimized.
0f2cb97 to
19a8f59
Compare
sql/moz-fx-data-shared-prod/firefox_accounts_prod_logs_syndicate/dataset_metadata.yaml
Outdated
Show resolved
Hide resolved
19a8f59 to
9521605
Compare
9521605 to
526267e
Compare
sql/moz-fx-data-shared-prod/firefox_accounts_gke_logs_syndicate/dataset_metadata.yaml
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
87fa95e to
14dc03c
Compare
This comment has been minimized.
This comment has been minimized.
Integration report for "Remove google ads syndication"
|
sean-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- issue (blocking): The
firefox_accounts_derived.recent_fxa_gcp_*_events_v1ETLs also need theallow_field_additionmetadata added. - question: The following ETLs had publishing errors when all schema updates for existing
schema.yamlfiles were being skipped, so do they also need to be updated in some way?- Various
active_users_aggregates_v3ETLs. - Various
new_profile_activations_v1ETLs. google_ads_derived.daily_ad_group_stats_v1google_ads_derived.daily_campaign_stats_v1telemetry_derived.ech_adoption_rate_v1
- Various
| ) | ||
|
|
||
| # Check if query metadata has ALLOW_FIELD_ADDITION | ||
| # Check if query metadata has allow_field_addition flag or ALLOW_FIELD_ADDITION argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): Maybe it would make sense to automatically add the --schema_update_option=ALLOW_FIELD_ADDITION argument for incremental SQL ETLs with allow_field_addition: true in their metadata.yaml?
| "--schema_update_option=ALLOW_FIELD_ADDITION" in arg | ||
| for arg in arguments | ||
| ) | ||
| if metadata: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): If the Metadata.of_query_file() succeeds then the return value won't be None, so I don't think the if metadata: check is needed.
| derived_from: List[SchemaDerivedMetadata] = attr.ib([]) | ||
| # indicates that the schema might change over time and should be updated even with --skip-existing | ||
| allow_field_addition: bool = attr.ib(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: Setting default values like this without additional logic controlling what values get written to metadata.yaml files means that when certain bqetl commands rewrite metadata.yaml files these default values will get added.
For example, running bqetl query schema update on any of the ETLs where you've added the allow_field_addition metadata will now result in derived_from: [] being added to their metadata.yaml. And similarly, running bqetl query schema update on any ETLs that have actual derived_from metadata will now result in allow_field_addition: false being added to their metadata.yaml.
This isn't a huge deal, but I think it could be nice if we more consistently omitted fields from metadata.yaml when they're set to the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): I'd suggest reverting the unrelated changes to this metadata.yaml file, which I'm guessing were a result of running bqetl query schema update for this ETL.
(ditto for glam_derived/client_probe_counts_firefox_desktop_nightly_v1/metadata.yaml and telemetry_derived/clients_first_seen_v1/metadata.yaml)
Description
This is a follow-up to the discussion in https://mozilla.slack.com/archives/C01E8GDG80N/p1767740035519219 and the failing run in https://workflow.telemetry.mozilla.org/dags/bqetl_artifact_deployment/grid?dag_run_id=manual__2026-01-06T22%3A28%3A57.835425%2B00%3A00&task_id=publish_new_tables&tab=logs
This PR updates schemas that were marked as out-of-date by the Airflow run and adds a new metadata option:
This option indicates whether the query schema can change over, which is for example the case when doing a
SELECT *on a different table that might add more fields over time. This option will trigger a schema re-generation in the schema update step (e.g. before deploying artifacts) and ensures that schemas are up-to-date.Another option that was discussed was adding syndicate configs for Google Ads tables and authorizing the fxa datasets (https://github.com/mozilla/webservices-infra/pull/9025) and have the stage deploy step generate the most recent schemas. One issue I came to realize with that is removing the
schema.yamlfiles for these types of queries also removes a way to add column descriptions. Since that metadata can be quite valuable having theallow_field_additionpreserves these descriptions but also updates the schemas.I added the
allow_field_additionto the relevant FxA queries and Google Ads queries that showed up. There might be more popping up in the future when running artifact deploys. Adding the metadata option should be an easy enough fix though and also serves as an explicit indicator that the schemas can evolve over time.Related Tickets & Documents
Reviewer, please follow this checklist