@kbammarito
Contributor

Description

This PR:

  • creates search_derived.legacy_ad_clicks_by_state_v1, a daily aggregated table (ad clicks by day, state, and engine)
  • creates search.legacy_ad_clicks_by_state, a monthly aggregated and pivoted view (ad clicks by month and state, with one column per engine)
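
For reviewers, the two steps above can be sketched in plain Python. This is only an illustration of the shape of the aggregation, not the SQL in this PR: the function names, the input tuple layout, and the sample rows are all hypothetical, and `None` stands in for a SQL NULL.

```python
from collections import defaultdict
from datetime import date

# The three engines this PR reports on.
ENGINES = ("Bing", "DuckDuckGo", "Google")


def daily_aggregate(rows):
    """Sum ad clicks per (submission_date, month, engine, state).

    `rows` is an iterable of (submission_date, engine, state, ad_click)
    tuples; this input shape is an assumption made for the sketch.
    """
    totals = defaultdict(int)
    for submission_date, engine, state, ad_click in rows:
        if engine not in ENGINES:
            continue  # other engines are filtered out
        month = submission_date.replace(day=1)  # first day of the month
        totals[(submission_date, month, engine, state)] += ad_click
    return dict(totals)


def monthly_pivot(daily):
    """Collapse daily totals into one row per (month, state), one key per engine."""
    pivoted = {}
    for (_, month, engine, state), clicks in daily.items():
        # Engines with no clicks in a (month, state) group stay None.
        row = pivoted.setdefault((month, state), dict.fromkeys(ENGINES))
        row[engine] = (row[engine] or 0) + clicks
    return pivoted


# Hypothetical sample data, for illustration only.
sample_rows = [
    (date(2026, 1, 5), "Google", "CA", 3),
    (date(2026, 1, 6), "Google", "CA", 2),
    (date(2026, 1, 6), "Bing", "CA", 1),
    (date(2026, 1, 6), "Google", None, 4),  # client with no recorded state
    (date(2026, 1, 6), "Yahoo", "CA", 9),   # dropped by the engine filter
]
daily = daily_aggregate(sample_rows)
monthly = monthly_pivot(daily)
```

In this sketch a missing state is kept as its own group rather than dropped, so rows without a state still contribute to the monthly totals.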

Related Tickets & Documents

Reviewer, please follow this checklist

@kbammarito kbammarito requested a review from kwindau January 7, 2026 20:51
@dataops-ci-bot

Integration report for "Merge branch 'DENG-10147-build-legacy-aggregate-table-with-ad-clicks-by-state' of github.com:mozilla/bigquery-etl into DENG-10147-build-legacy-aggregate-table-with-ad-clicks-by-state"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_search.py /tmp/workspace/generated-sql/dags/bqetl_search.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_search.py	2026-01-08 15:24:04.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_search.py	2026-01-08 15:25:02.000000000 +0000
@@ -53,6 +53,19 @@
     catchup=False,
 ) as dag:
 
+    wait_for_telemetry_derived__clients_daily_joined__v1 = ExternalTaskSensor(
+        task_id="wait_for_telemetry_derived__clients_daily_joined__v1",
+        external_dag_id="bqetl_main_summary",
+        external_task_id="telemetry_derived__clients_daily_joined__v1",
+        execution_delta=datetime.timedelta(seconds=3600),
+        check_existence=True,
+        mode="reschedule",
+        poke_interval=datetime.timedelta(minutes=5),
+        allowed_states=ALLOWED_STATES,
+        failed_states=FAILED_STATES,
+        pool="DATA_ENG_EXTERNALTASKSENSOR",
+    )
+
     wait_for_copy_deduplicate_main_ping = ExternalTaskSensor(
         task_id="wait_for_copy_deduplicate_main_ping",
         external_dag_id="copy_deduplicate",
@@ -92,19 +105,6 @@
         pool="DATA_ENG_EXTERNALTASKSENSOR",
     )
 
-    wait_for_telemetry_derived__clients_daily_joined__v1 = ExternalTaskSensor(
-        task_id="wait_for_telemetry_derived__clients_daily_joined__v1",
-        external_dag_id="bqetl_main_summary",
-        external_task_id="telemetry_derived__clients_daily_joined__v1",
-        execution_delta=datetime.timedelta(seconds=3600),
-        check_existence=True,
-        mode="reschedule",
-        poke_interval=datetime.timedelta(minutes=5),
-        allowed_states=ALLOWED_STATES,
-        failed_states=FAILED_STATES,
-        pool="DATA_ENG_EXTERNALTASKSENSOR",
-    )
-
     wait_for_clients_first_seen_v3 = ExternalTaskSensor(
         task_id="wait_for_clients_first_seen_v3",
         external_dag_id="bqetl_analytics_tables",
@@ -118,6 +118,22 @@
         pool="DATA_ENG_EXTERNALTASKSENSOR",
     )
 
+    search_derived__legacy_ad_clicks_by_state__v1 = bigquery_etl_query(
+        task_id="search_derived__legacy_ad_clicks_by_state__v1",
+        destination_table="legacy_ad_clicks_by_state_v1",
+        dataset_id="search_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=[
+            "[email protected]",
+            "[email protected]",
+            "[email protected]",
+            "[email protected]",
+        ],
+        date_partition_parameter="submission_date",
+        depends_on_past=False,
+    )
+
     search_derived__search_aggregates__v8 = bigquery_etl_query(
         task_id="search_derived__search_aggregates__v8",
         destination_table="search_aggregates_v8",
@@ -316,6 +332,14 @@
         depends_on_past=False,
     )
 
+    search_derived__legacy_ad_clicks_by_state__v1.set_upstream(
+        search_derived__search_clients_daily__v8
+    )
+
+    search_derived__legacy_ad_clicks_by_state__v1.set_upstream(
+        wait_for_telemetry_derived__clients_daily_joined__v1
+    )
+
     search_derived__search_aggregates__v8.set_upstream(
         search_derived__search_clients_daily__v8
     )
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search: legacy_ad_clicks_by_state
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived: legacy_ad_clicks_by_state_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/metadata.yaml	2026-01-08 15:20:17.000000000 +0000
@@ -0,0 +1,18 @@
+friendly_name: Legacy Ad Clicks By State
+description: |-
+  Aggregate table that pivots and sums ad clicks by month, state, and engine.
+owners:
+- [email protected]
+labels:
+  owner: kbammarito
+  owner1: kbammarito
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:dataops-managed/external-fides
+  - workgroup:mozilla-confidential
+references:
+  view.sql:
+  - moz-fx-data-shared-prod.search_derived.legacy_ad_clicks_by_state_v1
+require_column_descriptions: false
+level: null
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/schema.yaml	2026-01-08 15:18:48.000000000 +0000
@@ -0,0 +1,26 @@
+fields:
+- name: month
+  type: DATE
+  mode: NULLABLE
+  description: |
+    The submission_date's month (from the v1 table).
+- name: state
+  type: STRING
+  mode: NULLABLE
+  description: |
+    The US state.
+- name: Bing
+  type: INTEGER
+  mode: NULLABLE
+  description: |
+    The sum of ad clicks by month for Bing.
+- name: Google
+  type: INTEGER
+  mode: NULLABLE
+  description: |
+    The sum of ad clicks by month for Google.
+- name: DuckDuckGo
+  type: INTEGER
+  mode: NULLABLE
+  description: |
+    The sum of ad clicks by month for DuckDuckGo.
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/view.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search/legacy_ad_clicks_by_state/view.sql	2026-01-08 15:18:48.000000000 +0000
@@ -0,0 +1,30 @@
+CREATE OR REPLACE VIEW
+  `moz-fx-data-shared-prod.search.legacy_ad_clicks_by_state`
+AS
+WITH pivoted AS (
+  SELECT
+    *
+  FROM
+    (
+      SELECT
+        `month`,
+        normalized_engine,
+        `state`,
+        daily_ad_clicks
+      FROM
+        `moz-fx-data-shared-prod.search_derived.legacy_ad_clicks_by_state_v1`
+    ) pivot(SUM(daily_ad_clicks) FOR normalized_engine IN ('Bing', 'DuckDuckGo', 'Google'))
+  ORDER BY
+    `month`,
+    state
+),
+final AS (
+  SELECT
+    *
+  FROM
+    pivoted
+)
+SELECT
+  *
+FROM
+  final
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/metadata.yaml	2026-01-08 15:20:15.000000000 +0000
@@ -0,0 +1,33 @@
+friendly_name: Legacy Ad Clicks By State
+description: |-
+  Aggregate table that sums ad clicks by day, state, and engine.
+owners:
+- [email protected]
+labels:
+  incremental: true
+  owner1: kbammarito
+  table_type: aggregate
+  shredder_mitigation: true
+  dag: bqetl_search
+scheduling:
+  dag_name: bqetl_search
+bigquery:
+  time_partitioning:
+    type: day
+    field: submission_date
+    require_partition_filter: false
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - normalized_engine
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  query.sql:
+  - moz-fx-data-shared-prod.search_derived.search_clients_daily_v8
+  - moz-fx-data-shared-prod.telemetry_derived.clients_daily_joined_v1
+require_column_descriptions: true
+level: null
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/query.sql	2026-01-08 15:18:48.000000000 +0000
@@ -0,0 +1,72 @@
+-- Query for search_derived.legacy_ad_clicks_by_state_v1
+            -- For more information on writing queries see:
+            -- https://docs.telemetry.mozilla.org/cookbooks/bigquery/querying.html
+WITH ad_click_clients AS (
+  SELECT
+    submission_date,
+    client_id,
+    `moz-fx-data-shared-prod`.udf.normalize_search_engine(engine) AS normalized_engine,
+    ad_click
+  FROM
+    `moz-fx-data-shared-prod.search_derived.search_clients_daily_v8`
+  WHERE
+    submission_date = @submission_date
+    AND country = 'US'
+    AND `moz-fx-data-shared-prod`.udf.normalize_search_engine(engine) IN (
+      'Bing',
+      'DuckDuckGo',
+      'Google'
+    )
+),
+ad_click_states AS (
+  SELECT
+    submission_date,
+    client_id,
+    geo_subdivision1 AS `state`
+  FROM
+    `moz-fx-data-shared-prod.telemetry_derived.clients_daily_joined_v1`
+  WHERE
+    submission_date = @submission_date
+    AND country = 'US'
+    AND client_id IN (SELECT client_id FROM ad_click_clients)
+),
+join_states_and_clients AS (
+  SELECT
+    ad_click_clients.submission_date,
+    DATE_TRUNC(ad_click_clients.submission_date, month) AS `month`,
+    ad_click_clients.client_id,
+    ad_click_clients.normalized_engine,
+    ad_click_clients.ad_click,
+    ad_click_states.state
+  FROM
+    ad_click_clients
+  LEFT JOIN
+    ad_click_states
+    ON ad_click_clients.submission_date = ad_click_states.submission_date
+    AND ad_click_clients.client_id = ad_click_states.client_id
+),
+daily_table AS (
+  SELECT
+    submission_date,
+    `month`,
+    normalized_engine,
+    `state`,
+    SUM(ad_click) AS daily_ad_clicks
+  FROM
+    join_states_and_clients
+  GROUP BY
+    submission_date,
+    `month`,
+    normalized_engine,
+    `state`
+),
+final AS (
+  SELECT
+    *
+  FROM
+    daily_table
+)
+SELECT
+  *
+FROM
+  final
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/legacy_ad_clicks_by_state_v1/schema.yaml	2026-01-08 15:18:48.000000000 +0000
@@ -0,0 +1,26 @@
+fields:
+- name: submission_date
+  type: DATE
+  mode: NULLABLE
+  description: |
+    The date when the telemetry ping is received on the server side.
+- name: month
+  type: DATE
+  mode: NULLABLE
+  description: |
+    The submission_date's month.
+- name: normalized_engine
+  type: STRING
+  mode: NULLABLE
+  description: |
+    The normalized version of the search engine.
+- name: state
+  type: STRING
+  mode: NULLABLE
+  description: |
+    The US state
+- name: daily_ad_clicks
+  type: INTEGER
+  mode: NULLABLE
+  description: |
+    The sum of ad clicks by day.


@kbammarito kbammarito requested a review from skahmann3 January 8, 2026 15:39
@kbammarito
Contributor Author

@skahmann3, I noticed that some rows have a null state. Is that OK? And what about US territories: do we track them? I don't know how territories work for tax purposes.

@skahmann3
Contributor

@kbammarito - we just need US states. If our telemetry records null occasionally, that's fine, as long as that's the true state of our telemetry. Please confirm the null did not come from a SQL issue. Thanks for your help!

@kbammarito
Contributor Author

@skahmann3, confirmed: I see null states (geo_subdivision1) in the source table telemetry_derived.clients_daily_joined_v1 even when country = 'US'.
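
A null state in the output can have two causes: the client has no matching row on the state side of the join, or the matching row exists but its state is null in the source. A minimal sketch of how the two cases could be told apart is below. It is not the check that was actually run for this PR; the function name, the inputs, and the sample client IDs are hypothetical.

```python
def classify_null_states(ad_click_client_ids, state_by_client):
    """Split clients that would get a null state into two groups.

    ad_click_client_ids: client IDs that recorded ad clicks (assumed input).
    state_by_client: mapping of client ID to state, where None means the
        source row exists but carries no state (assumed input).

    Returns (unmatched, null_in_source):
      unmatched       clients with no row on the state side of the join
      null_in_source  clients whose source row has a null state
    """
    unmatched = set()
    null_in_source = set()
    for client_id in ad_click_client_ids:
        if client_id not in state_by_client:
            unmatched.add(client_id)
        elif state_by_client[client_id] is None:
            null_in_source.add(client_id)
    return unmatched, null_in_source


# Hypothetical sample data, for illustration only.
ad_click_client_ids = {"a", "b", "c"}
state_by_client = {"a": "CA", "b": None}
unmatched, null_in_source = classify_null_states(
    ad_click_client_ids, state_by_client
)
```

If `unmatched` is empty, every null in the output traces back to a null already present in the source data.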

@kbammarito kbammarito marked this pull request as ready for review January 8, 2026 20:58