
Add stop time update likely arrivals + initial metrics by trip-stop grain #3917


Open: tiffanychu90 wants to merge 14 commits into main from reinstate-rt-stop-times-table

tiffanychu90 (Member) commented May 15, 2025

Description

Three new tables:

  1. Add back the table, renamed to mart_gtfs.fct_stop_time_arrivals, that was last created in mart_adhoc for RT trip updates / likely stop arrivals. It implements the SQL from Feature: Calculate each trip's likely arrival at each stop #2684.

  2. fct_stop_time_updates_with_arrivals: an intermediate step that merges fct_stop_time_updates with fct_stop_time_arrivals.

  3. fct_stop_time_update_metrics: starts adding minute-binned metrics (trip update completeness) and summarizes them across all minutes of predictions within 30 minutes of the actual arrival.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature - add mart_gtfs.fct_stop_time_arrivals, which uses fct_stop_time_updates and fct_trip_updates_summaries
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Added trip_instance_key as a column, but found that I had to switch the service_date to a date available in fct_trip_updates_summaries (which might run weekly as an incremental table). I picked one date common to both the lookback period for the RT tables and what has been materialized in fct_trip_updates_summaries, to check that trip_instance_key is added correctly.
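Choosing that test date amounts to intersecting two sets of service dates. The sketch below assumes the RT lookback is a window of `rt_lookback_days` days ending on (and including) the run date; the function name and inputs are hypothetical, and the real dates would come from querying the warehouse.

```python
from datetime import date, timedelta


def common_test_dates(run_date, rt_lookback_days, summary_dates):
    """Service dates that fall inside the RT lookback window and also exist
    in the materialized summaries table (both inputs are hypothetical)."""
    # Assumed window: run_date and the (rt_lookback_days - 1) days before it.
    window = {run_date - timedelta(days=d) for d in range(rt_lookback_days)}
    return sorted(window & set(summary_dates))


print(common_test_dates(date(2025, 5, 15), 7, [date(2025, 5, 5), date(2025, 5, 12)]))
# [datetime.date(2025, 5, 12)]
```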

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (reinstate-rt-stop-times-table) $ poetry run dbt run -s +fct_stop_time_update_metrics
16:48:40  16 of 21 START sql view model tiffany_mart_gtfs.fct_stop_time_updates .......... [RUN]
16:48:41  16 of 21 OK created sql view model tiffany_mart_gtfs.fct_stop_time_updates ..... [CREATE VIEW (0 processed) in 0.86s]
16:48:41  17 of 21 START sql view model tiffany_mart_gtfs.fct_stop_time_arrivals ......... [RUN]
16:48:41  18 of 21 START sql incremental model tiffany_staging.int_gtfs_rt__trip_updates_trip_day_map_grouping  [RUN]
16:48:42  17 of 21 OK created sql view model tiffany_mart_gtfs.fct_stop_time_arrivals .... [CREATE VIEW (0 processed) in 0.94s]
16:48:42  19 of 21 START sql view model tiffany_mart_gtfs.fct_stop_time_updates_with_arrivals  [RUN]
16:48:42  19 of 21 OK created sql view model tiffany_mart_gtfs.fct_stop_time_updates_with_arrivals  [CREATE VIEW (0 processed) in 0.61s]
16:56:01  18 of 21 OK created sql incremental model tiffany_staging.int_gtfs_rt__trip_updates_trip_day_map_grouping  [SCRIPT (99.7 GiB processed) in 440.31s]
16:56:01  20 of 21 START sql table model tiffany_mart_gtfs.fct_trip_updates_summaries .... [RUN]
16:56:37  20 of 21 OK created sql table model tiffany_mart_gtfs.fct_trip_updates_summaries  [CREATE TABLE (1.9m rows, 50.8 GiB processed) in 36.22s]
16:56:37  21 of 21 START sql view model tiffany_mart_gtfs.fct_stop_time_update_metrics ... [RUN]
16:56:38  21 of 21 OK created sql view model tiffany_mart_gtfs.fct_stop_time_update_metrics  [CREATE VIEW (0 processed) in 0.65s]
16:56:38  
16:56:38  Finished running 13 view models, 7 table models, 1 incremental model in 0 hours 9 minutes and 29.58 seconds (569.58s).
16:56:39  
16:56:39  Completed successfully
16:56:39  
16:56:39  Done. PASS=21 WARN=0 ERROR=0 SKIP=0 TOTAL=21

The first table just runs the existing SQL, with only the column names updated to match what we have now. Noting here that because it's a view, a query on a single service_date touches 1+ TB, whether I filter to one operator's base64_url or query all operators!

Decision points pre-merge

  • Should we make this incremental? Or should we first use it against fct_trip_updates_summaries to see which metrics we want, before formally making it something that runs weekly? --> Decided on making it a view.

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required -


github-actions bot commented May 15, 2025

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?
  • For modified incremental models (or incremental models whose parents are modified), does the PR description identify whether a full refresh is needed for these tables?
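For the first check, a surrogate key at the trip-stop grain could be built roughly the way dbt_utils.generate_surrogate_key does it: cast each field to a string, substitute a placeholder for NULLs, join with '-', and take the md5. The Python sketch below mimics that approach and pairs it with equivalents of dbt's not_null and unique tests. Using trip_instance_key, stop_id and stop_sequence as the grain is a hypothetical choice, and Python's str() will not always match BigQuery's casts, so the hashes are illustrative only.

```python
import hashlib

# Placeholder for NULL inputs, following the dbt_utils.generate_surrogate_key convention.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*fields):
    """md5 over the fields cast to strings (NULLs replaced) and joined with '-'."""
    parts = [NULL_PLACEHOLDER if f is None else str(f) for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


def key_checks(keys):
    """Return (not_null, unique) for a column of key values, mirroring
    dbt's not_null and unique tests."""
    values = list(keys)
    not_null = all(v is not None for v in values)
    unique = len(set(values)) == len(values)
    return not_null, unique


# Hypothetical grain: (trip_instance_key, stop_id, stop_sequence).
rows = [("trip-1", "stop-a", 1), ("trip-1", "stop-b", 2), ("trip-2", "stop-a", 1)]
keys = [surrogate_key(*row) for row in rows]
print(key_checks(keys))  # (True, True)
```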

New models 🌱

calitp_warehouse.mart.gtfs.fct_stop_time_arrivals

calitp_warehouse.mart.gtfs.fct_stop_time_updates_metrics

calitp_warehouse.mart.gtfs.fct_stop_time_updates_with_arrivals

Changed incremental models 🔀

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__trip_updates_trip_day_map_grouping

DAG

Legend (in order of precedence)

Resource type | Indicator | Resolution
Large table-materialized model | Orange | Make the model incremental
Large model without partitioning or clustering | Orange | Add partitioning and/or clustering
View with more than one child | Yellow | Materialize as a table or incremental
Incremental | Light green |
Table | Green |
View | White |

tiffanychu90 (Member, Author)

Noting that the DAG diagram also shows NTD tables? It should be a fairly limited diagram: fct_stop_time_updates, fct_trip_updates_summaries, and whatever those views were derived from within int/stg.

tiffanychu90 (Member, Author)

@vevetron: Changed this to a view based on what we discussed, so I think it's ready to be merged?

tiffanychu90 marked this pull request as draft May 28, 2025 22:14
tiffanychu90 force-pushed the reinstate-rt-stop-times-table branch from 9d5128a to 78e2f5f May 30, 2025 15:20
tiffanychu90 force-pushed the reinstate-rt-stop-times-table branch from 78e2f5f to f960149 June 9, 2025 20:35
tiffanychu90 changed the title Reinstate rt stop times table Add stop time update likely arrivals + initial metrics by trip-stop grain Jun 9, 2025
tiffanychu90 marked this pull request as ready for review June 10, 2025 17:01
tiffanychu90 force-pushed the reinstate-rt-stop-times-table branch from 7ef1f44 to b989297 June 10, 2025 23:11
vevetron (Contributor)

Why are there partition by configs if there is no materialized table? Are they necessary?

tiffanychu90 (Member, Author) commented Jun 11, 2025

Why are there partition by configs if there is no materialized table? Are they necessary?

I think the two I wanted to make incremental were fct_stop_time_arrivals (since that one does reduce a lot of rows) and fct_stop_time_update_metrics. But making them incremental raises more questions than choosing between a view and a table: I don't know how to set up the lookback, or how to, as dbt calls it, microbatch for dev testing. I'll just leave them as views for now!
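One way to reason about the lookback question is to list which service dates a given incremental run would (re)process. The sketch below is a hypothetical policy, not how these models are configured: each run reprocesses service dates starting `lookback_days` before the latest loaded date (to pick up late-arriving RT data) through the run date, with one batch per day, similar in spirit to a daily microbatch. dbt's microbatch strategy exposes comparable settings such as `batch_size` and `lookback`.

```python
from datetime import date, timedelta


def incremental_batches(run_date, lookback_days, latest_loaded=None, first_date=None):
    """Daily service-date batches an incremental run would process.

    Hypothetical policy: reprocess from `lookback_days` before the latest
    loaded date up to `run_date`. With nothing loaded yet, behave like a
    full refresh starting at `first_date` (or just `run_date`).
    """
    if latest_loaded is None:
        start = first_date if first_date is not None else run_date
    else:
        start = latest_loaded - timedelta(days=lookback_days)
    n_days = (run_date - start).days
    # One batch per service date; an empty list if start is after run_date.
    return [start + timedelta(days=d) for d in range(n_days + 1)]


print(incremental_batches(date(2025, 6, 11), 3, latest_loaded=date(2025, 6, 9)))
```

For dev testing, the same function with a narrow `first_date` gives a small, predictable set of days to run against instead of scanning every service date.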

Noting for now: I can't rerun dbt because this error comes up:

Parsing Error
  Failed to render models/mart/gtfs_schedule_latest/_gtfs_schedule_latest.yml 
from project calitp_warehouse: Parsing Error
    Env var required but not provided: 'CALITP_BUCKET__PUBLISH'
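This parsing error is what dbt's env_var() function reports when a variable it reads has no default and is not set, so the shell running dbt needs CALITP_BUCKET__PUBLISH exported first. The PR does not say what value it should hold, so the small pre-flight check below (a hypothetical helper, not part of the repo) only verifies that the required variables are present and non-empty.

```python
import os

# Variables the failing project YAML needs; extend this tuple as others surface.
REQUIRED_ENV_VARS = ("CALITP_BUCKET__PUBLISH",)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Names from `required` that are unset or empty in `environ`
    (defaults to the current process environment)."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before running dbt:", ", ".join(missing))
```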

ohrite force-pushed the reinstate-rt-stop-times-table branch from 263451a to 2b50c64 June 11, 2025 17:34