Fix/sources yml dbt2 compatibility #816
Changes from all commits
8561655
8964c92
e6bd6c4
b07ae27
ba84a14
75769b9
6079e82
| Diff line change |
|---|
| @@ -3,7 +3,7 @@ sources: |
| description: Raw taxi trip data from NYC TLC |
| database: \| |
| {%- if target.type == 'bigquery' -%} |
| - {{ env_var('GCP_PROJECT_ID', 'please-add-your-gcp-project-id-here') }} |
| + {{ env_var('GCP_PROJECT_ID', 'dtc-de-course-484520') }} |

Suggested change:

| Diff line change |
|---|
| - {{ env_var('GCP_PROJECT_ID', 'dtc-de-course-484520') }} |
| + {{ env_var('GCP_PROJECT_ID', 'please-add-your-gcp-project-id-here') }} |
Copilot AI, Feb 13, 2026
Valuable information removed from column descriptions. The original descriptions provided specific details about codes and their meanings:
- vendorid: Removed "(1 = Creative Mobile Technologies, 2 = VeriFone Inc.) - Note: Raw data may contain nulls, filtered in staging"
- ratecodeid: Removed "(1=Standard, 2=JFK, 3=Newark, 4=Nassau/Westchester, 5=Negotiated, 6=Group)"
- payment_type: Removed "(1=Credit card, 2=Cash, 3=No charge, 4=Dispute, 5=Unknown, 6=Voided)"
- trip_type: Removed "(1=Street-hail, 2=Dispatch)"
- tip_amount: Removed "(credit card only)" clarification
While the schema.yml file retains these detailed descriptions, removing them from the source documentation reduces the usefulness of the sources.yml file as a reference. Consider keeping these details in sources.yml for better documentation of the raw data.
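One way to act on this comment is to keep the code mappings in sources.yml itself. The fragment below is only a sketch: the parenthetical details are the ones quoted in the comment, and the source name `raw` matches the `source('raw', ...)` call used elsewhere in this PR, but the table name `green_tripdata` and the short leading phrase in each description are assumptions, not the PR's actual file contents.

```yaml
version: 2

sources:
  - name: raw  # source name as referenced by source('raw', ...) in this PR
    tables:
      - name: green_tripdata  # assumed table name, for illustration only
        columns:
          # The leading phrases ("Vendor code", etc.) are assumed wording;
          # the parenthetical details are quoted from the review comment.
          - name: vendorid
            description: "Vendor code (1 = Creative Mobile Technologies, 2 = VeriFone Inc.) - Note: Raw data may contain nulls, filtered in staging"
          - name: ratecodeid
            description: "Rate code (1=Standard, 2=JFK, 3=Newark, 4=Nassau/Westchester, 5=Negotiated, 6=Group)"
          - name: payment_type
            description: "Payment type (1=Credit card, 2=Cash, 3=No charge, 4=Dispute, 5=Unknown, 6=Voided)"
          - name: trip_type
            description: "Trip type (1=Street-hail, 2=Dispatch)"
          - name: tip_amount
            description: "Tip amount (credit card only)"
```

Keeping the mappings at the source level means the raw-data reference stays useful even to readers who never open the staging schema.yml.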
| Diff line change |
|---|
| @@ -0,0 +1,19 @@ |
| + with source as ( |
| + select * from {{ source('raw', 'fhv_tripdata') }} |
| + ), |
| + |
| + renamed as ( |
| + select |
| + dispatching_base_num, |
| + cast(pickup_datetime as timestamp) as pickup_datetime, |
| + cast(dropOff_datetime as timestamp) as dropoff_datetime, |
| + cast(PUlocationID as integer) as pickup_location_id, |
| + cast(DOlocationID as integer) as dropoff_location_id, |
| + SR_Flag as sr_flag |
| + from source |
| + where dispatching_base_num is not null |
| + and pickup_datetime >= '2019-01-01' |
| + and pickup_datetime < '2020-01-01' |

Suggested change:

| Diff line change |
|---|
| - and pickup_datetime < '2020-01-01' |
| + and pickup_datetime < '2021-01-01' |
Copilot AI, Feb 13, 2026
Missing dev environment sampling: The stg_fhv_tripdata model lacks the dev environment sampling logic that is present in stg_green_tripdata and stg_yellow_tripdata. Both of those models have additional filtering when target.name == 'dev' to limit data to January 2019 only (pickup_datetime < '2019-02-01').
For consistency, consider adding after line 19:
{% if target.name == 'dev' %}
and pickup_datetime < '2019-02-01'
{% endif %}
This would ensure faster development iterations with smaller data volumes, consistent with the pattern established by other staging models.
Suggested change:

| Diff line change |
|---|
| - and pickup_datetime < '2020-01-01' |
| + and pickup_datetime < '2020-01-01' |
| + {% if target.name == 'dev' %} |
| + and pickup_datetime < '2019-02-01' |
| + {% endif %} |
Copilot AI, Feb 13, 2026
Missing model documentation: The stg_fhv_tripdata model lacks documentation in schema.yml. Both stg_green_tripdata and stg_yellow_tripdata have comprehensive documentation including model descriptions and column definitions with data tests in models/staging/schema.yml. The new stg_fhv_tripdata model should follow the same pattern to maintain consistency with the codebase conventions.
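A schema.yml entry along the following lines would address this. It is a minimal sketch, not the repository's actual documentation: the column names come from the stg_fhv_tripdata model in this PR, and the `not_null` test on `dispatching_base_num` mirrors the model's own `where dispatching_base_num is not null` filter, but every description and the choice of tests are assumptions to be adjusted to match the green and yellow entries.

```yaml
version: 2

models:
  - name: stg_fhv_tripdata
    # All descriptions below are assumed wording, for illustration only.
    description: "Staged for-hire vehicle (FHV) trip records with renamed and typed columns"
    columns:
      - name: dispatching_base_num
        description: "Base number of the dispatching base; null rows are filtered out in the model"
        data_tests:
          - not_null
      - name: pickup_datetime
        description: "Trip pickup time, cast to timestamp"
        data_tests:
          - not_null
      - name: dropoff_datetime
        description: "Trip dropoff time, cast to timestamp"
      - name: pickup_location_id
        description: "Pickup location ID, cast to integer from PUlocationID"
      - name: dropoff_location_id
        description: "Dropoff location ID, cast to integer from DOlocationID"
      - name: sr_flag
        description: "Shared-ride flag from the raw SR_Flag column"
```

The `data_tests` key assumes a dbt version that supports it (1.8 or later); older projects use `tests` for the same purpose.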
Suggested change:

| Diff line change |
|---|
| - select * from renamed |
| + select |
| + dispatching_base_num, |
| + pickup_datetime, |
| + dropoff_datetime, |
| + pickup_location_id, |
| + dropoff_location_id, |
| + sr_flag |
| + from renamed |
| Diff line change |
|---|
| @@ -1,8 +1,8 @@ |
| packages: |
| - - name: dbt_utils |
| - package: dbt-labs/dbt_utils |
| - version: 1.3.3 |
| - - name: codegen |
| - package: dbt-labs/codegen |
| - version: 0.14.0 |
| - sha1_hash: 01f31e0d658d76121f50e62b998342ebf138df11 |
| + - package: dbt-labs/codegen |
| + name: codegen |
| + version: 0.14.0 |
| + - package: dbt-labs/dbt_utils |
| + name: dbt_utils |
| + version: 1.3.3 |
| + sha1_hash: 41a9a95a7d1e8d4dff67de764f3b1b8e9094807c |
This change adds unnecessary trailing whitespace to the profiles.yml entry. Remove it to keep the file clean.