Fix: Remove customer_id causing Cartesian join explosion in agg_monthly_loans #11
Open
jjoyce0510 wants to merge 1 commit into main
This commit fixes a critical data quality bug introduced in PR #3 where adding customer_id to the monthly aggregation table caused a Cartesian product explosion.

**Problem:**
- The LEFT JOIN to the loans table on loan_type_name created a many-to-many relationship, multiplying each monthly aggregate row by the number of individual loans of that type
- This caused the table to grow from ~30 rows to thousands of rows
- Each aggregated value (amount_originated, etc.) was repeated hundreds of times, making dashboard totals appear massively inflated
- Data quality assertion (row count between 10-30) has been failing since the change was deployed

**Root Cause:**
customer_id is a loan-level field and cannot logically exist in a monthly aggregation without causing row multiplication. The original join condition (loan_type_name only) matched each monthly summary to EVERY loan of that type.

**Solution:**
- Remove customer_id field from the SELECT
- Remove the LEFT JOIN to loans table
- Return the model to its correct aggregation level (month + loan_type)

**Impact:**
- Table will return to expected ~30 rows
- Dashboard values will be accurate again
- Data quality assertions will pass

Fixes issues caused by PR #3
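The row multiplication described in the commit message can be reproduced on a toy dataset. The following is a minimal sketch using Python's built-in `sqlite3` module, not the actual model SQL: the `monthly_agg` table name, the `loan_id` and `month` columns, and all data values are assumptions made for illustration. Only `customer_id`, `loan_type_name`, and `amount_originated` come from the description above.

```python
import sqlite3

# Toy schema (hypothetical): monthly_agg stands in for the aggregate model,
# loans for the loan-level table. All values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (loan_id INTEGER, customer_id INTEGER, loan_type_name TEXT);
    CREATE TABLE monthly_agg (month TEXT, loan_type_name TEXT, amount_originated REAL);

    INSERT INTO loans VALUES
        (1, 101, 'auto'), (2, 102, 'auto'), (3, 103, 'auto'),
        (4, 104, 'mortgage'), (5, 105, 'mortgage');

    INSERT INTO monthly_agg VALUES
        ('2024-01', 'auto', 30000.0),
        ('2024-02', 'auto', 45000.0),
        ('2024-01', 'mortgage', 500000.0);
""")

# Correct grain: one row per (month, loan_type_name).
clean_rows = conn.execute("SELECT COUNT(*) FROM monthly_agg").fetchone()[0]
clean_total = conn.execute(
    "SELECT SUM(amount_originated) FROM monthly_agg"
).fetchone()[0]

# Joining on loan_type_name alone repeats every aggregate row once per
# individual loan of that type, so amount_originated is counted many times.
joined = conn.execute("""
    SELECT m.month, m.loan_type_name, m.amount_originated, l.customer_id
    FROM monthly_agg AS m
    LEFT JOIN loans AS l ON l.loan_type_name = m.loan_type_name
""").fetchall()

joined_rows = len(joined)
joined_total = sum(row[2] for row in joined)

print(clean_rows, clean_total)    # 3 575000.0
print(joined_rows, joined_total)  # 8 1225000.0
```

With three `auto` loans and two `mortgage` loans, the three aggregate rows become eight, and the summed `amount_originated` more than doubles. The same mechanism at production scale would account for a table growing from ~30 rows to thousands.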
🐛 Bug Fix
This PR fixes a critical data quality issue introduced in PR #3 that has been causing inflated values in the Risk Analytics Dashboard.
🔍 Problem
PR #3 added a `customer_id` field to the monthly aggregation table by joining to the `loans` table using only `loan_type_name` as the join condition. This created a Cartesian join explosion.

💥 Root Cause

This join matches each monthly summary row to EVERY individual loan of that type, causing:

- each `amount_originated` value to be repeated hundreds of times

✅ Solution
This PR reverts the problematic changes from PR #3:
- Remove the `customer_id` field from SELECT

Note: `customer_id` is a loan-level field and cannot logically exist in a monthly aggregation without causing row multiplication. If customer-level monthly reporting is needed, a separate model should be created with proper grouping by `customer_id`.

📊 Expected Impact
After merging and rebuilding the table:
🔗 Related
- `urn:li:assertion:62352f3a-06c3-4802-8e59-c3206939fdd3`
- `loans-by-risk-level`, `loans-by-type`

✋ Review Checklist

- `dbt run --select agg_monthly_loans`
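After the model is rebuilt, the row-count expectation mentioned in the commit message (between 10 and 30 rows) could be spot-checked by hand. The sketch below is only an illustration under stated assumptions: `row_count_in_range` is a hypothetical helper, not part of dbt or of the existing assertion, and the in-memory SQLite table with made-up data merely stands in for the real warehouse table.

```python
import sqlite3


def row_count_in_range(conn, table, low=10, high=30):
    """Return (count, ok), where ok is True when low <= count <= high.

    Hypothetical helper. The table name is interpolated directly into the
    query, so only pass trusted identifiers.
    """
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return count, low <= count <= high


# Stand-in for the rebuilt table: 12 months x 2 loan types = 24 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agg_monthly_loans "
    "(month TEXT, loan_type_name TEXT, amount_originated REAL)"
)
rows = [
    (f"2024-{m:02d}", loan_type, 1000.0)
    for m in range(1, 13)
    for loan_type in ("auto", "mortgage")
]
conn.executemany("INSERT INTO agg_monthly_loans VALUES (?, ?, ?)", rows)

count, ok = row_count_in_range(conn, "agg_monthly_loans")
print(count, ok)  # 24 True
```

A count far above the upper bound would suggest the join fan-out is still present in the deployed model.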