Fix: Remove cartesian join causing inflated loan values in Risk Analytics dashboard #12
Open
danmennell wants to merge 1 commit into main
…regations

This commit fixes a critical bug introduced in PR #3 (commit f9d33f8) that was causing loan values to be inflated in the Risk Analytics Reporting dashboard.

Problem:
- The previous change added a left join to the loans table on only loan_type_name
- This created a cartesian product where each monthly aggregation row was duplicated for every loan of that type
- Example: 50 mortgages = 50x multiplication of monthly totals

Root Cause:
- Joining aggregated data (monthly_originations) back to detail-level data (loans) on a non-unique key (loan_type_name) creates many-to-many relationships

Fix:
- Removed the problematic left join to loans table
- Removed customer_id field (cannot be added at monthly aggregation level)
- Monthly aggregations now correctly show one row per month per loan type

Impact:
- Fixes inflated loan origination values in Risk Analytics dashboard
- Resolves failing data quality assertions on agg_monthly_loans
- Restores correct row counts (10-30 rows instead of hundreds)

Note: If customer_id tracking is needed, it should be added as a separate detail-level report, not in monthly aggregations.
Problem
The Risk Analytics Reporting dashboard is showing inflated loan origination values this month due to a SQL bug introduced in PR #3.
Root Cause
Commit f9d33f8 added a `left join` to the `loans` table using only `loan_type_name` as the join key.
This creates a cartesian product because:
- `monthly_originations` has aggregated data (1 row per month per loan type)
- `loans` has detail-level data (many rows per loan type)
Example: If there are 50 mortgages, each monthly mortgage row gets duplicated 50 times, multiplying totals by 50x.
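The fan-out described above can be reproduced on a toy dataset. The following is a minimal sketch using an in-memory SQLite database, not the project's actual query: only `loans`, `loan_type_name`, `customer_id` and the `monthly_originations` CTE are taken from this PR, while `loan_id`, `origination_month`, `loan_amount` and `total_amount` are hypothetical column names chosen for illustration.

```python
import sqlite3

# Hypothetical schema: only `loans`, `loan_type_name` and `customer_id`
# come from the PR; every other column name is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (
        loan_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        loan_type_name TEXT,
        origination_month TEXT,
        loan_amount REAL
    );
    INSERT INTO loans VALUES
        (1, 101, 'mortgage', '2024-01', 100.0),
        (2, 102, 'mortgage', '2024-01', 200.0),
        (3, 103, 'mortgage', '2024-02', 300.0),
        (4, 104, 'auto',     '2024-01',  50.0);
""")

# Aggregation at the intended grain: one row per month per loan type.
MONTHLY_CTE = """
    WITH monthly_originations AS (
        SELECT origination_month, loan_type_name,
               SUM(loan_amount) AS total_amount
        FROM loans
        GROUP BY origination_month, loan_type_name
    )
"""

# Reading the CTE directly keeps the grain intact.
correct = conn.execute(MONTHLY_CTE + """
    SELECT origination_month, loan_type_name, total_amount
    FROM monthly_originations
""").fetchall()

# Joining back to `loans` on loan_type_name alone repeats each monthly
# row once for every loan of that type.
fanned_out = conn.execute(MONTHLY_CTE + """
    SELECT m.origination_month, m.loan_type_name, m.total_amount, l.customer_id
    FROM monthly_originations AS m
    LEFT JOIN loans AS l ON l.loan_type_name = m.loan_type_name
""").fetchall()

correct_total = sum(row[2] for row in correct)
inflated_total = sum(row[2] for row in fanned_out)
print(len(correct), correct_total)
print(len(fanned_out), inflated_total)
```

With three mortgages in the toy data, each monthly mortgage row appears three times after the join, so any dashboard that sums `total_amount` over the joined result overstates mortgage originations by a factor of three.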
Evidence
Solution
This PR removes:
- `left join loans` statement
- `customer_id` field (cannot be correctly added at monthly aggregation level)
The `monthly_originations` CTE already contains all necessary aggregated metrics. No join back to detail-level data is needed.
Testing
After this fix:
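One way the expected grain (one row per month per loan type) could be checked is sketched below. This is an illustrative helper run against SQLite, not the assertion actually configured on `agg_monthly_loans`; the function name `duplicate_grain_rows`, the tables `agg_fixed` and `agg_buggy`, and the columns `origination_month` and `total_amount` are all assumptions.

```python
import sqlite3

def duplicate_grain_rows(conn, table):
    """Return (month, loan type, row count) for each key that occurs more than once."""
    # `table` is interpolated into the SQL text, so pass trusted names only.
    return conn.execute(f"""
        SELECT origination_month, loan_type_name, COUNT(*) AS n
        FROM {table}
        GROUP BY origination_month, loan_type_name
        HAVING COUNT(*) > 1
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical output at the correct grain.
    CREATE TABLE agg_fixed (
        origination_month TEXT, loan_type_name TEXT, total_amount REAL
    );
    INSERT INTO agg_fixed VALUES
        ('2024-01', 'mortgage', 300.0),
        ('2024-02', 'mortgage', 300.0);

    -- Hypothetical output with a duplicated monthly row, as a fan-out would produce.
    CREATE TABLE agg_buggy (
        origination_month TEXT, loan_type_name TEXT, total_amount REAL
    );
    INSERT INTO agg_buggy VALUES
        ('2024-01', 'mortgage', 300.0),
        ('2024-01', 'mortgage', 300.0),
        ('2024-02', 'mortgage', 300.0);
""")

fixed_dupes = duplicate_grain_rows(conn, "agg_fixed")
buggy_dupes = duplicate_grain_rows(conn, "agg_buggy")
print(fixed_dupes)
print(buggy_dupes)
```

An empty result means the table is at the intended grain; any returned row identifies a month and loan type whose totals would be double counted.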
Note
If customer-level tracking is needed, it should be implemented as a separate detail-level report, not added to monthly aggregations.
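A rough sketch of that separation, assuming a hypothetical `loans` schema (`loan_id`, `origination_month` and `loan_amount` are invented for illustration; only `loans`, `customer_id` and `loan_type_name` appear in this PR): the detail-level report keeps one row per loan and carries `customer_id`, while the monthly aggregation is computed independently and never joined back to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only.
    CREATE TABLE loans (
        loan_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        loan_type_name TEXT,
        origination_month TEXT,
        loan_amount REAL
    );
    INSERT INTO loans VALUES
        (1, 101, 'mortgage', '2024-01', 100.0),
        (2, 102, 'mortgage', '2024-01', 200.0),
        (3, 101, 'auto',     '2024-02',  50.0);
""")

# Detail-level report: loan grain, so customer_id is well defined per row.
detail_report = conn.execute("""
    SELECT loan_id, customer_id, loan_type_name, origination_month, loan_amount
    FROM loans
    ORDER BY loan_id
""").fetchall()

# Monthly aggregation: month x loan type grain, with no customer_id column.
monthly_report = conn.execute("""
    SELECT origination_month, loan_type_name, SUM(loan_amount) AS total_amount
    FROM loans
    GROUP BY origination_month, loan_type_name
    ORDER BY origination_month, loan_type_name
""").fetchall()

# Both reports should reconcile to the same grand total.
detail_total = sum(row[4] for row in detail_report)
monthly_total = sum(row[2] for row in monthly_report)
print(len(detail_report), len(monthly_report), detail_total, monthly_total)
```

Keeping the two reports at their own grains means each can be summed safely, and comparing their grand totals gives a cheap consistency check.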
Fixes: Inflated loan values in Risk Analytics Reporting dashboard
Reverts: Problematic changes from PR #3 (commit f9d33f8)