Fix: Remove cartesian join causing inflated loan values in Risk Analytics dashboard #12

Open
danmennell wants to merge 1 commit into main from fix/remove-cartesian-join-agg-monthly-loans

Problem

The Risk Analytics Reporting dashboard is showing inflated loan origination values this month due to a SQL bug introduced in PR #3.

Root Cause

Commit f9d33f8 added a left join to the loans table using only loan_type_name as the join key:

left join loans
    on orig.loan_type_name = loans.loan_type_name

This fans out rows (in effect a cartesian product within each loan type) because:

  • monthly_originations has aggregated data (1 row per month per loan type)
  • loans has detail-level data (many rows per loan type)
  • loan_type_name is not unique in either table, so the join is many-to-many: every monthly row matches every loan of the same type

Example: If the loans table holds 50 mortgages, each monthly mortgage row is duplicated 50 times, so any total summed over the joined rows is inflated by a factor of 50.
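
The fan-out can be reproduced in isolation. The sketch below uses Python's built-in sqlite3 module with made-up data; the column names other than loan_type_name and customer_id (loan_id, amount, month, total_amount), the month value, and the loan amounts are assumptions for illustration and are not taken from the real models.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loans (
        loan_id integer primary key,
        customer_id integer,
        loan_type_name text,
        amount real
    );
    create table monthly_originations (
        month text,
        loan_type_name text,
        total_amount real
    );
""")

# 50 mortgages of 1000 each (illustrative values only).
conn.executemany(
    "insert into loans values (?, ?, ?, ?)",
    [(i, 100 + i, "Mortgage", 1000.0) for i in range(1, 51)],
)
# One pre-aggregated row: 1 row per month per loan type.
conn.execute(
    "insert into monthly_originations values ('2024-01', 'Mortgage', 50000.0)"
)

true_total = conn.execute(
    "select sum(total_amount) from monthly_originations"
).fetchone()[0]

# The join from commit f9d33f8: loan_type_name is the only join key.
joined_rows, inflated_total = conn.execute("""
    select count(*), sum(orig.total_amount)
    from monthly_originations as orig
    left join loans
        on orig.loan_type_name = loans.loan_type_name
""").fetchone()

print(joined_rows, inflated_total / true_total)
```

Under these assumptions the single monthly row comes back 50 times, and the summed total is 50 times the true value.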

Evidence

  • Data Quality Failures: 3 assertions failing on the data pipeline
  • Row Count Explosion: Expected 10-30 rows, getting hundreds
  • Dashboard Impact: Loan values far higher than in previous months
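
The three failing assertions are not shown in this PR. As an illustration only, a grain check of the following kind would catch this class of bug: it reports any (month, loan_type_name) pair that appears more than once in agg_monthly_loans. The helper name duplicated_grain_rows, the month and total_amount columns, and the sample rows are hypothetical.

```python
import sqlite3


def duplicated_grain_rows(conn, table="agg_monthly_loans"):
    """Return (month, loan_type_name, n) for every grain key seen more than once.

    The table name is interpolated directly, so pass only trusted identifiers.
    """
    query = f"""
        select month, loan_type_name, count(*) as n
        from {table}
        group by month, loan_type_name
        having count(*) > 1
        order by month, loan_type_name
    """
    return conn.execute(query).fetchall()


# Hypothetical data: the Mortgage row for one month has been duplicated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table agg_monthly_loans (month text, loan_type_name text, total_amount real)"
)
conn.executemany(
    "insert into agg_monthly_loans values (?, ?, ?)",
    [
        ("2024-01", "Mortgage", 50000.0),
        ("2024-01", "Mortgage", 50000.0),  # fan-out duplicate
        ("2024-01", "Auto", 12000.0),
    ],
)

violations = duplicated_grain_rows(conn)
print(violations)
```

An empty result means the table holds one row per month per loan type, which is the grain described above.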

Solution

This PR removes:

  • The problematic left join loans statement
  • The customer_id field (cannot be correctly added at monthly aggregation level)

The monthly_originations CTE already contains all necessary aggregated metrics. No join back to detail-level data is needed.
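
A minimal sketch of the resulting query shape, assuming the CTE aggregates directly from loans. The real CTE body is not shown in this PR, so origination_date, amount, loan_count, total_amount, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table loans (
        loan_id integer primary key,
        customer_id integer,
        loan_type_name text,
        origination_date text,
        amount real
    )
""")
conn.executemany(
    "insert into loans values (?, ?, ?, ?, ?)",
    [
        (1, 101, "Mortgage", "2024-01-05", 1000.0),
        (2, 102, "Mortgage", "2024-01-20", 2000.0),
        (3, 103, "Auto", "2024-01-11", 500.0),
        (4, 101, "Mortgage", "2024-02-02", 4000.0),
    ],
)

# Aggregate once, then select from the CTE with no join back to loans.
rows = conn.execute("""
    with monthly_originations as (
        select
            strftime('%Y-%m', origination_date) as month,
            loan_type_name,
            count(*) as loan_count,
            sum(amount) as total_amount
        from loans
        group by strftime('%Y-%m', origination_date), loan_type_name
    )
    select month, loan_type_name, loan_count, total_amount
    from monthly_originations as orig
    order by month, loan_type_name
""").fetchall()

for row in rows:
    print(row)
```

Each (month, loan type) pair appears exactly once, and the totals add up to the sum of the underlying loan amounts.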

Testing

Expected results after this fix:

  • ✅ Monthly aggregations will show correct totals (no multiplication)
  • ✅ Row counts will return to expected range (10-30 rows)
  • ✅ Data quality assertions will pass
  • ✅ Risk Analytics dashboard will show accurate loan values

Note

If customer-level tracking is needed, it should be implemented as a separate detail-level report, not added to monthly aggregations.
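
One possible shape for such a report, as a sketch only: keep the grain at one row per loan so customer_id stays meaningful and nothing is joined onto aggregated rows. The loan_id, origination_date, and amount columns and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table loans (
        loan_id integer primary key,
        customer_id integer,
        loan_type_name text,
        origination_date text,
        amount real
    )
""")
conn.executemany(
    "insert into loans values (?, ?, ?, ?, ?)",
    [
        (1, 101, "Mortgage", "2024-01-05", 1000.0),
        (2, 102, "Mortgage", "2024-01-20", 2000.0),
        (3, 101, "Auto", "2024-02-11", 500.0),
    ],
)

# Detail-level report: one row per loan, read straight from the loans table.
detail_rows = conn.execute("""
    select
        strftime('%Y-%m', origination_date) as month,
        customer_id,
        loan_type_name,
        loan_id,
        amount
    from loans
    order by loan_id
""").fetchall()

for row in detail_rows:
    print(row)
```

Because the row count matches the number of loans, amounts summed from this report agree with the monthly totals instead of multiplying them.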


Fixes: Inflated loan values in Risk Analytics Reporting dashboard
Reverts: Problematic changes from PR #3 (commit f9d33f8)

…regations

This commit fixes a critical bug introduced in PR #3 (commit f9d33f8) that was
causing loan values to be inflated in the Risk Analytics Reporting dashboard.

Problem:
- The previous change added a left join to the loans table on only loan_type_name
- This created a cartesian product where each monthly aggregation row was
  duplicated for every loan of that type
- Example: 50 mortgages = 50x multiplication of monthly totals

Root Cause:
- Joining aggregated data (monthly_originations) back to detail-level data (loans)
  on a non-unique key (loan_type_name) creates many-to-many relationships

Fix:
- Removed the problematic left join to loans table
- Removed customer_id field (cannot be added at monthly aggregation level)
- Monthly aggregations now correctly show one row per month per loan type

Impact:
- Fixes inflated loan origination values in Risk Analytics dashboard
- Resolves failing data quality assertions on agg_monthly_loans
- Restores correct row counts (10-30 rows instead of hundreds)

Note: If customer_id tracking is needed, it should be added as a separate
detail-level report, not in monthly aggregations.