Fix: Remove customer_id causing Cartesian product in monthly loan aggregations #10
Open
jjoyce0510 wants to merge 1 commit into main
This fixes the data duplication bug introduced in PR #3, where adding customer_id caused a Cartesian product explosion.

PROBLEM:
- The LEFT JOIN to the loans table matched only on loan_type_name
- This created duplicates: each monthly aggregate row was multiplied by the number of customers with that loan type
- Example: 1 row for "Mortgage - January" × 50 mortgage customers = 50 rows
- Result: Dashboard showed inflated loan values (duplicated amounts)

ROOT CAUSE:
- customer_id doesn't belong in an aggregated monthly table
- One month has multiple customers, so adding customer_id breaks aggregation
- The join had no unique constraint (no date/id matching)

FIX:
- Removed customer_id field from SELECT
- Removed LEFT JOIN to loans table
- Restored original aggregation logic

IMPACT:
- Fixes Risk Analytics Reporting Dashboard
- Resolves failing row count assertion (expected 10-30 rows)
- Corrects loan origination amounts back to accurate values

Related: PR #3, Issue reported by john.joyce@acryldata.io
🐛 Bug Fix: Data Duplication in Risk Analytics Dashboard
This PR fixes the critical data quality issue causing inflated loan values in the Risk Analytics Reporting Dashboard.
Problem Summary
PR #3 (merged Jan 27, 2026) introduced a Cartesian product bug that duplicated aggregated loan data, so the dashboard showed inflated loan values.
Root Cause
The bug was in this join logic added by PR #3:

    left join loans
        on orig.loan_type_name = loans.loan_type_name
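What happened:
- The join to loans matched only on loan_type_name, with no date or id condition
- Each monthly aggregate row was multiplied by the number of customers with that loan type
- Example: 1 row for "Mortgage - January" × 50 mortgage customers = 50 rows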
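The fan-out can be reproduced on toy data. The following is a minimal sketch, not the project's actual model: it uses Python's built-in sqlite3 as a stand-in for the warehouse, the table and join column names follow the diff in this PR, and the loan_id column, the customer ids, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table monthly_originations (
        month_start text,
        loan_type_name text,
        loans_originated integer,
        total_amount_originated real
    );
    -- loan_id is a hypothetical column added for illustration
    create table loans (loan_id integer, customer_id integer, loan_type_name text);
""")

# One aggregated row for "Mortgage - January" (amounts are made up).
conn.execute(
    "insert into monthly_originations values ('2026-01-01', 'Mortgage', 50, 5000000.0)"
)
# 50 mortgage customers, one loan each.
conn.executemany(
    "insert into loans values (?, ?, 'Mortgage')",
    [(i, 1000 + i) for i in range(50)],
)

# Join from PR #3: matches on loan_type_name only, so the single
# aggregate row is repeated once per matching loan.
buggy_rows, buggy_total = conn.execute("""
    select count(*), sum(orig.total_amount_originated)
    from monthly_originations orig
    left join loans
        on orig.loan_type_name = loans.loan_type_name
""").fetchone()

# Without the join, the aggregate keeps its original grain.
fixed_rows, fixed_total = conn.execute("""
    select count(*), sum(total_amount_originated)
    from monthly_originations
""").fetchone()

print(buggy_rows, buggy_total)
print(fixed_rows, fixed_total)
```

In this toy setup the joined query returns one row per mortgage loan, and any sum over the joined result is inflated by the same factor.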
Why customer_id doesn't belong here:
- Adding customer_id breaks the aggregation grain
- One month covers many customers, so there is no single customer_id per aggregated row

The Fix
✅ Removed customer_id field
✅ Removed LEFT JOIN loans causing duplication
✅ Restored original aggregation logic (pre-PR #3)
What Changed
      combined as (
          select
              coalesce(orig.month_start, pay.month_start) as month,
              orig.loan_type_name,
    -         loans.customer_id,  ⬅️ REMOVED
              coalesce(orig.loans_originated, 0) as new_loans,
              coalesce(orig.total_amount_originated, 0) as amount_originated,
              ...
          from monthly_originations orig
          full outer join monthly_payments pay
              on orig.month_start = pay.month_start
    -     left join loans  ⬅️ REMOVED
    -         on orig.loan_type_name = loans.loan_type_name  ⬅️ REMOVED
      )

Impact
✅ Fixes: Risk Analytics Reporting Dashboard will show correct loan values
✅ Fixes: Row count assertion will pass (returns to expected 10-30 rows)
✅ Fixes: Accurate monthly loan aggregations restored
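The row count assertion and the aggregation grain can both be checked with a small query. This is a hedged sketch rather than the assertion actually configured for this project: it assumes the model is materialized as a table named agg_monthly_loans (taken from the dbt command in this PR), that the intended grain is one row per (month, loan_type_name), and it uses sqlite3 with made-up rows in place of the real warehouse. The function name check_agg_monthly_loans is hypothetical.

```python
import sqlite3


def check_agg_monthly_loans(conn, min_rows=10, max_rows=30):
    """Return the row count, whether it is in range, and any duplicated grain keys."""
    row_count = conn.execute("select count(*) from agg_monthly_loans").fetchone()[0]
    # Assumed grain: one row per (month, loan_type_name).
    duplicate_keys = conn.execute("""
        select month, loan_type_name, count(*) as n
        from agg_monthly_loans
        group by month, loan_type_name
        having count(*) > 1
    """).fetchall()
    return {
        "row_count": row_count,
        "row_count_ok": min_rows <= row_count <= max_rows,
        "duplicate_keys": duplicate_keys,
    }


# Toy data: 12 months of one loan type (values are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table agg_monthly_loans (month text, loan_type_name text, new_loans integer)"
)
conn.executemany(
    "insert into agg_monthly_loans values (?, 'Mortgage', 5)",
    [(f"2025-{m:02d}-01",) for m in range(1, 13)],
)
healthy = check_agg_monthly_loans(conn)

# Simulate the fan-out: repeat the January row 49 more times.
conn.executemany(
    "insert into agg_monthly_loans values ('2025-01-01', 'Mortgage', 5)",
    [()] * 49,
)
broken = check_agg_monthly_loans(conn)

print(healthy)
print(broken)
```

Checking for duplicated grain keys alongside the row count helps because a fan-out can, in principle, stay inside the expected row range while still duplicating amounts.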
Testing Recommendations
After merge:
    dbt run --select agg_monthly_loans

Related
- PR #3
- Issue reported by john.joyce@acryldata.io