
Fix: Remove cartesian product from monthly loan aggregations #7

Open
jjoyce0510 wants to merge 1 commit into main from fix/remove-cartesian-join-from-monthly-agg

Conversation

@jjoyce0510

🐛 Bug Fix: Remove Cartesian Product from Monthly Aggregations

This PR fixes the critical SQL bug introduced in PR #3 that caused the agg_monthly_loans table to produce hundreds of rows instead of the expected 10-30 rows.


Problem Identified

The LEFT JOIN to the loans table fanned out the aggregated rows, in effect a cartesian product within each loan type:

left join loans
    on orig.loan_type_name = loans.loan_type_name  -- ❌ Too broad!

Why this breaks:

  • monthly_originations contains aggregated data (1 row per month/loan_type)
  • loans (fct_loan_details) contains individual loan records (hundreds of rows)
  • Joining on loan_type_name alone repeats each aggregated row once for every loan of that type
  • Result: Table explodes from 10-30 rows to hundreds of duplicated rows
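
The fan-out is easy to reproduce on toy data. The sketch below uses Python's built-in sqlite3 with simplified, hypothetical stand-ins for monthly_originations and loans; every column other than loan_type_name and customer_id is an assumption, not the real model's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the aggregated CTE: one row per month/loan_type.
    create table monthly_originations (month text, loan_type_name text, loan_count integer);
    insert into monthly_originations values
        ('2024-01', 'mortgage', 3),
        ('2024-02', 'mortgage', 2),
        ('2024-01', 'auto', 1);

    -- Hypothetical stand-in for fct_loan_details: one row per loan.
    create table loans (loan_id integer, customer_id integer, loan_type_name text);
    insert into loans values
        (1, 101, 'mortgage'), (2, 102, 'mortgage'), (3, 103, 'mortgage'),
        (4, 104, 'mortgage'), (5, 105, 'mortgage'), (6, 106, 'auto');
""")

# Row count at the intended grain (one row per month/loan_type).
aggregated_rows = conn.execute(
    "select count(*) from monthly_originations"
).fetchone()[0]

# Row count after the join from this PR's "Problem Identified" section.
joined_rows = conn.execute("""
    select count(*)
    from monthly_originations as orig
    left join loans
        on orig.loan_type_name = loans.loan_type_name
""").fetchone()[0]

print(aggregated_rows, joined_rows)  # prints "3 11"
```

Each of the two mortgage rows matches all five mortgage loans (10 rows) and the single auto row matches one auto loan, so 3 aggregated rows become 11.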

What This PR Does

  • Removes the problematic LEFT JOIN loans
  • Removes the customer_id column (doesn't make sense in monthly aggregations)
  • Restores table to correct grain: one row per month per loan type


Impact

  • Assertion Status: Will resolve 10 consecutive assertion failures (100% failure rate)
  • Data Quality: Restores accurate monthly aggregation counts
  • No Breaking Changes: Only removes invalid columns that shouldn't exist in this grain

Root Cause Analysis

The original intent in PR #3 was to "track which customers are associated with each month's loan activity," but this is conceptually incompatible with monthly aggregations:

  • A monthly aggregation represents many customers (e.g., "100 mortgages in January")
  • There's no single customer_id to associate with a month/loan_type combination
  • To track customer-level activity, you'd need a different model at customer-month grain
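
For illustration only, a customer-month model might look roughly like the query below. This is a sketch, not part of this PR: the origination_date column and the toy rows are assumptions, and the real model would be written as dbt SQL against fct_loan_details rather than run through sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical loan-level table; origination_date is an assumed column.
    create table loans (
        loan_id integer, customer_id integer,
        loan_type_name text, origination_date text
    );
    insert into loans values
        (1, 101, 'mortgage', '2024-01-05'),
        (2, 101, 'auto',     '2024-01-20'),
        (3, 102, 'mortgage', '2024-01-11'),
        (4, 101, 'mortgage', '2024-02-02');
""")

# One row per customer per month: customer_id is part of the grain here,
# so it is a grouping key instead of something joined on after aggregation.
customer_month = conn.execute("""
    select
        customer_id,
        strftime('%Y-%m', origination_date) as origination_month,
        count(*) as loans_originated
    from loans
    group by customer_id, strftime('%Y-%m', origination_date)
    order by customer_id, origination_month
""").fetchall()

for row in customer_month:
    print(row)
```

Because customer_id is a grouping key, no join back to loan-level rows is needed and the grain stays explicit.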

Testing

After this fix, the table should return to expected behavior:

  • ✅ Row count between 10 and 30 (monthly aggregations across loan types)
  • ✅ No duplicate month/loan_type combinations
  • ✅ Assertion row_count_between_10_and_30 will pass
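
These checks can also be run by hand. The helper below is a hypothetical sketch, not the project's actual assertion: check_monthly_grain and the month column name are assumptions, and the toy table only stands in for agg_monthly_loans.

```python
import sqlite3


def check_monthly_grain(conn, table="agg_monthly_loans", min_rows=10, max_rows=30):
    """Return the row count and any duplicated month/loan_type keys."""
    # The table name is interpolated directly, so only pass trusted identifiers.
    row_count = conn.execute(f"select count(*) from {table}").fetchone()[0]
    duplicate_keys = conn.execute(f"""
        select month, loan_type_name, count(*) as n
        from {table}
        group by month, loan_type_name
        having count(*) > 1
    """).fetchall()
    return {
        "row_count": row_count,
        "row_count_ok": min_rows <= row_count <= max_rows,
        "duplicate_keys": duplicate_keys,
    }


# Toy table: 6 months x 2 loan types = 12 rows at the correct grain.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table agg_monthly_loans (month text, loan_type_name text, loan_count integer)"
)
rows = [(f"2024-{m:02d}", t, 1) for m in range(1, 7) for t in ("mortgage", "auto")]
conn.executemany("insert into agg_monthly_loans values (?, ?, ?)", rows)

result = check_monthly_grain(conn)
print(result)
```

A non-empty duplicate_keys list points at fan-out even when the row count happens to land inside the expected range.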

Related Issues


Files Changed

  • models/marts/agg_monthly_loans.sql - Removed cartesian join and customer_id column

Ready for review! 🚀

Remove the problematic LEFT JOIN to loans table that was causing a
cartesian product. The join condition (loan_type_name) was too broad,
causing each monthly aggregation row to multiply by the number of
individual loans of that type.

Issue: The assertion expecting 10-30 rows was failing because the table
was producing hundreds of rows due to the fan-out from joining aggregated
data to granular loan records.

Root cause: customer_id doesn't belong in a monthly aggregation table
as each month represents multiple customers. The table grain should
remain one row per month per loan type.

Fixes the 10 consecutive assertion failures (100% failure rate).

Related to: PR #3