
Fix cartesian product in agg_monthly_loans causing inflated metrics #8

Draft
Copilot wants to merge 2 commits into main from copilot/fix-duplicate-rows-in-agg-monthly-loans

Copilot AI commented Mar 2, 2026

agg_monthly_loans was producing 1,162 rows (expected 10–30) with inflated loan values because the combined CTE LEFT JOINed back to the raw loans table on loan_type_name alone. Since monthly_originations is already aggregated at the month + loan_type grain, that join matched each aggregated row to every individual loan of the same type (from any month), duplicating the row once per matching loan.

Changes

  • models/marts/agg_monthly_loans.sql: Remove the LEFT JOIN loans ON orig.loan_type_name = loans.loan_type_name from the combined CTE and drop the loans.customer_id column from the SELECT
```sql
-- Before (combined CTE)
from monthly_originations orig
full outer join monthly_payments pay
    on orig.month_start = pay.month_start
left join loans                                     -- ❌ cartesian product
    on orig.loan_type_name = loans.loan_type_name

-- After
from monthly_originations orig
full outer join monthly_payments pay
    on orig.month_start = pay.month_start
```

The loans CTE is retained because monthly_originations still uses it for the initial aggregation.
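The before/after change can be reproduced on a few rows of toy data. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the warehouse; the `amount` column and all values are hypothetical, and the `full outer join monthly_payments` step is left out because it is not involved in the loans fan-out (SQLite also only supports FULL OUTER JOIN from 3.39.0).

```python
import sqlite3

# Toy data. Table and CTE names follow the PR; the amount column and
# all values are hypothetical, chosen only to make the fan-out visible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loans (
        loan_id        integer primary key,
        customer_id    integer,
        loan_type_name text,
        month_start    text,
        amount         real
    );
    insert into loans values
        (1, 101, 'auto',     '2026-01-01',  10000),
        (2, 102, 'auto',     '2026-01-01',  20000),
        (3, 103, 'auto',     '2026-02-01',  15000),
        (4, 104, 'mortgage', '2026-02-01', 300000);
""")

# monthly_originations: one row per month + loan type.
ORIGINATIONS = """
    with monthly_originations as (
        select month_start, loan_type_name, sum(amount) as total_amount
        from loans
        group by month_start, loan_type_name
    )
"""

# Before: joining back to loans on loan_type_name alone repeats each
# aggregated row once per loan of that type, across all months.
BEFORE = ORIGINATIONS + """
    select orig.month_start, orig.loan_type_name, orig.total_amount, loans.customer_id
    from monthly_originations orig
    left join loans
        on orig.loan_type_name = loans.loan_type_name
"""

# After: no join back to loans, so the month + loan_type grain is kept.
AFTER = ORIGINATIONS + """
    select orig.month_start, orig.loan_type_name, orig.total_amount
    from monthly_originations orig
"""

before_rows = conn.execute(BEFORE).fetchall()
after_rows = conn.execute(AFTER).fetchall()

def total(rows):
    # Sum the total_amount column (index 2), as a dashboard would.
    return sum(row[2] for row in rows)

print(len(before_rows), total(before_rows))  # 7 435000.0
print(len(after_rows), total(after_rows))    # 3 345000.0
```

With three auto loans in the table, each auto row is repeated three times, so the summed total grows from 345,000 to 435,000 even though no source data changed.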

Original prompt

The agg_monthly_loans model has a SQL bug that causes duplicate rows and inflated metrics in the Risk Analytics Reporting Dashboard.

Problem:
The model performs a LEFT JOIN loans ON orig.loan_type_name = loans.loan_type_name which creates a cartesian product. Since monthly_originations is already aggregated by month + loan_type, this join matches each aggregated row to EVERY individual loan of that type, creating duplicates.

Impact:

  • Data quality assertion failing: Expected 10-30 rows, actual 1,162 rows
  • February 2026 has 81 duplicate rows instead of ~5-10
  • Loan values artificially inflated (e.g., $339M for Feb 2026)
  • Risk Analytics Reporting Dashboard showing incorrect, inflated loan origination values
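Failures like these can also be caught by a uniqueness check at the model's declared grain, alongside the existing row-count assertion. A minimal sketch, again with sqlite3; the helper names `grain_violations` and `row_count_in_range` are hypothetical and not part of any existing test suite, and the snapshot rows are made up.

```python
import sqlite3

def grain_violations(conn, table, grain_cols):
    """Return (key..., count) for each key combination that appears more than once.

    An empty list means the table is unique at the declared grain.
    Table and column names are interpolated into the SQL, so pass trusted identifiers only.
    """
    cols = ", ".join(grain_cols)
    sql = (
        f"select {cols}, count(*) from {table} "
        f"group by {cols} having count(*) > 1 order by {cols}"
    )
    return conn.execute(sql).fetchall()

def row_count_in_range(conn, table, low, high):
    """Mirror a row-count assertion such as 'expected 10-30 rows'."""
    (n,) = conn.execute(f"select count(*) from {table}").fetchone()
    return low <= n <= high

# Hypothetical snapshot of buggy output: the Feb 2026 auto row was fanned out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table agg_monthly_loans (month_start text, loan_type_name text, total_amount real);
    insert into agg_monthly_loans values
        ('2026-02-01', 'auto',      15000),
        ('2026-02-01', 'auto',      15000),
        ('2026-02-01', 'auto',      15000),
        ('2026-02-01', 'mortgage', 300000);
""")

violations = grain_violations(conn, "agg_monthly_loans", ["month_start", "loan_type_name"])
print(violations)  # [('2026-02-01', 'auto', 3)]
print(row_count_in_range(conn, "agg_monthly_loans", 10, 30))  # False
```

A row-count range only flags fan-out once it pushes the total outside the range; the grain check flags even a single duplicated month + loan_type key.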

Root Cause:
PR #3 added this join to include customer_id, but the join logic is incorrect for an aggregated table.

Required Fix:
Remove the problematic LEFT JOIN to the loans table in models/marts/agg_monthly_loans.sql. The aggregation should remain at the month + loan_type grain without attempting to join back to individual loan records.

If customer-level detail is truly needed, the model needs to be redesigned to aggregate at the customer + month + loan_type grain, but based on the current model structure and data quality assertion (expecting 10-30 rows), the intended grain is monthly aggregation only.
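If customer-level detail is needed later, the redesign described above puts customer_id into the GROUP BY instead of joining it back on. A minimal sketch with sqlite3 and hypothetical data: each output row is one customer, month and loan type, and summing those rows back up to month + loan_type reproduces the monthly totals exactly, which is the property the buggy join broke.

```python
import sqlite3

# Hypothetical loans data; customer 101 has two auto loans in January.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loans (
        loan_id        integer primary key,
        customer_id    integer,
        loan_type_name text,
        month_start    text,
        amount         real
    );
    insert into loans values
        (1, 101, 'auto',     '2026-01-01',  10000),
        (2, 101, 'auto',     '2026-01-01',   5000),
        (3, 102, 'auto',     '2026-01-01',  20000),
        (4, 103, 'mortgage', '2026-02-01', 300000);
""")

# Customer + month + loan_type grain: customer_id is a grouping key,
# so each customer appears once per month and loan type.
CUSTOMER_GRAIN = """
    select month_start, loan_type_name, customer_id,
           count(*) as loan_count, sum(amount) as total_amount
    from loans
    group by month_start, loan_type_name, customer_id
"""

customer_rows = conn.execute(CUSTOMER_GRAIN + " order by 1, 2, 3").fetchall()

# Roll the customer grain back up to month + loan_type.
rolled_up = conn.execute(f"""
    with customer_grain as ({CUSTOMER_GRAIN})
    select month_start, loan_type_name, sum(total_amount)
    from customer_grain
    group by month_start, loan_type_name
    order by 1, 2
""").fetchall()

# Monthly totals computed directly from the raw loans, for comparison.
monthly = conn.execute("""
    select month_start, loan_type_name, sum(amount)
    from loans
    group by month_start, loan_type_name
    order by 1, 2
""").fetchall()

print(customer_rows)
print(rolled_up == monthly)  # True
```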

Files to modify:

  • models/marts/agg_monthly_loans.sql: Remove the LEFT JOIN loans and the customer_id column from the final select


Co-authored-by: jjoyce0510 <17549204+jjoyce0510@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix SQL bug causing duplicate rows in agg_monthly_loans model" to "Fix cartesian product in agg_monthly_loans causing inflated metrics" on Mar 2, 2026