- Apoorva Paranthaman (apoorvap@abc.edu)
- Maryam Shahbaz Ali (maryamshahba.a@abc.edu)
- Zahra Sultani (zahrasultai@abc.edu)
- Model Date: May 2025
- Model Version: 1.0
- License: Apache License
The remediated model, trained on data from the Home Mortgage Disclosure Act (HMDA), is designed solely for educational purposes. It tests for bias between men and women, between racial groups (for example, Black vs. White), and across other demographic categories in the dataset. Its primary goal is to serve as a learning tool for students to practice identifying and addressing bias in data. The model is not built to solve a real-world problem. However, applying the same practices in a comparable real-world project could deliver some of the following business value:
- Market-Leading Status: Enhances reputation through fair lending practices.
- Obligations to Lenders/Creditors: Ensures compliance with fair lending laws.
- Shareholder Returns: Reduces legal risks and improves customer satisfaction.
- Profitability: Identifies profitable, unbiased lending opportunities.
- Sustainability: Builds trust and loyalty among diverse customers.
- Growth Prospects: Opens new market segments through fair criteria.
The best remediated model is used to assess bias. Its intended use covers the following:
- Intended Scope: Use for assessing and mitigating biases in mortgage lending.
- Capabilities and Limitations: Understand its ability to detect and correct biases.
- Appropriate Application: Apply consistently across all demographic groups.
- Monitoring and Validation: Regularly check for and adjust new biases.
- Regulatory Compliance: Adhere to fair lending laws (ECOA, FHA).
- Stakeholder Communication: Ensure transparency with regulators, lenders, and customers.
- Students in GWU DNSC_6330 class
- This model is an educational example and should not be used for any other purpose.
- Source of Training Data: The training data is from the Home Mortgage Disclosure Act (HMDA) and was downloaded from the class’s repository: GWU_rml Data
- Training and Validation Split: 70% training and 30% test.
- Number of Rows in Training: 112,253
| Column Name | Modeling Role | Data Type | Description |
|---|---|---|---|
| high_priced | Target | Int | Target variable indicating whether the mortgage loan is high-priced |
| term_360 | Input | Int | Indicates whether the mortgage term is 360 months (1) or other types (0) |
| conforming | Input | Int | Indicates whether the mortgage conforms to normal standards (1) or not (0) |
| debt_to_income_ratio_missing | Input | Int | Indicates whether the debt-to-income ratio is missing |
| loan_amount_std | Input | Float | Standardized loan amount of the mortgage for each applicant |
| loan_to_value_ratio_std | Input | Float | Standardized loan-to-value ratio indicating the ratio of the mortgage size to the value of the property for each applicant |
| no_intro_rate_period_std | Input | Float | Standardized indicator for no introductory rate period |
| intro_rate_period_std | Input | Float | Standardized introductory rate period |
| property_value_std | Input | Float | Standardized property value |
| income_std | Input | Float | Standardized income for each applicant |
| debt_to_income_ratio_std | Input | Float | Standardized debt-to-income ratio |
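
The '_std' suffix on several inputs suggests z-score standardization, although this card does not state the exact transformation. The following is a minimal sketch under that assumption; the `standardize` helper and the sample loan amounts are hypothetical and are not taken from the HMDA data or the class notebooks.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical raw loan amounts in dollars (illustrative only).
loan_amount = [100_000, 200_000, 300_000]
loan_amount_std = standardize(loan_amount)
```

After this transformation a column has mean 0 and unit variance, so the features are on a comparable scale.
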
| Column Name | Modeling Role | Data Type | Description |
|---|---|---|---|
| race | Demographic info | Int | A combined race column created from individual race indicators (black, Asian, white, other) |
| male | Demographic info | Int | Whether a person identifies as male (1) or not male (0) |
| female | Demographic info | Int | Whether a person identifies as female (1) or not female (0) |
| high_priced | Target | Int | Target variable indicating whether the mortgage loan is high-priced |
- Source of Test Data: The test data is from the Home Mortgage Disclosure Act (HMDA) and was downloaded from the class’s repository: GWU_rml Data
- Number of Rows in Test Data: 48,085
- Difference in Columns Between Training and Test Data: The 'high_priced' column is excluded from test data.
- Columns Used as Inputs:
x_names = ['term_360', 'conforming', 'debt_to_income_ratio_missing', 'loan_amount_std', 'loan_to_value_ratio_std', 'no_intro_rate_period_std', 'intro_rate_period_std', 'property_value_std', 'income_std', 'debt_to_income_ratio_std']
- Columns Used as Targets:
y_name = 'high_priced'
- Type of Model: Explainable Boosting Machine (EBM) model.
- Software Used: Python with the 'interpret', 'xgboost', 'h2o', 'pandas', and 'numpy' libraries.
- Version of Modeling Software:
- Interpret (0.6.10)
- xgboost (2.1.4)
- h2o (2.0.2)
- pandas (2.2.2)
- numpy (2.0.2)
# Best remediated params from assignment 3
rem_params = {
'max_bins': 256,
'max_interaction_bins': 16,
'interactions': 15,
'outer_bags': 8,
'inner_bags': 4,
'learning_rate': 0.05,
'validation_size': 0.1,
'min_samples_leaf': 10,
'max_leaves': 3,
'n_jobs': 4,
'early_stopping_rounds': 100,
'random_state': 10001
}

| Compare vs. Control | AIR |
|---|---|
| Asian vs. White | 1.140 |
| Black vs. White | 0.855 |
| Females vs. Males | 0.948 |
Table 1. Validation AIR values for race and sex groups.
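
For readers unfamiliar with the metric, the Adverse Impact Ratio compares the rate of favorable outcomes for a protected group with the rate for a control group. The following is a minimal sketch, assuming that the favorable outcome is a prediction of NOT high-priced (0). The function names and the sample predictions are hypothetical and are not the notebook's implementation.

```python
def favorable_rate(predictions):
    """Share of applicants predicted NOT high-priced (prediction == 0)."""
    return sum(1 for p in predictions if p == 0) / len(predictions)

def adverse_impact_ratio(protected_preds, control_preds):
    """AIR = favorable rate of protected group / favorable rate of control group."""
    control_rate = favorable_rate(control_preds)
    if control_rate == 0:
        raise ValueError("control group has no favorable outcomes; AIR is undefined")
    return favorable_rate(protected_preds) / control_rate

# Hypothetical binary predictions (1 = high-priced), illustrative only.
protected_preds = [0, 0, 0, 1, 1]  # 3 of 5 favorable
control_preds = [0, 0, 0, 0, 1]    # 4 of 5 favorable
example_air = adverse_impact_ratio(protected_preds, control_preds)
```

Under the widely used four-fifths rule of thumb, an AIR below 0.8 flags potential adverse impact. All three values in Table 1 are above that threshold.
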
See the full EBM notebook with code on GitHub for details of quantitative analysis.
The metrics used are AUC (Area Under the Curve) and F1-score, computed with the interpret.perf and sklearn.metrics packages.
Models were assessed primarily with AUC and AIR. See details below:
| Train AUC | Validation AUC | Test AUC |
|---|---|---|
| 0.7412 | 0.7437 | 0.8285* |
Table 2. AUC values across data partitions.
*The test AUC value is taken from model_eval_2025_03_27_14_06_12.csv.
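
As a reminder of what the AUC values measure, the sketch below computes AUC from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is an illustration only. The `auc` helper and the sample data are hypothetical, and the reported values were not necessarily produced this way.

```python
def auc(y_true, y_score):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = high-priced) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(y_true, y_score)
```

This quadratic-time version is for exposition only. On datasets of the size described here, a library routine such as sklearn.metrics.roc_auc_score is the practical choice.
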
Figure 1: Heatmap showing the correlation between different features, which helps in identifying relationships and dependencies among the features. See full notebook here.
Figure 2: Partial dependence of the best EBM on its top four features ('loan_to_value_ratio_std', 'property_value_std', 'debt_to_income_ratio_std', and 'intro_rate_period_std'), showing the relationship between each feature and the model's predictions. See full notebook here.
Figure 3: Global feature importance for the EBM model, showing which features are most influential in the model's predictions. Based on this plot, 'loan_to_value_ratio_std', 'property_value_std', and 'debt_to_income_ratio_std' have the highest importance. See full notebook here.
Figure 4: The stolen decision tree, a surrogate model extracted in a simulated attack, which made it possible to identify vulnerabilities in the model's decision-making process. The stolen model was used to find the low and high adversary seed rows for the adversarial example searches. See the full notebook here.
Figure 5: Variable Importance for H2O Distributed Random Forest highlighting the importance of features for the stolen model. See the full notebook here.
Figure 6: The residual analysis shows that the model struggles to correctly predict when customers will receive a high-priced loan. It does much better at predicting when customers will NOT receive a high-priced loan. There are also some very noticeable outliers. See full notebook here.
Figure 7: The individual conditional expectation (ICE) curves for Feature Index 0. Each line shows how the model's prediction for a single instance changes as the feature value varies. See full notebook here.
Figure 8: Local feature importance compared across models, including a GLM and a monotonic XGBoost model, at the 10th, 50th, and 90th percentiles. See full notebook here.
- Biases: Even after remediation, the model can still exhibit bias originating in the algorithm or the training data. It might also perform well on training data but fail to generalize to unseen data, leading to inaccurate predictions in real-world use.
- Model Drift: Over time, the model’s performance can drift as the data distribution changes, which requires ongoing monitoring and periodic updates.
- Unfair Decisions: The model’s decisions could lead to unfair loan approvals and harm individuals, resulting in discrimination in real-world situations.
- Uncertainty in Model Performance: Despite remediation, there is always uncertainty about how well the model will perform in diverse real-world scenarios. Unexpected events such as recessions and disease outbreaks can affect model performance.
- Software Bugs: Unforeseen bugs or vulnerabilities in software implementation can lead to incorrect model outputs or security risks.
- Legal Complications: The model might behave unpredictably when exposed to new types of data or adversarial examples, leading to unintended consequences such as legal penalties.
- Several unexpected results were observed while training the model to predict high-priced loans. The residuals were highly unbalanced: they were higher for high-priced loans and lower for non-high-priced loans, indicating that the model struggled to predict high-priced loans accurately while performing better on non-high-priced loans. There were also noticeable outliers, particularly among high-priced loans, suggesting significant prediction errors in certain cases. These findings highlight the need for better model calibration, improved feature engineering, and thorough outlier analysis to improve the model's accuracy and reliability.
- get_max_f1_frame: This function calculates the optimal cutoff for the model based on the F1 score.
- get_confusion_matrix: This function generates confusion matrices for different demographic groups.
- air: This function calculates the Adverse Impact Ratio (AIR) for different demographic groups.
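
The first two helpers above can be illustrated with a short, dependency-free sketch. It is not the notebook's code: `max_f1_cutoff` and `confusion_by_group` are hypothetical stand-ins for 'get_max_f1_frame' and 'get_confusion_matrix'. The sample data is invented, and the sketch assumes that a probability at or above the cutoff is labelled high-priced (1).

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Return (tp, fp, tn, fn) when probabilities >= cutoff are labelled 1."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_at(y_true, y_prob, cutoff):
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0.0 when the denominator is 0."""
    tp, fp, _, fn = confusion_counts(y_true, y_prob, cutoff)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def max_f1_cutoff(y_true, y_prob, n_steps=100):
    """Scan cutoffs 0, 1/n_steps, ..., 1 and return (best_cutoff, best_f1)."""
    candidates = [i / n_steps for i in range(n_steps + 1)]
    best = max(candidates, key=lambda c: f1_at(y_true, y_prob, c))
    return best, f1_at(y_true, y_prob, best)

def confusion_by_group(y_true, y_prob, groups, cutoff):
    """Confusion counts computed separately for each demographic group label."""
    out = {}
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        out[g] = confusion_counts(
            [y_true[i] for i in idx], [y_prob[i] for i in idx], cutoff
        )
    return out

# Hypothetical labels, scores, and group membership (illustrative only).
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
groups = ["female", "male", "female", "male", "female", "male"]

cutoff, best_f1 = max_f1_cutoff(y_true, y_prob)
by_group = confusion_by_group(y_true, y_prob, groups, cutoff)
```

The per-group counts are the inputs a fairness metric such as AIR needs, because each group's rate of favorable predictions can be read directly from its confusion counts.
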










