before encoding:
 #   Column                Non-Null Count   Dtype
---  ------                --------------   -----
 0   id                    593994 non-null  int64
 1   annual_income         593994 non-null  float64
 2   debt_to_income_ratio  593994 non-null  float64
 3   credit_score          593994 non-null  int64
 4   loan_amount           593994 non-null  float64
 5   interest_rate         593994 non-null  float64
 6   gender                593994 non-null  object
 7   marital_status        593994 non-null  object
 8   education_level       593994 non-null  object
 9   employment_status     593994 non-null  object
 10  loan_purpose          593994 non-null  object
 11  grade_subgrade        593994 non-null  object
 12  loan_paid_back        593994 non-null  float64
after encoding:
 #   Column                           Non-Null Count   Dtype
---  ------                           --------------   -----
 0   id                               593994 non-null  int64
 1   annual_income                    593994 non-null  float64
 2   debt_to_income_ratio             593994 non-null  float64
 3   credit_score                     593994 non-null  int64
 4   loan_amount                      593994 non-null  float64
 5   interest_rate                    593994 non-null  float64
 6   loan_paid_back                   593994 non-null  float64
 7   gender_Female                    593994 non-null  int64
 8   gender_Male                      593994 non-null  int64
 9   gender_Other                     593994 non-null  int64
 10  marital_status_Divorced          593994 non-null  int64
 11  marital_status_Married           593994 non-null  int64
 12  marital_status_Single            593994 non-null  int64
 13  marital_status_Widowed           593994 non-null  int64
 14  loan_purpose_Business            593994 non-null  int64
 15  loan_purpose_Car                 593994 non-null  int64
 16  loan_purpose_Debt consolidation  593994 non-null  int64
 17  loan_purpose_Education           593994 non-null  int64
 18  loan_purpose_Home                593994 non-null  int64
 19  loan_purpose_Medical             593994 non-null  int64
 20  loan_purpose_Other               593994 non-null  int64
 21  loan_purpose_Vacation            593994 non-null  int64
 22  employment_status_Employed       593994 non-null  int64
 23  employment_status_Retired        593994 non-null  int64
 24  employment_status_Self-employed  593994 non-null  int64
 25  employment_status_Student        593994 non-null  int64
 26  employment_status_Unemployed     593994 non-null  int64
 27  education_level_encoded          593994 non-null  float64
 28  grade_subgrade_encoded           593994 non-null  float64
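encoding sketch:
The memo does not record how the encoded columns were produced. The sketch below is one way to get the same column layout, assuming one-hot encoding for the four nominal columns and a simple ordinal rank for education_level and grade_subgrade. EDUCATION_ORDER and the lexicographic grade ranking are hypothetical, not taken from the data.

```python
import pandas as pd

# Nominal columns that show up as 0/1 indicator columns after encoding.
ONE_HOT_COLS = ["gender", "marital_status", "loan_purpose", "employment_status"]

# Hypothetical category order; the memo does not record the real mapping.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode nominal columns and rank-encode the two ordered ones."""
    out = pd.get_dummies(df, columns=ONE_HOT_COLS, dtype="int64")

    # Ordinal rank for education level (assumed order, stored as float).
    edu_rank = {level: float(i) for i, level in enumerate(EDUCATION_ORDER)}
    out["education_level_encoded"] = out["education_level"].map(edu_rank)

    # Assumption: grades such as "A1" < "A2" < "B1" sort lexicographically.
    grades = sorted(out["grade_subgrade"].unique())
    grade_rank = {grade: float(i) for i, grade in enumerate(grades)}
    out["grade_subgrade_encoded"] = out["grade_subgrade"].map(grade_rank)

    return out.drop(columns=["education_level", "grade_subgrade"])


sample = pd.DataFrame({
    "id": [0, 1, 2],
    "gender": ["Female", "Male", "Other"],
    "marital_status": ["Single", "Married", "Single"],
    "loan_purpose": ["Car", "Home", "Car"],
    "employment_status": ["Employed", "Unemployed", "Student"],
    "education_level": ["High School", "PhD", "Master's"],
    "grade_subgrade": ["B2", "A1", "C3"],
    "loan_paid_back": [1.0, 0.0, 1.0],
})
encoded = encode(sample)
print(encoded.dtypes)
```

If the real encoding was target-based rather than rank-based, only the two `*_encoded` lines would change.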
XGBoost Feature Importance:
    Feature                           Importance
8   employment_status_Unemployed     5707.703809
7   employment_status_Student         635.046362
5   employment_status_Retired         135.272638
2   debt_to_income_ratio               82.340114
1   credit_score                       49.838340
12  grade_subgrade_encoded             25.205935
18  loan_purpose_Education              6.401788
14  loan_amount                         6.393689
19  loan_purpose_Home                   6.166740
6   employment_status_Self-employed     5.679464
0   annual_income                       5.644810
13  interest_rate                       5.547760
20  loan_purpose_Medical                5.514242
3   education_level_encoded             5.343600
15  loan_purpose_Business               5.328979
11  gender_Other                        5.320094
4   employment_status_Employed          5.151166
17  loan_purpose_Debt consolidation     5.097537
10  gender_Male                         5.038043
21  loan_purpose_Other                  4.976461
16  loan_purpose_Car                    4.751211
23  marital_status_Divorced             4.698768
25  marital_status_Single               4.616338
26  marital_status_Widowed              4.508544
9   gender_Female                       4.300694
22  loan_purpose_Vacation               4.291977
24  marital_status_Married              4.146536
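feature importance sketch:
The importance type is not recorded. The values are not normalized to sum to 1, so they probably come from the booster's raw scores (for example gain) rather than from `feature_importances_`; that is an assumption. The helper below (hypothetical name `importance_table`) only shows how a feature-to-score mapping becomes a sorted table like the one above.

```python
import pandas as pd


def importance_table(scores: dict) -> pd.DataFrame:
    """Turn a {feature: score} mapping into a table sorted by descending score."""
    table = pd.DataFrame(
        {"Feature": list(scores.keys()), "Importance": list(scores.values())}
    )
    return table.sort_values("Importance", ascending=False)


# With a fitted XGBClassifier the mapping would typically come from
#   model.get_booster().get_score(importance_type="gain")
# Here three values copied from the listing above stand in for it.
example_scores = {
    "credit_score": 49.838340,
    "employment_status_Unemployed": 5707.703809,
    "debt_to_income_ratio": 82.340114,
}
table = importance_table(example_scores)
print(table)
```

The original row index is kept after sorting, which is why the index column in the listing is out of order.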
activate virtual environment (Windows PowerShell):
./.venv/Scripts/Activate.ps1
baseline:
LogisticRegression(max_iter=20)
5-Fold Cross-Validation AUC: 0.9086 ± 0.0008
RandomForestClassifier(n_estimators=20, random_state=42)
5-Fold Cross-Validation AUC: 0.8940 ± 0.0012
XGBClassifier(
    use_label_encoder=False,
    eval_metric='auc',
    random_state=42
)
5-Fold Cross-Validation AUC: 0.9194 ± 0.0008
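cross-validation sketch:
A minimal way to produce a "mean ± std" AUC line like the baselines above, assuming scikit-learn's `cross_val_score` with `scoring="roc_auc"`. The stratified, shuffled split and the synthetic data are assumptions; the memo does not say how the folds were built.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_auc(model, X, y, n_splits=5, seed=42):
    """Return (mean, std) of ROC AUC over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()


# Synthetic stand-in for the loan data, only to make the sketch runnable.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
mean_auc, std_auc = cv_auc(LogisticRegression(max_iter=200), X, y)
print(f"5-Fold Cross-Validation AUC: {mean_auc:.4f} ± {std_auc:.4f}")
```

Any estimator with `predict_proba` or `decision_function` can be passed in place of LogisticRegression, so the same helper covers the RandomForestClassifier and XGBClassifier baselines.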
Neural Network
1. hyperparameter optimization without new features
Best Hyperparameters:
{'n_estimators': 809, 'max_depth': 7, 'learning_rate': 0.071823951579327, 'subsample': 0.8064003270882567, 'colsample_bytree': 0.7095187833222674, 'reg_alpha': 0.25269604526136613, 'reg_lambda': 0.7702697557795304}
Best AUC:
0.9206637613803655
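search sketch:
The tuner used for these results is not recorded. As an illustration only, here is a plain random search over the same seven parameters. The ranges in `sample_params` are hypothetical; they were chosen so that the reported best values fall inside them. `score_fn` is a placeholder for whatever returns the cross-validated AUC for a parameter set.

```python
import random


def sample_params(rng: random.Random) -> dict:
    """Draw one candidate from hypothetical ranges for the seven tuned parameters."""
    return {
        "n_estimators": rng.randint(100, 1000),
        "max_depth": rng.randint(3, 10),
        "learning_rate": rng.uniform(0.01, 0.3),
        "subsample": rng.uniform(0.5, 1.0),
        "colsample_bytree": rng.uniform(0.5, 1.0),
        "reg_alpha": rng.uniform(0.0, 1.0),
        "reg_lambda": rng.uniform(0.0, 1.0),
    }


def random_search(score_fn, n_trials=20, seed=42):
    """Evaluate n_trials random candidates and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objective so the sketch runs without the data: prefers max_depth near 7.
best_params, best_score = random_search(lambda p: -abs(p["max_depth"] - 7))
print("Best Hyperparameters:", best_params)
print("Best score:", best_score)
```

In real use `score_fn` would build `XGBClassifier(**params)` and return its mean 5-fold AUC.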
2. add new features
Index(['annual_income', 'debt_to_income_ratio', 'credit_score', 'loan_amount',
'interest_rate', 'loan_paid_back', 'gender_Female', 'gender_Male',
'gender_Other', 'marital_status_Divorced', 'marital_status_Married',
'marital_status_Single', 'marital_status_Widowed',
'loan_purpose_Business', 'loan_purpose_Car',
'loan_purpose_Debt consolidation', 'loan_purpose_Education',
'loan_purpose_Home', 'loan_purpose_Medical', 'loan_purpose_Other',
'loan_purpose_Vacation', 'employment_status_Employed',
'employment_status_Retired', 'employment_status_Self-employed',
'employment_status_Student', 'employment_status_Unemployed',
'education_level_encoded', 'grade_subgrade_encoded',
'income_to_loan_ratio', 'debt_to_income_ratio_log',
'interest_income_ratio', 'income_x_credit', 'loan_amount_x_interest',
'employment_marital', 'employment_unemployed_and_high_debt',
'loan_purpose_risk_group'],
dtype='object')
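new feature sketch:
The formulas behind the new columns are not recorded; the ones below are guesses from the column names. HIGH_DEBT_THRESHOLD is a hypothetical cut-off. employment_marital and loan_purpose_risk_group are left out because their definitions cannot be inferred from the names alone.

```python
import numpy as np
import pandas as pd

HIGH_DEBT_THRESHOLD = 0.3  # hypothetical cut-off, not taken from the memo


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio, log and interaction features (formulas guessed from names)."""
    out = df.copy()
    out["income_to_loan_ratio"] = out["annual_income"] / out["loan_amount"]
    out["debt_to_income_ratio_log"] = np.log1p(out["debt_to_income_ratio"])
    out["interest_income_ratio"] = out["interest_rate"] / out["annual_income"]
    out["income_x_credit"] = out["annual_income"] * out["credit_score"]
    out["loan_amount_x_interest"] = out["loan_amount"] * out["interest_rate"]
    # Flag rows that are both unemployed and above the assumed debt threshold.
    out["employment_unemployed_and_high_debt"] = (
        (out["employment_status_Unemployed"] == 1)
        & (out["debt_to_income_ratio"] > HIGH_DEBT_THRESHOLD)
    ).astype("int64")
    return out


demo = pd.DataFrame({
    "annual_income": [50000.0, 20000.0],
    "debt_to_income_ratio": [0.1, 0.5],
    "credit_score": [700, 600],
    "loan_amount": [10000.0, 5000.0],
    "interest_rate": [12.5, 15.0],
    "employment_status_Unemployed": [0, 1],
})
features = add_features(demo)
print(features.filter(like="ratio"))
```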
Best Hyperparameters:
{'n_estimators': 920, 'max_depth': 3, 'learning_rate': 0.1680098986712539, 'subsample': 0.5538714018402994, 'colsample_bytree': 0.6555168602120371, 'reg_alpha': 0.4537280355069443, 'reg_lambda': 0.7711506026153779}
Best AUC:
0.9196544034704046
3. add parameter 'scale_pos_weight': 4
Best Hyperparameters:
{'n_estimators': 354, 'max_depth': 4, 'learning_rate': 0.18250402754912762, 'subsample': 0.9831300073615593, 'colsample_bytree': 0.689652842715077, 'reg_alpha': 0.2815848752242487, 'reg_lambda': 0.746971043703558}
Best AUC:
0.9200839484040052
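scale_pos_weight sketch:
The memo fixes scale_pos_weight at 4 without saying how the value was chosen. For comparison, the XGBoost parameter documentation suggests count(negative) / count(positive) as a typical value. The helper below (hypothetical name) computes that ratio from 0/1 labels such as loan_paid_back.

```python
def balanced_scale_pos_weight(labels) -> float:
    """Return count(negative) / count(positive) for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Toy labels: four negatives for each positive.
toy_labels = [0, 0, 0, 0, 1]
print(balanced_scale_pos_weight(toy_labels))  # 4.0
```

Comparing this ratio on the real labels against the fixed value of 4 would show whether the setting weights the minority or the majority class.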