In this Kaggle project, we are given an imbalanced dataset and asked to build a classification model that flags fraudulent transactions.
Table of Contents
Introduction
Preliminary Examination
  1. Performance Metric
     Receiver Operating Characteristic (ROC)
  2. Resampling Dataset
  3. Synthetic Samples
  4. Cross Validation
  5. Customized Models
     1. Logistic Regression
     2. Decision Tree
     3. Random Forest
     4. Support Vector Machine
Exploratory Analysis
Resampling
Resampling Summary
Modeling
  1. Logistic Regression
  2. Decision Tree
  3. Random Forest
  4. Support Vector Machine
Best Performing Models
Parameter Tuning of Random Forest
Conclusion
Keywords: matplotlib, pandas, sklearn, data science, classification, resampling