This project detects fraudulent credit card transactions using machine learning. It covers handling class imbalance, feature engineering, and training a Random Forest model tuned for both recall and precision. The final model reaches an F1-score of 0.86 (precision 0.89, recall 0.83), catching most fraud while keeping false positives low.
Credit card fraud is a significant issue in the financial industry, costing billions of dollars annually. This project aims to build a machine learning model that can accurately detect fraudulent transactions. The model is trained on a highly imbalanced dataset and optimized to maximize recall while maintaining reasonable precision.
The dataset used in this project is the Credit Card Fraud Detection Dataset from Kaggle. It contains anonymized credit card transactions, with features V1-V28 (PCA-transformed), Time, Amount, and the target variable Class (1 for fraud, 0 for non-fraud).
- Data Preprocessing:
  - Checked for missing values (none found).
  - Scaled the Amount and Time features.
  - Addressed class imbalance using SMOTE (Synthetic Minority Oversampling Technique).
- Feature Engineering:
  - Created new features such as Hour, Time_Since_Last_Transaction, and Amount_Scaled.
  - Selected top features using Random Forest feature importance.
- Model Training:
  - Trained a Random Forest model with hyperparameter tuning.
  - Achieved an F1-score of 0.86 and a recall of 0.83.
- Evaluation:
  - Evaluated the model using precision, recall, F1-score, and ROC-AUC.
  - Visualized results using a confusion matrix and feature importance plot.
- F1-Score: 0.86
- Recall: 0.83
- Precision: 0.89
- ROC-AUC: 0.98
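The F1-score is the harmonic mean of precision and recall, so the figures listed above can be cross-checked against each other. A small check using only the reported values (the function name `f1_from` is just illustrative):

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.89
reported_recall = 0.83
f1 = f1_from(reported_precision, reported_recall)
print(round(f1, 2))  # 0.86
```

The result, 2 × 0.89 × 0.83 / (0.89 + 0.83) ≈ 0.859, rounds to the reported 0.86.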
To run this project locally, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/credit-card-fraud-detection.git