Project Eagle Eye

Anomaly Detection for Fraudulent Credit Card Transactions

Project Goal: Design an anomaly detection system capable of automatically catching fraudulent transactions.

Dataset

Synthetic Financial Datasets For Fraud Detection: synthetic datasets generated by the PaySim mobile money simulator. https://www.kaggle.com/ntnu-testimon/paysim1

step - Maps a unit of time in the real world. In this case 1 step is 1 hour of time.
type - transaction type: CASH_IN, CASH_OUT, DEBIT, PAYMENT or TRANSFER
amount - amount of the transaction in local currency
nameOrig - customer who started the transaction
oldbalanceOrg - initial balance before the transaction
newbalanceOrig - customer's balance after the transaction.
nameDest - recipient ID of the transaction.
oldbalanceDest - initial recipient balance before the transaction.
newbalanceDest - recipient's balance after the transaction.
isFraud - identifies a fraudulent transaction (1) or a non-fraudulent one (0)
isFlaggedFraud - flags illegal attempts to transfer more than 200,000 in a single transaction.

Other data that would be useful to have

Address of transactions
Credit limit of card
Salary
How often person travels

Modeling Strategy

An unsupervised approach using the Random Cut Forest (RCF) algorithm to identify anomalies in the credit card transactions.

When using Random Cut Forest, an anomaly score with low values indicates that the data point is considered “normal” whereas high values indicate the presence of an anomaly. The definitions of “low” and “high” depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.
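The three-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `flag_anomalies` and the example scores are made up, and only the upper tail is flagged on the assumption that high scores are the ones of interest.

```python
import statistics


def flag_anomalies(scores, num_std=3.0):
    """Return indices of scores more than num_std standard deviations above the mean."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation of the scores
    cutoff = mean + num_std * std
    return [i for i, score in enumerate(scores) if score > cutoff]


# Made-up anomaly scores for illustration: 20 "normal" values and one high value.
example_scores = [1.0] * 20 + [5.0]
flagged = flag_anomalies(example_scores)
print(flagged)
```

The multiplier is exposed as `num_std` because the right cutoff depends on the application; three is only the common starting point mentioned above.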

The RCF algorithm in Amazon SageMaker works by first obtaining a random sample of the training data. The sample is then divided into subsamples, one for each tree in the forest. Each subsample is organized into a binary tree by randomly subdividing bounding boxes until each leaf represents a bounding box containing a single data point. The anomaly score assigned to an input data point is inversely proportional to its average depth across the forest.

Credit: https://aws.amazon.com/blogs/machine-learning/use-the-built-in-amazon-sagemaker-random-cut-forest-algorithm-for-anomaly-detection/
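To make the depth idea concrete, here is a toy sketch in plain Python. It is not SageMaker's implementation: the function names, the `1 / (1 + average depth)` score, and the synthetic data are all assumptions chosen for illustration. It draws a fresh subsample per tree, picks the cut dimension with probability proportional to the bounding box's side length, and counts how many random cuts it takes to separate a point from the subsample.

```python
import random


def isolation_depth(point, sample, rng):
    """Count the random cuts needed to separate `point` from every point in `sample`."""
    depth = 0
    remaining = list(sample)
    while remaining:
        dims = list(range(len(point)))
        # Bounding box of the remaining sample points plus the query point.
        lows = [min(min(p[d] for p in remaining), point[d]) for d in dims]
        highs = [max(max(p[d] for p in remaining), point[d]) for d in dims]
        spans = [hi - lo for lo, hi in zip(lows, highs)]
        if sum(spans) == 0:
            break  # the query point coincides with everything that is left
        d = rng.choices(dims, weights=spans)[0]
        cut = rng.uniform(lows[d], highs[d])
        # Keep only the points that fall on the same side of the cut as the query.
        remaining = [p for p in remaining if (p[d] <= cut) == (point[d] <= cut)]
        depth += 1
    return depth


def anomaly_score(point, data, num_trees=50, sample_size=32, seed=0):
    """Illustrative score that shrinks as the average isolation depth grows."""
    rng = random.Random(seed)
    depths = []
    for _ in range(num_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        depths.append(isolation_depth(point, sample, rng))
    avg_depth = sum(depths) / len(depths)
    return 1.0 / (1.0 + avg_depth)


# Synthetic 2-D data: a cluster around the origin, scored against a near and a far point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
inlier_score = anomaly_score((0.0, 0.0), cluster)
outlier_score = anomaly_score((25.0, 25.0), cluster)
print(inlier_score, outlier_score)
```

A point far from the cluster is usually cut off within the first one or two cuts, so its average depth is small and its score is high; a point in the middle of the cluster needs many more cuts.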

A supervised approach using XGBoost and a Linear Learner model. Hyperparameter tuning would be used to tune the models further. For any input transaction, the model predicts whether it is fraud or not fraud.

Data Cleanup

Fraudulent transactions only occur in CASH_OUT and TRANSFER transactions. Consider skipping / dropping PAYMENT.

Dropping these fields, as they don't contain data that will help the model:

nameOrig
nameDest
isFlaggedFraud - might not be accurate; consider making a new column for amounts over $200K

Overwriting the type column with numeric values: TRANSFER = 0, CASH_OUT = 1, PAYMENT = 2
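The cleanup steps in this section can be sketched in plain Python over rows read as dictionaries (for example from `csv.DictReader`). This is a hedged illustration rather than the notebook's code: the function name `clean_rows`, the `isLargeAmount` column, and the sample rows are hypothetical, and skipping transaction types that have no numeric code (CASH_IN, DEBIT) is an assumption.

```python
# Numeric codes for the type column, as listed above.
TYPE_CODES = {"TRANSFER": 0, "CASH_OUT": 1, "PAYMENT": 2}
# Fields dropped because they are not expected to help the model.
DROPPED_FIELDS = ("nameOrig", "nameDest", "isFlaggedFraud")
# Threshold for the replacement "large amount" column (assumed to be 200,000).
LARGE_AMOUNT = 200_000


def clean_rows(rows):
    """Drop unhelpful fields, encode `type`, and add a hypothetical isLargeAmount flag."""
    cleaned = []
    for row in rows:
        if row["type"] not in TYPE_CODES:
            continue  # assumption: types without a numeric code are skipped
        out = {key: value for key, value in row.items() if key not in DROPPED_FIELDS}
        out["type"] = TYPE_CODES[row["type"]]
        out["isLargeAmount"] = int(float(row["amount"]) > LARGE_AMOUNT)
        cleaned.append(out)
    return cleaned


# Made-up rows for illustration only; real rows carry all dataset columns.
sample_rows = [
    {"type": "TRANSFER", "amount": "250000.0", "nameOrig": "C1", "nameDest": "C2",
     "isFraud": "1", "isFlaggedFraud": "0"},
    {"type": "CASH_OUT", "amount": "500.0", "nameOrig": "C3", "nameDest": "C4",
     "isFraud": "0", "isFlaggedFraud": "0"},
    {"type": "CASH_IN", "amount": "100.0", "nameOrig": "C5", "nameDest": "C6",
     "isFraud": "0", "isFlaggedFraud": "0"},
]
cleaned_rows = clean_rows(sample_rows)
print(cleaned_rows)
```

To drop PAYMENT rows as well, remove the `"PAYMENT"` entry from `TYPE_CODES`.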

End Goal

The goal is to predict fraudulent transactions.
