scothew/eagleEye

Project Eagle Eye

Anomaly Detection for Fraudulent Credit Card Transactions

Project Goal: Design an anomaly detection system capable of automatically catching fraudulent transactions.

Dataset

Synthetic Financial Datasets For Fraud Detection: synthetic datasets generated by the PaySim mobile money simulator (https://www.kaggle.com/ntnu-testimon/paysim1). Each row is one transaction with the following fields:

  • step - maps a unit of time in the real world; 1 step is 1 hour.
  • type - CASH_IN, CASH_OUT, DEBIT, PAYMENT or TRANSFER
  • amount - amount of the transaction in local currency
  • nameOrig - customer who started the transaction
  • oldbalanceOrg - initial balance before the transaction
  • newbalanceOrig - customer's balance after the transaction
  • nameDest - recipient ID of the transaction
  • oldbalanceDest - initial recipient balance before the transaction
  • newbalanceDest - recipient's balance after the transaction
  • isFraud - 1 if the transaction is fraudulent, 0 otherwise
  • isFlaggedFraud - flags illegal attempts to transfer more than 200,000 in a single transaction
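A minimal sketch of reading these fields with typed values, assuming the CSV header uses the PaySim column names step, type, amount, nameOrig, oldbalanceOrg, newbalanceOrig, nameDest, oldbalanceDest, newbalanceDest, isFraud and isFlaggedFraud. The `parse_row` helper and the one-row sample are hypothetical; the sample values are made up, not taken from the dataset.

```python
import csv
import io

# Column groups from the PaySim field list; anything else stays a string.
INT_COLUMNS = {"step", "isFraud", "isFlaggedFraud"}
FLOAT_COLUMNS = {"amount", "oldbalanceOrg", "newbalanceOrig",
                 "oldbalanceDest", "newbalanceDest"}


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values."""
    typed = {}
    for key, value in row.items():
        if key in INT_COLUMNS:
            typed[key] = int(value)
        elif key in FLOAT_COLUMNS:
            typed[key] = float(value)
        else:  # type, nameOrig and nameDest are kept as text
            typed[key] = value
    return typed


# Hypothetical one-row sample in the PaySim column layout (values are made up).
SAMPLE_CSV = (
    "step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,"
    "nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud\n"
    "1,TRANSFER,500.0,C_ORIG,500.0,0.0,C_DEST,0.0,0.0,1,0\n"
)

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0]["type"], rows[0]["amount"], rows[0]["isFraud"])  # → TRANSFER 500.0 1
```

For the full file, passing an open file handle to `csv.DictReader` instead of the in-memory `io.StringIO` works the same way.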

Other data that would be useful to have

  • Location (address) of the transaction
  • Credit limit of the card
  • Customer's salary
  • How often the customer travels

Modeling Strategy

Use an unsupervised model based on the Random Cut Forest (RCF) algorithm to identify anomalies in the credit card transactions.

When using Random Cut Forest, a low anomaly score indicates that a data point is considered “normal”, whereas a high score indicates the presence of an anomaly. The definitions of “low” and “high” depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.
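The three-standard-deviation rule can be sketched in a few lines. This assumes the anomaly scores are already available as a list of floats (for example, returned by a trained RCF model) and, since high scores are the anomalous ones, flags only the upper tail. The helper names and the sample scores are hypothetical.

```python
from statistics import mean, stdev


def anomaly_cutoff(scores, k=3.0):
    """Cutoff = mean score + k sample standard deviations (k=3 by default)."""
    return mean(scores) + k * stdev(scores)


def flag_anomalies(scores, k=3.0):
    """True for each score above the cutoff; high scores are the anomalous ones."""
    cutoff = anomaly_cutoff(scores, k)
    return [s > cutoff for s in scores]


# Hypothetical scores: twenty ordinary points and one clear outlier.
scores = [1.0] * 20 + [10.0]
flags = flag_anomalies(scores)
print(sum(flags))  # → 1
```

With very few scores, a single large outlier also inflates the standard deviation, so the cutoff is more reliable when computed over a large batch of scores.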

The RCF algorithm in Amazon SageMaker works by first obtaining a random sample of the training data and distributing subsamples to the trees in the forest. Each subsample is organized into a binary tree by randomly subdividing bounding boxes until each leaf represents a bounding box containing a single data point. The anomaly score assigned to an input data point is inversely proportional to its average depth across the forest, so points that are isolated after only a few cuts receive high scores.
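The depth intuition above can be illustrated with a simplified, pure-Python sketch; it is not SageMaker's implementation, which builds and maintains its trees differently. For each tree, the sketch draws a random subsample, adds the query point, and makes random cuts only along the query point's path (cut dimension chosen with probability proportional to its range, cut value uniform inside the bounding box) until the point is alone. The helper names, `n_trees=100`, `sample_size=256`, and the `1 / (1 + mean depth)` score (used instead of a strict `1 / depth` so a depth of 0 cannot divide by zero) are assumptions for illustration.

```python
import random


def isolation_depth(point, sample, rng, max_depth=50):
    """Count random cuts until `point` is alone in its bounding box."""
    data = [tuple(p) for p in sample] + [tuple(point)]
    dims = list(range(len(point)))
    depth = 0
    while len(data) > 1 and depth < max_depth:
        lo = [min(p[d] for p in data) for d in dims]
        hi = [max(p[d] for p in data) for d in dims]
        spans = [h - l for h, l in zip(hi, lo)]
        if sum(spans) == 0:  # remaining points are all identical to `point`
            break
        # Pick the cut dimension with probability proportional to its range,
        # then a cut value uniformly inside the bounding box.
        d = rng.choices(dims, weights=spans)[0]
        cut = rng.uniform(lo[d], hi[d])
        # Follow only the side of the cut that contains the query point.
        if point[d] <= cut:
            data = [p for p in data if p[d] <= cut]
        else:
            data = [p for p in data if p[d] > cut]
        depth += 1
    return depth


def anomaly_score(point, data, n_trees=100, sample_size=256, seed=0):
    """Higher score = shallower average depth = more anomalous."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        depths.append(isolation_depth(point, sample, rng))
    # 1 / (1 + mean depth) keeps the score finite even if a depth is 0.
    return 1.0 / (1.0 + sum(depths) / len(depths))


gen = random.Random(42)
normal = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
inlier = anomaly_score((0.0, 0.0), normal)
outlier = anomaly_score((8.0, 8.0), normal)
print(outlier > inlier)  # → True
```

The point at (8, 8) sits far outside the cloud of normal points, so a random cut usually separates it within a couple of steps, while the point at the center needs many more cuts; the averaged depth turns that difference into a higher score for the outlier.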

Credit: https://aws.amazon.com/blogs/machine-learning/use-the-built-in-amazon-sagemaker-random-cut-forest-algorithm-for-anomaly-detection/

A supervised approach uses XGBoost and a Linear Learner model, with hyperparameter tuning to refine the models further. Given any input transaction, these models predict whether it is fraudulent or not.
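To show what "train a linear model, tune its hyperparameters, then classify a transaction" means in practice, here is a pure-Python sketch. It is a stand-in, not XGBoost or SageMaker's Linear Learner: a plain logistic regression trained with gradient descent, plus a small grid search over learning rate and L2 penalty that keeps the model with the best validation F1 score (F1 rather than accuracy, because fraud is the rare class). The two features, the toy labeling rule, and all helper names are hypothetical and not derived from PaySim.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr, l2, epochs=300):
    """Fit logistic-regression weights with full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """1 = fraud, 0 = not fraud, using a 0.5 probability threshold."""
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def f1(model, X, y):
    """F1 score for the fraud class."""
    preds = [predict(model, x) for x in X]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def make_toy_data(n, rng):
    """Hypothetical features [scaled amount, is_transfer]; not PaySim data."""
    X, y = [], []
    for _ in range(n):
        amount, is_transfer = rng.random(), rng.randint(0, 1)
        X.append([amount, is_transfer])
        y.append(int(amount > 0.5 and is_transfer == 1))
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_data(400, rng)
X_val, y_val = make_toy_data(200, rng)

# Grid search: train one model per setting, keep the best validation F1.
grid = [(lr, l2) for lr in (0.5, 2.0) for l2 in (0.0, 0.001)]
models = [train(X_train, y_train, lr, l2) for lr, l2 in grid]
best = max(models, key=lambda m: f1(m, X_val, y_val))
print(predict(best, [0.95, 1]), predict(best, [0.1, 0]))
```

A real setup would swap in the actual models and a managed tuning job, but the shape stays the same: several candidate settings, one held-out metric, and the winning model used to classify new transactions.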

Data Cleanup

Fraudulent transactions occur only in CASH_OUT and TRANSFER transactions, so consider skipping or dropping PAYMENT transactions.
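A minimal sketch of this step, assuming each transaction is a dict with at least `type` and `isFraud` keys. The first helper checks the observation on the data; the second keeps only CASH_OUT and TRANSFER rows, which drops PAYMENT along with every other type. Helper names and sample rows are hypothetical.

```python
# Types that, per the observation above, contain all fraudulent transactions.
FRAUD_PRONE_TYPES = {"CASH_OUT", "TRANSFER"}


def types_with_fraud(rows):
    """Return the set of transaction types that contain at least one fraud."""
    return {r["type"] for r in rows if r["isFraud"] == 1}


def keep_fraud_prone(rows):
    """Keep only CASH_OUT and TRANSFER rows; all other types are dropped."""
    return [r for r in rows if r["type"] in FRAUD_PRONE_TYPES]


# Hypothetical rows with only the fields this step needs.
rows = [
    {"type": "PAYMENT", "isFraud": 0},
    {"type": "TRANSFER", "isFraud": 1},
    {"type": "CASH_OUT", "isFraud": 0},
]
print(types_with_fraud(rows))                        # → {'TRANSFER'}
print([r["type"] for r in keep_fraud_prone(rows)])   # → ['TRANSFER', 'CASH_OUT']
```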

Drop these fields, since they do not contain information that will help the model:

  • nameOrig
  • nameDest
  • isFlaggedFraud (may not be accurate; consider creating a new column that flags transactions over 200,000 instead)
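A minimal sketch of this cleanup, assuming rows are dicts keyed by the PaySim column names and that "over 200,000" means strictly greater than 200,000, as in the isFlaggedFraud description. The replacement column name `isLargeAmount` and the sample row are hypothetical.

```python
# Fields the note above says to drop before modeling.
DROP_COLUMNS = {"nameOrig", "nameDest", "isFlaggedFraud"}
LARGE_AMOUNT = 200_000  # threshold taken from the isFlaggedFraud description


def clean_row(row):
    """Drop ID/flag columns and add a hypothetical isLargeAmount column."""
    cleaned = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    cleaned["isLargeAmount"] = int(row["amount"] > LARGE_AMOUNT)
    return cleaned


# Hypothetical row (values made up).
row = {"type": "TRANSFER", "amount": 250_000.0, "nameOrig": "C_ORIG",
       "nameDest": "C_DEST", "isFraud": 1, "isFlaggedFraud": 0}
print(clean_row(row))
# → {'type': 'TRANSFER', 'amount': 250000.0, 'isFraud': 1, 'isLargeAmount': 1}
```

Unlike isFlaggedFraud, the new column is computed directly from `amount`, so it is consistent across every row regardless of how the simulator set the original flag.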

Overwrite the type column with numeric values: TRANSFER = 0, CASH_OUT = 1, PAYMENT = 2.
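A minimal sketch of the encoding, assuming rows are dicts and that types outside the mapping (such as CASH_IN or DEBIT) were filtered out earlier; such rows raise a `KeyError` here rather than being silently mis-encoded. The helper name and sample rows are hypothetical.

```python
# Numeric codes from the note above.
TYPE_CODES = {"TRANSFER": 0, "CASH_OUT": 1, "PAYMENT": 2}


def encode_type(row):
    """Return a copy of the row with `type` replaced by its numeric code.

    Raises KeyError for types outside TYPE_CODES (e.g. CASH_IN, DEBIT),
    which are assumed to have been filtered out earlier.
    """
    return {**row, "type": TYPE_CODES[row["type"]]}


rows = [{"type": "CASH_OUT", "amount": 10.0}, {"type": "PAYMENT", "amount": 5.0}]
print([encode_type(r)["type"] for r in rows])  # → [1, 2]
```

Integer codes imply an ordering (TRANSFER < CASH_OUT < PAYMENT) that has no real meaning. Tree-based models such as XGBoost can usually split around it, while a linear model may do better with one-hot columns.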

End Goal

The goal is to predict fraudulent transactions.
