Fraud Detection in E-Commerce (SQL)

Background

Fraud detection is critical to the financial security of both banks and customers. Data-driven fraud detection reveals fraud patterns that could not be easily discerned with eyeball checking and models such patterns to prevent further losses.

Datasets

I'm using the Fraud Detection in E-Commerce Dataset from Kaggle to demonstrate normalisation, relational modelling, and SQL analyses (including CTE, subqueries, aggregate functions, and window functions.) This happens at the early stage of data modelling where the analyst queries the banking database to extract data.

The above link includes three datasets: Dataset1, Dataset2, and Merged Dataset. I'm only using Dataset1 in this SQL project. The original dataset has 1,472,952 rows, 16 columns. Note that this dataset is synthetic and chosen because descriptive variables are available for SQL demonstration and interpretation.

Components

Python for normalisation: The normalisation.ipynb notebook includes the Python scripts for normalising the selected Kaggle dataset for relational modelling. Steps include exploring the data, subsetting the data, cleaning the data, and exporting database tables. Documentation is available in the notebook for illustrating the thought process.
Database creation SQL: The database.sql SQL file includes PostgreSQL queries for creating the relational database consisting of two tables, customers and transactions. These tables share a one-to-many relationship: One customer can make multiple transactions; one transaction should belong to only one customer.
SQL analyses and extration: The research.sql SQL file includes all the SQL queries performed in this database, covering CTE, subqueries, aggregate functions, and window functions for answering fraud detection research questions.

Several data samples are available in this repo: 1) dataset_five.csv: This is the original dataset, namely Dataset1 from the Kaggle Fraud Detection in E-Commerce Dataset page. 2) customers_five.csv: This is the customers table data produced from the Python script. 3) transactions_five.csv: This is the transactions table data produced from the Python script. Given GitHub storage limitation, only the first five rows are provided for reference.

Pending Items

I will add an entity-relationship diagram to illustrate the schema of the relational database. I will also add a Tableau dashboard and Python report using this same dataset.

Keywords

Fraud Detection, PostgreSQL, Python

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Fraud Detection in E-Commerce (SQL)

Background

Datasets

Components

Pending Items

Keywords

Related Contents

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
README.md		README.md
customers_five.csv		customers_five.csv
database.sql		database.sql
dataset1_five.csv		dataset1_five.csv
normalisation.ipynb		normalisation.ipynb
research.sql		research.sql
transactions_five.csv		transactions_five.csv

mrlizhaozhi/fraud-detection-ecommerce-sql

Folders and files

Latest commit

History

Repository files navigation

Fraud Detection in E-Commerce (SQL)

Background

Datasets

Components

Pending Items

Keywords

Related Contents

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages