This repository contains the code, SQL queries, and visualisation assets developed and submitted in April 2025 for the End-of-Course Assessment
Data Wrangling at the Singapore University of Social Sciences (SUSS).
The project focuses on processing, cleaning, transforming, and visualising survey data collected from mall visitors in 2024.
To clean, transform, and summarise mall visitor survey data, and provide a monthly summary of customer satisfaction indicators. The project tasks include:
- Importing and storing raw survey data into MySQL
- Cleaning inconsistent date formats
- Optimising data types for efficient storage
- Creating a summary table with monthly averages and derived KPIs
- Visualising the data for business interpretation
Language: Python, SQL (MySQL), R
Packages (R): DBI, RMySQL, ggplot2, dplyr, scales
Tools: RStudio, MySQL Workbench, Jupyter Notebook
- Python script to access the dataset and store it as a MySQL table in MySQL database
- MySQL statement to clean up inconsistent date format
In this code, the doi column values have been updated to consistently use YYYY-MM-DD format.
- Alter Table statement to optimise data type for efficient storage
For satis, confirm, ideal and handle, these four columns correspond to respondent’s rating on a scale of 1-10. Therefore, using tinyint is space efficient (1 byte) and UNSIGNED restrict it to a non-negative value with a range of 0 to 255.
For comp, it has an input value of either 0 or 1, hence, using tinyint is also justified for saving storage space.
For nocomp, it has an input value from 1 to 4, hence it is optimal to use tinyint as data type to achieve space efficiency.
- MySQL statement to create summary table
A new table was created containing 12 rows, one for each month of the year, and 7 columns. The mth column contains an integer representing the month of the year.
The satis, confirm, ideal, comp, handle columns contain the corresponding month average values for those variables.
- R code to read in summary table and store it as an R data frame
This R code read in the summary table and store it as an R data frame.
- R code for visualization
A single visual was created to make it easy for the mall manager to understand the information in the table.
Back to Project main page
