This repository contains the complete end-to-end solution developed for the LAPES Predictive Data Challenge. The objective of this project is to extract real business value from raw data through advanced data processing, machine learning, and deep learning models.
- Automated ELT pipeline using Docker and CI/CD (GitHub Actions)
- Medallion Data Lake Architecture (Bronze → Silver → Gold → Diamond)
- Exploratory Data Analysis (EDA) and statistical insights
- Interactive dashboards built with Plotly and Streamlit
- Supervised and unsupervised machine learning models
- Deep learning models implemented with PyTorch and/or Keras
- Automated PDF/HTML reports with visual storytelling
- Fully documented and reproducible environment
Selected dataset: Credit Card Fraud Detection (Kaggle). Reasons for this choice:
- High complexity and real-world relevance
- Applicable to fraud detection in financial and e-commerce sectors
- Enables exploration of imbalanced classification problems
- Supports advanced ML/DL modeling and visualization
- Ideal for developing anomaly detection techniques and dashboards
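As a quick, hedged illustration of that imbalance (assuming the Kaggle CSV has already been downloaded into `data/bronze/`):

```python
import pandas as pd

# Load the raw Kaggle dataset and inspect the label distribution.
df = pd.read_csv("data/bronze/creditcard.csv")
print(df["Class"].value_counts(normalize=True))
# Fraud cases (Class == 1) account for well under 1% of all transactions.
```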
| Category | Tools & Libraries |
| --- | --- |
| Language | Python 3.11+ |
| Data Processing | Pandas, Polars |
| Visualization | Matplotlib, Seaborn, Plotly |
| Machine Learning | Scikit-learn |
| Deep Learning | PyTorch, Keras |
| Dashboards | Streamlit |
| Data Storage | PostgreSQL |
| Automation | Docker, GitHub Actions |
| Deployment | FastAPI (optional), Docker Compose |
| Big Data (optional) | Spark, Dask, Kafka, Hadoop Ecosystem |
```
├── data/              # Raw and processed data (Bronze → Silver → Gold)
├── notebooks/         # Jupyter notebooks for EDA, ML, and DL
├── src/               # Source code for ELT, preprocessing, and modeling
├── sql/
│   ├── DDL/           # Database schema definitions
│   └── DML/           # Data manipulation scripts
├── app/               # Streamlit dashboard application
├── requirements.txt   # List of project dependencies
└── README.md          # Project documentation (you are here)
```
- Automated pipeline using Docker and GitHub Actions
- Raw data is stored in Bronze; transformations are handled via SQL triggers (see the sketch after this list)
- Access policies applied at each layer
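The repository handles these transformations with SQL triggers; as a hedged illustration only, the Python sketch below shows an equivalent pandas-based Bronze → Silver promotion. The schema and table names (`bronze.transactions`, `silver.transactions`) are assumptions, not the project's actual objects:

```python
import pandas as pd
from sqlalchemy import create_engine

# Default credentials from the database setup section below.
engine = create_engine("postgresql://postgres:postgres@localhost:5432/lapes")

def bronze_to_silver() -> None:
    # Read raw records exactly as they were ingested into the Bronze layer.
    raw = pd.read_sql("SELECT * FROM bronze.transactions", engine)
    # Silver layer: deduplicate and drop rows with missing values.
    clean = raw.drop_duplicates().dropna()
    clean.to_sql("transactions", engine, schema="silver",
                 if_exists="replace", index=False)

if __name__ == "__main__":
    bronze_to_silver()
```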
- Descriptive statistics, correlation matrices, missing value analysis
- Anomaly detection and data profiling
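A minimal EDA sketch along these lines, assuming the CSV is already in `data/bronze/`:

```python
import pandas as pd

df = pd.read_csv("data/bronze/creditcard.csv")

print(df.describe())               # descriptive statistics
print(df.isna().sum())             # missing-value analysis
print(df.corr(numeric_only=True))  # correlation matrix
```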
- Supervised learning: logistic regression, random forests, XGBoost
- Cross-validation and metric tracking (accuracy, recall, F1-score, ROC-AUC)
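A hedged sketch of the cross-validation and metric tracking described above; it uses a synthetic imbalanced dataset as a stand-in for the real features, for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the prepared feature matrix (~2% positive class).
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=42)

scores = cross_validate(
    LogisticRegression(max_iter=1000, class_weight="balanced"),
    X, y, cv=5,
    scoring=["accuracy", "recall", "f1", "roc_auc"],
)
for metric in ("accuracy", "recall", "f1", "roc_auc"):
    print(metric, scores[f"test_{metric}"].mean())
```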
- Neural networks for fraud detection (imbalanced classification)
- Training with class balancing techniques (e.g., SMOTE)
- Model evaluation with robust metrics
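To make the balancing step concrete, a minimal sketch combining SMOTE (from the `imbalanced-learn` package) with a small PyTorch network; the data and architecture are placeholders, not the project's actual model:

```python
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data; SMOTE oversamples the minority class to ~50/50.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=42)
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

model = nn.Sequential(nn.Linear(X_res.shape[1], 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.tensor(X_res, dtype=torch.float32)
targets = torch.tensor(y_res, dtype=torch.float32).unsqueeze(1)

for epoch in range(10):  # short demo loop; real training would use DataLoaders
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
print("final loss:", loss.item())
```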
- Static plots with matplotlib/seaborn
- Interactive dashboards with Streamlit and Plotly
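A minimal dashboard sketch along these lines (run with `streamlit run app.py`); the column names come from the Kaggle dataset:

```python
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.read_csv("data/bronze/creditcard.csv")
df["Class"] = df["Class"].astype(str)  # discrete legend categories

st.title("Credit Card Fraud Overview")
# Transaction amounts split by label; log scale keeps rare fraud cases visible.
fig = px.histogram(df, x="Amount", color="Class", nbins=50, log_y=True)
st.plotly_chart(fig)
```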
Clone the repository and install dependencies in a virtual environment:
```
git clone https://github.com/your-username/lapes-predictive-analytics.git
cd lapes-predictive-analytics

# Linux/macOS
python -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

# Install the project dependencies
pip install -r requirements.txt
```
- Install PostgreSQL if it is not already installed.
- Create a database named `lapes` with:
  - User: `postgres`
  - Password: `postgres`
- Ensure the PostgreSQL service is running.
- If you are using different credentials, update them in the project accordingly.
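To verify connectivity, a quick hedged check using the default credentials above (assumes the `psycopg2-binary` package is installed):

```python
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="lapes", user="postgres", password="postgres",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # prints the PostgreSQL server version
conn.close()
```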
You can run the entire pipeline locally using the main.py script.
This script implements the following steps:
- Locate the raw CSV dataset (`creditcard.csv`) in one of the following locations:
  - Project root
  - `data/bronze/`

  If the file is not found, the script exits with an error.
- Ensure the `data/bronze` folder exists and copy/move the CSV there if needed.
- Run the ELT pipeline in sequence:
  - Bronze → Silver
  - Silver → Gold
- Persist trained ML/DL models into the database.
- Apply all SQL scripts (DDL and DML), including:
  - Table creation
  - Grant permissions
  - Triggers

To execute:

```
python main.py
```
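For reference, a hypothetical sketch of the dataset-locating step described above; the function name and error message are illustrative, not the actual contents of `main.py`:

```python
import shutil
import sys
from pathlib import Path

def locate_dataset() -> Path:
    """Find creditcard.csv and make sure it ends up in data/bronze/."""
    bronze = Path("data/bronze")
    bronze.mkdir(parents=True, exist_ok=True)
    for candidate in (Path("creditcard.csv"), bronze / "creditcard.csv"):
        if candidate.exists():
            if candidate.parent != bronze:
                # Move the raw file into the Bronze layer.
                shutil.move(str(candidate), str(bronze / candidate.name))
            return bronze / "creditcard.csv"
    sys.exit("creditcard.csv not found in project root or data/bronze/")
```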