This project demonstrates a fraud detection pipeline from raw data ingestion to model deployment and monitoring. It’s designed as an end-to-end MLOps project, featuring:
- Automated data ingestion and processing (cleaning, type handling, class-imbalance correction with SMOTE, and feature engineering)
- Model development with scikit-learn and MLflow tracking (both Logistic Regression and Random Forest classifiers are trained and hyperparameter-tuned to maximize recall, with all runs tracked in MLflow)
- Model storage/versioning in the MLflow Model Registry
- Containerized tasks orchestrated in Kubernetes via Airflow
- CI/CD pipelines with GitHub Actions to build Docker images and deploy automatically
- Real-time or batch scoring with a Streamlit UI
- Frameworks & Libraries: pandas, scikit-learn, imbalanced-learn, MLflow
- Orchestration: Apache Airflow (KubernetesPodOperator)
- Data Storage: MinIO (S3-compatible object store)
- Containerization: Docker + Kubernetes
- CI/CD: GitHub Actions (build & push Docker images to GitHub Container Registry)
- Dashboard: Streamlit for inference & analytics
- Infrastructure as Code: Terraform
./
├── dags/
│ ├── deploy_model_dag.py # Airflow DAG to deploy the trained model
│ └── etl_train_dag.py # Airflow DAG to ingest and train the fraud detection model
├── docker/
│ ├── airflow/ # Dockerfile for custom Airflow
│ ├── data_ingestion/ # Dockerfile for data ingestion
│ ├── data_preparation/ # Dockerfile for data prep
│ ├── mlflow/ # Dockerfile for custom MLflow
│ ├── streamlit/ # Dockerfile for Streamlit
│ └── train/ # Dockerfile for model training
├── src/
│ ├── data_ingestion/ # Python scripts for ingestion
│ ├── data_preparation/ # Python scripts for data cleaning, feature engineering and processing
│ ├── train/ # Model training scripts
│ └── ui/ # Streamlit app & utility functions
├── terraform/ # IaC with Terraform
└── .github/workflows/ # GitHub Actions to build Docker images and deploy automatically
A local Kubernetes cluster that:
- Deploys containerized services (MinIO, MLflow, Airflow, Streamlit)
- Runs containerized tasks in Airflow DAGs via the KubernetesPodOperator
- Manages the pipeline:
  - `etl_train_dag.py`: Data ingestion → Data preparation → Model training → MLflow logging
  - `deploy_model_dag.py`: Retrieves the best model from MLflow → Deploys it on Kubernetes
An S3-compatible object store:
- Raw Bucket: Stores original CSV data
- Processed Bucket: Stores feature-engineered data, scalers, etc.
Handles model development and tracking:
- Trains both Logistic Regression and Random Forest classifiers, tuned for recall
- Tracks all runs in MLflow
- Automatically registers the best model in the Model Registry for deployment
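Recall-oriented tuning of the two classifiers can be sketched with scikit-learn's `GridSearchCV` and `scoring="recall"`. This runs on synthetic data; the real hyperparameter grids live in `src/train/`, and the MLflow calls are indicated only as comments since they need a running tracking server.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the prepared fraud features.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Hypothetical grids -- the project's actual search spaces may differ.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 200]}),
}

best = {}
for name, (estimator, grid) in candidates.items():
    # scoring="recall" prioritizes catching fraud, i.e. minimizing false negatives.
    search = GridSearchCV(estimator, grid, scoring="recall", cv=3)
    search.fit(X_tr, y_tr)
    best[name] = search
    # In the real pipeline each run is tracked in MLflow, roughly:
    # with mlflow.start_run(run_name=name):
    #     mlflow.log_params(search.best_params_)
    #     mlflow.log_metric("recall", search.best_score_)
# The winner would then be registered in the MLflow Model Registry,
# e.g. via mlflow.sklearn.log_model(..., registered_model_name=...).
```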
Key Features:
- Experiment Tracking: Logs metrics, hyperparameters, artifacts
- Model Registry: Version control for models
An interactive UI for:
- Single-transaction fraud detection
- Batch scoring with performance metrics and visualizations
Each Docker image (Airflow, data ingestion, data preparation, MLflow, training, and Streamlit) is automatically built and pushed to GitHub Container Registry when changes are detected in relevant directories. This ensures consistent, up-to-date containers in the Kubernetes cluster.
- Minikube
- Terraform
- Docker (for local builds)
- kubectl (to interact with Kubernetes)
minikube start
git clone https://github.com/ViniciusMarchi/fraud-detector-mlops
cd fraud-detector-mlops
You can customize credentials and configuration in `terraform/variables.tf`.
cd terraform
terraform init
terraform apply -auto-approve
This will:
- Create namespaces
- Deploy all services (MinIO, MLflow, Airflow, PostgreSQL, Streamlit, Grafana, Prometheus)
- Expose them via NodePorts
kubectl get pods -A
minikube ip
| Service | URL | NodePort | Notes |
|---|---|---|---|
| Airflow | http://<MINIKUBE_IP>:31000 | 31000 | Credentials in `values.yaml` |
| MLflow | http://<MINIKUBE_IP>:30080 | 30080 | MLflow UI + artifacts |
| MinIO (UI) | http://<MINIKUBE_IP>:30091 | 30091 | MinIO console |
| Streamlit | http://<MINIKUBE_IP>:30007 | 30007 | Fraud detection app |
💡 You can change NodePorts in Terraform if there are conflicts.
- Go to Airflow: http://<MINIKUBE_IP>:31000
- Enable and trigger:
  - `etl_train_dag` to run ingestion, preparation, and training (logs to MLflow)
  - `deploy_model_dag` to deploy the best model
Go to: http://<MINIKUBE_IP>:30007
- Single Inference: Enter transaction details for a real-time fraud prediction
- Batch Inference: Upload a dataset (e.g. `fraudTest.csv` from the Credit Card Transactions Fraud Detection Dataset) and view results (charts, metrics, confusion matrix)
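The batch-scoring metrics shown in the UI boil down to standard scikit-learn calls over the labeled upload. A minimal sketch on toy data (column names follow the Kaggle dataset's `is_fraud` label; the `prediction` column is a hypothetical name for the model output):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, recall_score

# Toy stand-in for a scored batch; in the app, labels come from the uploaded
# CSV and predictions from the deployed model.
df = pd.DataFrame({
    "is_fraud":   [0, 0, 1, 1, 0, 1],
    "prediction": [0, 1, 1, 1, 0, 0],
})

cm = confusion_matrix(df["is_fraud"], df["prediction"])
print(cm)                 # rows = actual class, columns = predicted class
print("recall:", recall_score(df["is_fraud"], df["prediction"]))  # 2/3 here
```

Recall is the headline metric, consistent with the training objective: of all actual fraud cases, how many did the model catch.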