workshop003_Machine_learning_and_Data_streaming

📋 Overview

This project builds an end-to-end pipeline for predicting happiness scores of countries using machine learning and real-time data streaming. It incorporates:

  • 📊 Exploratory Data Analysis (EDA) for insights and transformations.
  • 🧠 Machine Learning regression to predict happiness scores.
  • 🚀 Kafka-based streaming for real-time processing.
  • 🗄️ Database integration for storing predictions.
  • 🛠️ API for on-demand predictions hosted on a scalable platform.

The pipeline is modular and automated, enabling seamless updates and experimentation with models.


🛠️ Tools and Technologies

| Technology   | Purpose                                        |
|--------------|------------------------------------------------|
| Python       | Development and scripting                      |
| Docker       | Containerization of services                   |
| Kafka        | Real-time data streaming                       |
| Airflow      | Workflow orchestration and pipeline automation |
| PostgreSQL   | Database for storing predictions               |
| Scikit-learn | Machine learning model development             |
| Matplotlib   | Data visualization                             |
| Plotly       | Advanced visualizations                        |
| Pickle       | Model serialization                            |
| Poetry       | Dependency management and packaging            |

🏗️ Project Nodes

[Diagram of the project nodes]

🔍 More details on each phase and each node can be found in this document.


📂 Repository Contents

├── api/                  # The Happiness Prediction API
├── context/
│   ├── Workshop 3 Machine learning and Data streaming.pdf  # Project rubric
├── dags/
│   ├── ...               # Files for the Airflow pipeline
├── docs/
│   ├── api/
│   │   ├── Happiness Prediction API - Methods.postman_collection.json  # API methods
│   ├── ...               # Copy of the documentation
├── kafka/
│   ├── consumer/         # Kafka config for the consumer container
│   ├── producer/         # Kafka config for the producer container
├── models/
│   ├── 00_happiness_score_prediction_model.pkl  # Base model
├── notebooks/
│   ├── ...               # Notebooks for all the project steps
├── sql/
│   ├── queries/
│   ├── api/
├── src/
│   ├── connections/
│   ├── utils/
├── Dockerfile            # Docker configuration for containerizing Airflow
├── docker-compose.yml    # docker-compose configuration
├── pyproject.toml        # Poetry configuration
├── README.md             # Documentation (you're reading it now!)
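As a quick illustration of the serialized model, it can be loaded and queried directly with pickle. The feature names below are hypothetical placeholders; the actual columns and their order come from the training notebooks:

```python
import pickle

import pandas as pd

# Load the base model from the path shown in the tree above.
with open("models/00_happiness_score_prediction_model.pkl", "rb") as f:
    model = pickle.load(f)

# Hypothetical feature set; match the columns used during training.
sample = pd.DataFrame([{
    "gdp_per_capita": 1.34,
    "social_support": 1.49,
    "healthy_life_expectancy": 0.99,
    "freedom": 0.60,
    "generosity": 0.15,
    "corruption_perception": 0.39,
}])

print(model.predict(sample))  # predicted happiness score
```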

🚀 How to Run the Project

1️⃣ Clone the Repository

git clone https://github.com/DCajiao/workshop003_Machine_learning_and_Data_streaming.git
cd workshop003_Machine_learning_and_Data_streaming

2️⃣ Configure Environment Variables

Create a .env file in src/ and add the following:

DBNAME=...
DBUSER=...
DBPASS=...
DBHOST=...
DBPORT=5432
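A minimal sketch of how these variables might be consumed in Python, assuming python-dotenv and psycopg2-binary are available (the project's actual connection helpers live in src/connections/):

```python
import os

import psycopg2
from dotenv import load_dotenv

# Load DBNAME, DBUSER, DBPASS, DBHOST and DBPORT from src/.env.
load_dotenv("src/.env")

conn = psycopg2.connect(
    dbname=os.environ["DBNAME"],
    user=os.environ["DBUSER"],
    password=os.environ["DBPASS"],
    host=os.environ["DBHOST"],
    port=os.environ["DBPORT"],
)

# Quick connectivity check.
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())
```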

3️⃣ Start the Services

Run the following commands:

docker-compose up airflow-init
docker-compose up -d

4️⃣ Access Airflow to run the training pipeline

  • Go to http://localhost:8080.
  • Log in with:
    • Username: airflow
    • Password: airflow
  • Activate and trigger the DAG to execute the pipeline.
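Besides the UI, the DAG can also be triggered through Airflow's stable REST API (available when the basic-auth backend is enabled, as in the standard docker-compose setup). A sketch with a hypothetical DAG id; substitute the id shown in the Airflow UI:

```python
import requests

AIRFLOW_API = "http://localhost:8080/api/v1"
DAG_ID = "happiness_training_pipeline"  # hypothetical: use the real id from the UI

# Trigger a new DAG run with the default airflow/airflow credentials.
resp = requests.post(
    f"{AIRFLOW_API}/dags/{DAG_ID}/dagRuns",
    auth=("airflow", "airflow"),
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["state"])  # typically "queued"
```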

5️⃣ Check the consumer and producer container logs

  • Tail them with docker-compose logs -f followed by the service names defined in docker-compose.yml.
  • Watch in real time how the producer streams data to Kafka.
  • Watch in real time how the consumer calls the prediction API and inserts the features, together with the prediction, into the predicted_data table in the database (a sketch of this flow follows below).
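For orientation, here is a minimal sketch of the consumer's loop using kafka-python and requests. The topic name, bootstrap server, API endpoint, and payload fields are hypothetical placeholders; the real configuration lives in kafka/consumer/:

```python
import json

import requests
from kafka import KafkaConsumer

# Hypothetical names: check kafka/consumer/ for the real topic and endpoint.
TOPIC = "happiness_features"
PREDICT_URL = "http://localhost:8000/predict"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    features = message.value
    # Ask the API for a prediction for this record of features.
    prediction = requests.post(PREDICT_URL, json=features).json()
    # The real consumer then inserts features + prediction into the
    # predicted_data table; this sketch just prints them.
    print(features, prediction)
```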

If you want to learn more about how this project works, you can find a more detailed analysis in the final report.

🌟 Enjoy exploring the automated happiness prediction pipeline! 😊