evgeniimatveev/mlops_project

MLOps Project

Overview

This repository provides an end-to-end MLOps pipeline for managing, tracking, and automating machine learning experiments.
The project integrates MLflow, Weights & Biases (W&B), SQL for experiment analysis, and CI/CD automation.

Project Structure

MLOps_Project/
├── data/                  # Raw and processed datasets
├── mlruns/                # MLflow tracking logs
├── models/                # Saved models
├── notebook/              # Jupyter notebooks for analysis
├── sql_queries/           # SQL scripts for MLflow experiment analysis
├── src/                   # Core source code
│   ├── clean_data/        # Data preprocessing scripts
│   ├── download_data/     # Data downloading scripts
│   ├── feature_engineering/ # Feature transformation scripts
│   ├── model_training/    # Model training scripts
│   ├── model_deployment/  # API for model deployment
│   └── utils/             # Utility functions
├── sweeps/                # W&B sweep scripts for hyperparameter tuning
├── wandb/                 # Weights & Biases logs
├── config/                # Project configuration files
├── .github/workflows/     # CI/CD pipeline
├── .gitignore             # Ignore unnecessary files
├── environment.yaml       # Conda environment dependencies
├── remove_russian_comments.py  # Script to remove Russian comments from the code
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation

Tech Stack

  • MLflow – Experiment tracking and model registry
  • Weights & Biases (W&B) – Logging and hyperparameter sweeps
  • PostgreSQL – SQL for tracking and querying experiments
  • XGBoost – Machine learning model
  • Python – Main programming language
  • GitHub Actions ⚙️ – CI/CD automation
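
To illustrate the kind of SQL experiment analysis the stack is meant for, here is a minimal sketch. It assumes a simplified version of the `runs`, `params`, and `metrics` tables that an MLflow SQL backend store uses, and it runs against an in-memory SQLite database instead of PostgreSQL so it works standalone. The metric name `accuracy`, the parameter name `max_depth`, and all rows are made-up illustration data, not results from this project.

```python
import sqlite3

# Simplified stand-ins for the MLflow backend tables (assumption: the real
# tables carry more columns; only the ones used below are modelled here).
SCHEMA = """
CREATE TABLE runs (run_uuid TEXT PRIMARY KEY, status TEXT);
CREATE TABLE params (run_uuid TEXT, key TEXT, value TEXT);
CREATE TABLE metrics (run_uuid TEXT, key TEXT, value REAL, step INTEGER);
"""

# Best logged accuracy per finished run, next to the max_depth it used.
BEST_RUNS_SQL = """
SELECT r.run_uuid, p.value AS max_depth, MAX(m.value) AS best_accuracy
FROM runs r
JOIN metrics m ON m.run_uuid = r.run_uuid AND m.key = 'accuracy'
JOIN params p ON p.run_uuid = r.run_uuid AND p.key = 'max_depth'
WHERE r.status = 'FINISHED'
GROUP BY r.run_uuid, p.value
ORDER BY best_accuracy DESC
"""


def best_runs(conn):
    """Return (run_uuid, max_depth, best_accuracy) rows, best run first."""
    return conn.execute(BEST_RUNS_SQL).fetchall()


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [("a", "FINISHED"), ("b", "FINISHED"), ("c", "FAILED")],
)
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?)",
    [("a", "max_depth", "3"), ("b", "max_depth", "6"), ("c", "max_depth", "9")],
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [
        ("a", "accuracy", 0.81, 0),
        ("a", "accuracy", 0.84, 1),
        ("b", "accuracy", 0.90, 0),
        ("c", "accuracy", 0.95, 0),
    ],
)

for row in best_runs(conn):
    print(row)
```

The failed run `c` is filtered out even though it has the highest logged value, which is the main reason to join on `runs.status` rather than query `metrics` alone.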

Setup & Installation

1️⃣ Clone the repository

git clone https://github.com/evgeniimatveev/mlops_project.git
cd mlops_project

2️⃣ Create a virtual environment (optional) and install dependencies

conda env create -f environment.yaml
conda activate mlops_env

OR

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate  # On Windows
pip install -r requirements.txt

3️⃣ Run the pipeline

Run data preprocessing

python src/clean_data/run.py

Run model training

python src/model_training/run.py
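
The contents of `src/model_training/run.py` are not shown here, so the following is only a sketch of how a training step could log to MLflow. It assumes a classification task with `XGBClassifier`; the function names, the experiment name `mlops_project`, and the metric name `val_accuracy` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_and_log(X_train, y_train, X_val, y_val, params,
                  experiment="mlops_project"):
    """Train an XGBoost classifier and record params and a metric in MLflow.

    The experiment name and the metric name are illustrative assumptions.
    """
    # Imported lazily so accuracy() stays usable without these packages.
    import mlflow
    from xgboost import XGBClassifier

    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(params)          # e.g. {"max_depth": 5}
        model = XGBClassifier(**params)
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        val_acc = accuracy(list(y_val), list(preds))
        mlflow.log_metric("val_accuracy", val_acc)
    return model, val_acc
```

With the default local tracking setup, runs logged this way land in `mlruns/`, the directory listed in the project structure.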

Run hyperparameter tuning with W&B

python sweeps/sweep.py
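
As a rough idea of what a sweep script can look like, the sketch below defines a W&B sweep configuration and starts an agent. The search method, the parameter ranges, the metric name `val_accuracy`, and the project name are assumptions for illustration; the actual `sweeps/sweep.py` may differ.

```python
# Hypothetical search space for an XGBoost model.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "max_depth": {"values": [3, 5, 7]},
        "learning_rate": {"min": 0.01, "max": 0.3},
        "n_estimators": {"values": [100, 200, 400]},
    },
}


def run_sweep(train_fn, project="mlops_project", count=10):
    """Register the sweep with W&B and run `count` trials of train_fn.

    train_fn is expected to call wandb.init(), read its hyperparameters
    from wandb.config, and log the metric named in SWEEP_CONFIG.
    """
    import wandb  # imported lazily; requires a W&B login to run

    sweep_id = wandb.sweep(SWEEP_CONFIG, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Random search is used here only because it needs no extra settings; `"grid"` or `"bayes"` are the other methods W&B accepts in this field.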

Start MLflow UI

mlflow ui --host 0.0.0.0 --port 5000

Then open http://localhost:5000 in your browser.
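
The script steps in this section can also be chained from a small driver so that a failure in one step stops the rest. This is a sketch, not part of the repository; the script paths come from the commands above, while the function names are hypothetical.

```python
import subprocess
import sys

# Script paths taken from the steps in this README, in execution order.
PIPELINE_STEPS = [
    "src/clean_data/run.py",
    "src/model_training/run.py",
    "sweeps/sweep.py",
]


def build_commands(steps=PIPELINE_STEPS, python=sys.executable):
    """Turn script paths into argv lists for the given interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# From the repository root:
#     run_pipeline(build_commands())
```

Using `sys.executable` keeps every step inside the same conda or venv environment that launched the driver.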


Future Plans

✅ MLflow & W&B integration
✅ SQL experiment analysis
✅ CI/CD with GitHub Actions


📜 License

This project is distributed under the MIT License. Feel free to use the code! 🚀


📢 Stay Connected!

💻 GitHub Repository: Evgenii Matveev
🌐 Portfolio: Data Science Portfolio
📌 LinkedIn: Evgenii Matveev


🔥 If you like this project, don't forget to star ⭐ the repository! 🔥
