MLOps for Clinical Trials — IMT Atlantique Workshop Project

Ateliers MLOps - IMT A -- Intersemestre 2025

L'idée des ateliers est de commencer à utiliser certains outils qui peuvent être d'utilité pour faire du MLOps.

--------ENGLISH VERSION-----------

MLOps for Clinical Trials — IMT Atlantique Workshop Project

This project was developed as part of the MLOps workshops held at IMT Atlantique during the Intersemester 2025. The goal is to explore the integration of modern MLOps tools for managing machine learning workflows.

DEMO

Lifecycle of an MLOps project

Example visualization of a hyperparameter grid search

Project Objectives

The project aims to design a complete MLOps pipeline for predicting drug approval outcomes based on real-world clinical trials data.
The pipeline ensures:

Reproducibility of ML experiments (via MLflow)
Workflow orchestration (via Apache Airflow)
Containerization and environment consistency (via Docker)
Automation and scalability of training & deployment

Tech Stack

Python: Data manipulation, preprocessing, modeling
Jupyter Notebooks: Prototyping & experimentation
MLflow: Experiment tracking, model management
Apache Airflow: DAG-based workflow orchestration
Docker: Containerized environment
Miniconda: Dependency and environment management
Git & GitHub: Version control and collaboration

Porject Structure

mlops-project-main/ │ ├── airflow/ # Airflow DAGs, plugins, logs, and configuration │ ├── dags/ # DAG files for orchestration │ ├── logs/ # Airflow execution logs (excluded in .gitignore) │ ├── plugins/ # Custom Airflow plugins │ └── config/ # Airflow configuration files │ ├── notebooks/ # Jupyter Notebooks for experimentation and training │ ├── data/ # Raw data used for modeling │ ├── cert/ # Certificate-related files │ └── *.ipynb # Notebooks including preprocessing and ML pipelines │ ├── docker-compose.yml # Docker setup for MLflow, Airflow, etc. ├── .gitignore # Ignore unnecessary files (envs, logs, etc.) ├── LICENSE # Project license ├── README.md # Project documentation └── TPS/ # Additional resources

installation and setups

Clone the repository

git clone https://github.com/AchrafEssaleh/mlops-project-main.git cd mlops-project-main

Install Miniconda (if not already installed)

bash Miniconda3-latest-MacOSX-arm64.sh

Restart your terminal, then create and activate the conda environment

conda create --name mlops_env python=3.11 conda activate mlops_env

Install required Python packages

pip install -r notebooks/requirements.txt

Launch all services using Docker Compose

docker-compose up --build

ML Pipeline Overview

Data Ingestion:
- Load raw clinical trial data from the TDC dataset
Data Preprocessing:
- Handle missing values, encoding, and normalization
- Feature engineering based on domain knowledge
Model Training:
- Train a Random Forest Classifier
- Evaluate using cross-validation and metrics
Experiment Tracking:
- Use MLflow to log parameters, metrics, and models
Workflow Orchestration:
- Schedule and manage pipeline with Apache Airflow DAGs
Deployment & Reproducibility:
- Use Docker & MLflow for reproducible and containerized runs

Experiment Tracking with MLflow

The ML pipeline uses MLflow to track and log the following:

Model parameters (e.g., number of estimators, depth)
Evaluation metrics (e.g., accuracy, precision, recall)
Model artifacts (e.g., trained models, plots)
Training runtime and environment metadata

To visualize experiments and runs locally, launch the MLflow tracking UI:

mlflow ui

Then open your browser and navigate to: http://localhost:5000

Team Contributors:

Authors & Contributions

This project was developed as part of the MLOps workshops at IMT Atlantique in January 2025, by a team of six members.

The workload was collaboratively split, with particular attention dedicated to:

Designing and implementing core data pipelines
Developing machine learning models and tuning hyperparameters
Integrating MLflow for robust experiment tracking and reproducibility
Structuring the project architecture and documentation
Supporting the setup of containerized environments and workflow orchestration

Special care was taken to ensure that the foundational components — data processing, modeling logic, and MLOps integration — were well designed and extensible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ateliers MLOps - IMT A -- Intersemestre 2025

MLOps for Clinical Trials — IMT Atlantique Workshop Project

DEMO

Project Objectives

Tech Stack

Porject Structure

installation and setups

Clone the repository

Install Miniconda (if not already installed)

Restart your terminal, then create and activate the conda environment

Install required Python packages

Launch all services using Docker Compose

ML Pipeline Overview

Experiment Tracking with MLflow

Authors & Contributions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.circleci		.circleci
TPS		TPS
airflow		airflow
images		images
notebooks		notebooks
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml

Folders and files

Latest commit

History

Repository files navigation

Ateliers MLOps - IMT A -- Intersemestre 2025

MLOps for Clinical Trials — IMT Atlantique Workshop Project

DEMO

Project Objectives

Tech Stack

Porject Structure

installation and setups

Clone the repository

Install Miniconda (if not already installed)

Restart your terminal, then create and activate the conda environment

Install required Python packages

Launch all services using Docker Compose

ML Pipeline Overview

Experiment Tracking with MLflow

Authors & Contributions

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages