BastienDussap/README.md

ML Engineer / Data Scientist

LinkedIn

About Me

I'm an ML Engineer / Data Scientist at Metafora Biosystems, a biotechnology company based at the Cochin Hospital (Paris 14). I work on METAflow, a novel AI-powered tool for flow cytometry analysis.

What I do:

  • Develop machine learning algorithms and data processing pipelines in Python for cytometry analysis
  • Build and maintain production-ready REST APIs using the Django framework
  • Design and deploy ML models on Google Cloud Platform (GCP), following the IEC 62304 standard for medical device software
  • Collaborate directly with users to gather feedback and translate requirements into actionable development tickets
  • Participate in Agile/Scrum workflows, including sprint planning and backlog management

Former PhD student in Machine Learning / Statistics at Université Paris-Saclay, affiliated with the Institut de Mathématiques d'Orsay and part of the Datashape team at INRIA, under the supervision of Gilles Blanchard and Marc Glisse.


Technical Skills

Languages & Frameworks

Python, Django, Docker, GCP, C++

ML & Data Science

  • Libraries: NumPy, Pandas, Scikit-learn, PyTorch, Matplotlib
  • MLOps: Git, Docker basics, REST API development
  • Cloud: Google Cloud Platform (GCP) - Compute Engine, Cloud Storage
  • Methodologies: Agile/Scrum, IEC 62304 (medical device software)

Tools & Platforms

  • Version Control: Git
  • Documentation: LaTeX, Zotero, Markdown
  • OS: Linux, Windows (WSL)
  • AI tools: Claude, Copilot

Currently Learning

  • AWS Machine Learning Engineer certification
  • Docker & Kubernetes for ML deployment
  • Advanced MLOps patterns

PhD Research

Thesis: A Unified Framework for Label Shift Quantification

My doctoral research focused on quantification learning applied to cytometric datasets, particularly in the context of Metafora's METAflow software.

Key contributions:

  • Developed methods for automatic analysis of flow cytometry data using machine learning
  • Leveraged Reproducing Kernel Hilbert Spaces (RKHS) to embed and store high-dimensional features
  • Created transfer learning techniques to analyze new samples based on previously analyzed ones
  • Built frameworks to estimate population proportions in new samples

Read my thesis


Publications & Recognition

Best Student Paper Award (ECML/PKDD 2023, Research Track)

"Label Shift Quantification with Robust Guarantees via Distribution Feature Matching"
with G. Blanchard and B.-E. Chérief-Abdellatif

Abstract: We propose a unified framework based on distribution feature matching that recovers estimators from both classification-based and statistical mixture modeling approaches to quantification learning. We provide robust theoretical guarantees under label shift and investigate misspecification scenarios.
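To make the idea of matching distribution features concrete, here is a minimal sketch, not the implementation from the paper or from METAflow. It assumes one simple instance of the approach: each class and the unlabeled target sample are summarized by the mean of a feature map, and the class proportions are taken to be the point of the probability simplex whose mixture of class means is closest to the target mean. The random Fourier feature map, the projected gradient solver, the synthetic Gaussian data, and all function names (`random_fourier_features`, `project_to_simplex`, `match_feature_means`, `demo`) are illustrative assumptions.

```python
import numpy as np


def random_fourier_features(X, W, b):
    """Map samples to random Fourier features (approximate Gaussian-kernel embedding)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def match_feature_means(class_means, target_mean, n_iter=500):
    """Minimize ||class_means.T @ alpha - target_mean||^2 over the simplex.

    class_means has shape (n_classes, n_features). Projected gradient
    descent is an assumption made here for simplicity.
    """
    G = class_means @ class_means.T           # Gram matrix of the class means
    c = class_means @ target_mean
    step = 1.0 / (np.linalg.eigvalsh(G)[-1] + 1e-12)
    alpha = np.full(len(class_means), 1.0 / len(class_means))
    for _ in range(n_iter):
        alpha = project_to_simplex(alpha - step * (G @ alpha - c))
    return alpha


def demo(seed=0, true_props=(0.8, 0.2), n_features=200):
    """Estimate class proportions on hypothetical synthetic data."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, -2.0], [2.0, 2.0]])
    # Hypothetical labeled source sample: 5000 points per class.
    source = [c + rng.standard_normal((5000, 2)) for c in centers]
    # Hypothetical unlabeled target sample with shifted class proportions.
    counts = [int(round(p * 10000)) for p in true_props]
    target = np.vstack(
        [c + rng.standard_normal((n, 2)) for c, n in zip(centers, counts)]
    )
    W = rng.standard_normal((2, n_features))  # unit kernel bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    class_means = np.stack(
        [random_fourier_features(X, W, b).mean(axis=0) for X in source]
    )
    target_mean = random_fourier_features(target, W, b).mean(axis=0)
    return match_feature_means(class_means, target_mean)


if __name__ == "__main__":
    print(demo())
```

In this sketch the source sample is balanced while the target sample is not, which is the label shift setting: the class-conditional distributions are shared and only the proportions change. Swapping the feature map (for example, using classifier outputs instead of kernel features) changes the estimator without changing the matching step.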


Research Interests

  • Machine Learning: Kernel methods, transfer learning, statistical learning theory
  • Label Shift & Quantification Learning: Distribution matching, robust estimation
  • Kernel Mean Embedding: RKHS methods, feature representations
  • Applications: Flow cytometry analysis, biomedical data processing
  • MLOps: Model deployment, API development, production systems

Selected Talks

Conference Presentations

  • ECML/PKDD 2023 - Turin, Italy
    Label Shift Quantification with Robust Guarantees via Distribution Feature Matching
    🏆 Research Track – Best Student Paper Award

Invited Seminars

  • Journées de Statistique - Société Française de Statistique, 2023
  • DataShape Seminar - INRIA, 2023
  • Workshop FAST-BIG - Efficient Statistical Testing for High-Dimensional Models
  • Séminaire des doctorants - Institut de Mathématiques d'Orsay, 2023

Teaching & Service

Seminar Organization

Co-organizer of the Master's seminar in Statistics and Machine Learning at Université Paris-Saclay (2022-2024)

Teaching

Teaching Fellow - IUT Sceaux (2022-2023)
Mathematics for Management - L1 B.U.T GEA, taught by Prof. Patrick Pamphile


Let's Connect

Popular repositories

  1. ScientificProgamming (Public)

    PRE4 : Scientific Programming

  2. BastienDussap.github.io (Public)

    Forked from daattali/beautiful-jekyll

    ✨ Fork of https://beautifuljekyll.com for my website

    HTML

  3. qunfold (Public)

    Forked from mirkobunse/qunfold

    A unified implementation of quantification and unfolding algorithms

    Python

  4. BastienDussap (Public)

    My ReadMe

  5. FlowUtils (Public)

    Forked from whitews/FlowUtils

    FlowUtils is a Python package containing various utility functions related to flow cytometry analysis, primarily focused on compensation and transformation tasks commonly used within the flow commu…

    Python

  6. command_bash (Public)

    Some commands for my .bashrc

    Shell