
MixFlow: Mixture-Conditioned Flow Matching for Out-of-Distribution Generalization


Lotfollahi-lab/SP-FM


MixFlow


Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions.



Figure: (A) Vanilla CFM; (B) SP-FM.

Overview

This repository contains research code for shortest-path flow matching with descriptor-conditioned mixture bases for descriptor-controlled generation.

Instead of relying on a single Gaussian base distribution, the method learns a condition-dependent mixture base jointly with a descriptor-conditioned flow field, trained via shortest-path (optimal transport) flow matching. Conditioning the base enables the model to adapt its starting distribution across conditions, improving out-of-distribution (OOD) generalization to unseen conditions.
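As a rough illustration of the idea above, the following NumPy sketch samples from a descriptor-dependent Gaussian mixture base and builds straight-line flow matching regression targets. It is not the repository's implementation. The linear maps `W_mu` and `W_logit`, the fixed `sigma`, the placeholder `velocity_model`, and all shapes are assumptions made for this example. Base and data samples are paired at random here, so the optimal-transport coupling described in the paper is not reproduced.

```python
import numpy as np


def mixture_base_params(descriptor, W_mu, W_logit):
    """Map a descriptor to mixture means and weights (hypothetical linear maps)."""
    means = W_mu @ descriptor            # (K, D): one mean per component
    logits = W_logit @ descriptor        # (K,): unnormalized mixing logits
    weights = np.exp(logits - logits.max())
    return means, weights / weights.sum()


def sample_mixture_base(descriptor, W_mu, W_logit, sigma, n, rng):
    """Draw n samples from the descriptor-dependent Gaussian mixture."""
    means, weights = mixture_base_params(descriptor, W_mu, W_logit)
    comps = rng.choice(len(weights), size=n, p=weights)
    noise = sigma * rng.standard_normal((n, means.shape[1]))
    return means[comps] + noise


def flow_matching_targets(x0, x1, t):
    """Straight-line interpolant x_t and its constant target velocity x1 - x0."""
    t = t[:, None]
    return (1.0 - t) * x0 + t * x1, x1 - x0


def velocity_model(x, t, descriptor):
    """Placeholder for a learned descriptor-conditioned flow field."""
    return np.zeros_like(x)


rng = np.random.default_rng(0)
K, D, C = 3, 2, 4                         # components, data dim, descriptor dim
W_mu = rng.standard_normal((K, D, C))     # assumed parameters of the base
W_logit = rng.standard_normal((K, C))
descriptor = rng.standard_normal(C)       # stand-in for a condition descriptor

x0 = sample_mixture_base(descriptor, W_mu, W_logit, sigma=0.1, n=8, rng=rng)
x1 = rng.standard_normal((8, D)) + 3.0    # stand-in for data from this condition
t = rng.uniform(size=8)

x_t, v_target = flow_matching_targets(x0, x1, t)
# Flow matching regression loss: mean squared error against the target velocity.
loss = np.mean((velocity_model(x_t, t, descriptor) - v_target) ** 2)
```

In a trainable version, the base parameters and the velocity model would both receive gradients from this loss, which is one way to read "learned jointly" in the paragraph above.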

Publication

This repository accompanies the arXiv manuscript:

  • Title: Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions
  • arXiv: 2601.11827v2 [cs.LG] (11 Feb 2026)
  • Paper link: https://arxiv.org/html/2601.11827v2

Datasets

Synthetic Data

We construct a synthetic benchmark of letter populations, where each condition corresponds to a letter and a specific rotation. Each descriptor encodes the letter identity and rotation, and the model learns a mixture base distribution per condition. This setup allows us to test extrapolation to unseen letters and rotation angles.
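To make the descriptor idea concrete, here is a minimal sketch of one possible encoding. The README does not specify how the descriptor is built, so the one-hot letter plus (cos, sin) rotation layout and the helper names `make_descriptor` and `rotate_points` are assumptions for illustration only.

```python
import math
import string


def make_descriptor(letter, angle_deg):
    """One-hot letter identity followed by (cos, sin) of the rotation angle."""
    one_hot = [0.0] * len(string.ascii_uppercase)
    one_hot[string.ascii_uppercase.index(letter)] = 1.0
    theta = math.radians(angle_deg)
    return one_hot + [math.cos(theta), math.sin(theta)]


def rotate_points(points, angle_deg):
    """Rotate 2D points counter-clockwise about the origin."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# A condition is a (letter, rotation) pair; the population is the rotated point set.
descriptor = make_descriptor("A", 90)
rotated = rotate_points([(1.0, 0.0), (0.0, 1.0)], 90)
```

Encoding the angle as (cos, sin) rather than raw degrees keeps 0 and 360 degrees identical, which may matter when extrapolating to unseen rotations.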

Morphological Perturbations

We evaluate on high-content imaging data in feature space. Cells (from BBBC021 and RxRx1) are embedded with a vision backbone, and the model is trained to generate unseen phenotypic responses from compound descriptors alone.

Perturbation Datasets

For transcriptomic perturbations, we use chemical or CRISPR-based single-cell datasets (Norman, ComboSciPlex, Replogle, and iAstrocytes). Conditions correspond to perturbation embeddings from pretrained models, and the model learns the distribution of perturbed cells.
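Because the stated goal is generalization to unseen conditions, evaluation needs a split in which whole conditions are held out rather than individual cells. The sketch below shows one simple way to do this. The function name `split_by_condition`, the holdout fraction, and the condition labels are hypothetical and are not taken from the repository or the datasets.

```python
import random


def split_by_condition(conditions, holdout_fraction=0.2, seed=0):
    """Hold out entire conditions so test conditions never appear in training."""
    unique = sorted(set(conditions))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_holdout = max(1, int(round(holdout_fraction * len(unique))))
    held_out = set(unique[:n_holdout])
    train_idx = [i for i, c in enumerate(conditions) if c not in held_out]
    test_idx = [i for i, c in enumerate(conditions) if c in held_out]
    return train_idx, test_idx, held_out


# One label per cell; the names are placeholders, not real perturbations.
cell_conditions = ["pert_A", "pert_A", "pert_B", "pert_C",
                   "pert_B", "pert_D", "pert_C", "pert_E"]
train_idx, test_idx, held_out = split_by_condition(cell_conditions)
```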

Documentation

Check the documentation for more information about how to use the model and get the data.

License

This work is released under the MIT license. See the license file for more information.
