Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions.
Figure: (A) Vanilla CFM; (B) SP-FM.
This repository contains research code for shortest-path flow matching (SP-FM) with descriptor-conditioned mixture bases, used for generation controlled by condition descriptors.
Instead of relying on a single Gaussian base distribution, the method learns a condition-dependent mixture base jointly with a descriptor-conditioned flow field, trained with shortest-path (optimal transport) flow matching. Conditioning the base lets the model adapt its starting distribution to each condition, which improves out-of-distribution (OOD) generalization to unseen conditions.
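A minimal NumPy sketch of the objective described above, under stated assumptions: base_params, the linear map w_mu, the toy data and the zero-velocity placeholder are hypothetical illustrations, not this repository's API. It samples x0 from a condition-dependent diagonal Gaussian mixture, pairs it independently with data samples x1 of the same condition, and regresses a velocity onto the straight-line target x1 - x0 at x_t = (1 - t) x0 + t x1. In the actual method the base parameters and the flow field are learned jointly, which needs an autodiff framework; NumPy here only shows the forward computation.

```python
# Hypothetical sketch of mixture-base sampling and straight-path flow matching targets.
# None of these names come from the repository.
import numpy as np

def base_params(descriptor, w_mu, log_std, logits):
    """Hypothetical map from a condition descriptor to mixture-base parameters.

    w_mu: (K, D, C) so that component means are linear in the descriptor.
    log_std: (K, D) and logits: (K,) are condition-independent here for brevity.
    """
    means = w_mu @ descriptor                 # (K, D)
    stds = np.exp(log_std)                    # (K, D), positive scales
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()         # softmax over the K components
    return weights, means, stds

def sample_mixture_base(weights, means, stds, n, rng):
    """Draw n samples x0 from a diagonal Gaussian mixture."""
    k = rng.choice(len(weights), size=n, p=weights)  # component index per sample
    eps = rng.standard_normal((n, means.shape[1]))
    return means[k] + stds[k] * eps                  # reparameterized sample

def straight_path_targets(x0, x1, t):
    """Point on the straight line x0 -> x1 at time t, and its constant velocity."""
    t = t[:, None]
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, ut

def flow_matching_loss(v_pred, ut):
    """Mean over samples of the squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - ut) ** 2, axis=1)))

# Toy usage for a single condition c.
rng = np.random.default_rng(0)
K, D, C, n = 3, 2, 4, 256
w_mu = rng.standard_normal((K, D, C))
log_std = np.full((K, D), -1.0)
logits = np.zeros(K)
c = rng.standard_normal(C)                    # stand-in descriptor
x1 = rng.standard_normal((n, D)) + 5.0        # stand-in data samples for condition c

weights, means, stds = base_params(c, w_mu, log_std, logits)
x0 = sample_mixture_base(weights, means, stds, n, rng)
t = rng.uniform(size=n)
xt, ut = straight_path_targets(x0, x1, t)
v_pred = np.zeros_like(ut)                    # placeholder for a learned v_theta(xt, t, c)
loss = flow_matching_loss(v_pred, ut)
```

Writing x0 as means[k] + stds[k] * eps keeps the sample differentiable in the component means and scales, which is what would let the loss gradient reach the base in an autodiff version. The discrete component choice is not differentiable, and how the paper treats the mixture weights is not covered by this sketch.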
This repository accompanies the arXiv manuscript:
- Title: Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions
- arXiv: 2601.11827v2 [cs.LG] (11 Feb 2026)
- Paper link: https://arxiv.org/html/2601.11827v2
We construct a synthetic benchmark of letter populations, where each condition corresponds to a letter and a specific rotation. Each descriptor encodes the letter identity and rotation, and the model learns a mixture base distribution per condition. This setup allows us to test extrapolation to unseen letters and rotation angles.
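A sketch of how a letter/rotation condition could be encoded and split for OOD evaluation. The names make_descriptor, rotate_points and split_conditions, the (cos, sin) rotation code and the hold-out rule are assumptions for illustration; the repository's actual descriptor format is not described here. The letter embedding is supplied by the caller: a plain one-hot code would carry no information about a letter never seen in training, so the unseen-letter case would likely need an encoding that relates letters to one another.

```python
# Hypothetical descriptor encoding and OOD split for the letter benchmark.
import math
import numpy as np

def make_descriptor(letter_embedding, angle_deg):
    """Concatenate a letter embedding with a continuous rotation code (cos, sin)."""
    theta = math.radians(angle_deg)
    return np.concatenate([np.asarray(letter_embedding, dtype=float),
                           [math.cos(theta), math.sin(theta)]])

def rotate_points(points, angle_deg):
    """Rotate an (n, 2) point cloud counter-clockwise by angle_deg."""
    theta = math.radians(angle_deg)
    rot = np.array([[math.cos(theta), -math.sin(theta)],
                    [math.sin(theta),  math.cos(theta)]])
    return points @ rot.T

def split_conditions(letters, angles, held_out_letters, held_out_angles):
    """Pairs involving a held-out letter or angle go to the OOD test set."""
    train, test = [], []
    for letter in letters:
        for angle in angles:
            unseen = letter in held_out_letters or angle in held_out_angles
            (test if unseen else train).append((letter, angle))
    return train, test

train, test = split_conditions("ABC", [0, 45, 90],
                               held_out_letters={"C"}, held_out_angles={90})
print(len(train), len(test))  # prints "4 5"
```

Encoding the angle as (cos θ, sin θ) makes the descriptor continuous and periodic in rotation, so an unseen angle lies between trained angles in descriptor space instead of appearing as an unrelated new input.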
We evaluate on high-content imaging data in feature space: cells from BBBC021 and RxRx1 are embedded with a vision backbone, and the model is trained to generate phenotypic responses to unseen compounds from their descriptors alone.
For transcriptomic perturbations, we use chemical or CRISPR-based single-cell datasets (Norman, ComboSciPlex, Replogle and iAstrocytes). Conditions correspond to perturbation embeddings from pretrained models, and the model learns to generate the distribution of perturbed cells.
See the documentation for instructions on using the model and obtaining the data.
This work is released under the MIT License; see the license file for details.

