
Commit 8c2522a

minimal paper corrections; notation, added refs
1 parent 371df9c commit 8c2522a

2 files changed (+25, −15 lines)


paper/paper.bib

Lines changed: 10 additions & 0 deletions
@@ -384,3 +384,13 @@ @article{Hutchinson
 https://doi.org/10.1080/03610919008812866
 }
 }
+
+@misc{lipman2023flowmatchinggenerativemodeling,
+  title={Flow Matching for Generative Modeling},
+  author={Yaron Lipman and Ricky T. Q. Chen and Heli Ben-Hamu and Maximilian Nickel and Matt Le},
+  year={2023},
+  eprint={2210.02747},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2210.02747},
+}

paper/paper.md

Lines changed: 15 additions & 15 deletions
@@ -34,9 +34,9 @@ Diffusion models [@diffusion; @ddpm; @sde] have emerged as the dominant paradigm

# Statement of need

-Diffusion-based generative models [@diffusion; @ddpm] are a method for sampling from high-dimensional distributions. A sub-class of these models, score-based diffusion generatives models (SBGMs, [@sde]), permit exact-likelihood estimation via a change-of-variables associated with the forward diffusion process [@sde_ml]. Diffusion models allow fitting generative models to high-dimensional data in a more efficient way than normalising flows since only one neural network model parameterises the diffusion process as opposed to a sequence of neural networks in typical normalising flow architectures. Whilst existing diffusion models [@ddpm; @vdms] allow for sampling, they are limited to innaccurate variational inference approaches for density estimation which limits their use for Bayesian inference. This code provides density estimation with diffusion models using GPU enabled ODE solvers in `jax` [@jax] and `diffrax` [@kidger]. Similar codes (e.g. [@azula]) exist for diffusion models but they do not implement log-likelihood calculations, various network architectures and parallelised ODE-sampling.
+Diffusion-based generative models [@diffusion; @ddpm] are a method for sampling from high-dimensional distributions. A sub-class of these models, score-based diffusion generative models (SBGMs, [@sde]), permits exact-likelihood estimation via a change-of-variables associated with the forward diffusion process [@sde_ml]. Diffusion models allow fitting generative models to high-dimensional data more efficiently than normalising flows, since only one neural network parameterises the diffusion process, as opposed to the sequence of neural networks in typical normalising flow architectures. Whilst existing diffusion models [@ddpm; @vdms] allow for sampling, they are limited to inaccurate variational inference approaches for density estimation, which limits their use for Bayesian inference. This code provides density estimation with diffusion models using GPU-enabled ODE solvers in `jax` [@jax] and `diffrax` [@kidger]. Similar codes (e.g. [@azula]) exist for diffusion models, but they do not implement log-likelihood calculations, a variety of network architectures, or parallelised computations for optimisation and SDE/ODE sampling.

-The software we present, `sbgm`, is designed to be used by researchers in machine learning and the natural sciences for fitting diffusion models with custom architectures for their research. These models can be fit easily with multi-accelerator training and inference within the code. Typical use cases for these kinds of generative models are emulator approaches [@emulating], simulation-based inference [@sbi], field-level inference [@field_level_inference] and general inverse problems [@inverse_problem_medical; @Remy; @Feng2023; @Feng2024] (e.g. image inpainting [@sde] and denoising [@ambientdiffusion; @blinddiffusion]). This code allows for seemless integration of diffusion models to these applications by providing data-generating models with easy conditioning of the data on any modality. Furthermore, the implementation in `equinox` [@equinox] guarantees safe integration of `sbgm` with any other sampling libraries (e.g. BlackJAX @blackjax) or `jax` [@jax] based codes.
+The software we present, `sbgm`, is designed to be used by researchers in machine learning and the natural sciences for fitting diffusion models with custom architectures for their research. These models can be fit easily with the multi-accelerator training and inference routines within the code (with demonstration examples provided). Typical use cases for these kinds of generative models are emulator approaches [@emulating], simulation-based inference [@sbi], field-level inference [@field_level_inference] and general inverse problems [@inverse_problem_medical; @Remy; @Feng2023; @Feng2024] (e.g. image inpainting [@sde] and denoising [@ambientdiffusion; @blinddiffusion]). This code allows for seamless integration of diffusion models into these applications by providing data-generating models with easy conditioning of the data on any modality (e.g. images, audio or model parameters). Furthermore, the implementation in `equinox` [@equinox] guarantees safe integration of `sbgm` with other sampling libraries (e.g. BlackJAX [@blackjax]) or `jax` [@jax] based codes.

![A diagram showing how to map data to a noise distribution (the prior) with an SDE, and reverse this SDE for generative modeling. One can also reverse the associated probability flow ODE, which yields a deterministic reverse process. Both the reverse-time SDE and probability flow ODE can be obtained by estimating the score.\label{fig:sde_ode}](sde_ode.png)

@@ -47,50 +47,50 @@ Diffusion in the context of generative modelling describes the process of adding
Score-based diffusion models [@sde] model a forward diffusion process with Stochastic Differential Equations (SDEs) of the form

$$
-\text{d}\boldsymbol{x} = f(\boldsymbol{x}, t)\text{d}t + g(t)\text{d}\boldsymbol{w},
+\text{d}\boldsymbol{x}_t = f(\boldsymbol{x}_t, t)\text{d}t + g(t)\text{d}\boldsymbol{w}_t,
$$

-where $f(\boldsymbol{x}, t)$ is a vector-valued function called the drift coefficient, $g(t)$ is the diffusion coefficient and $\text{d}\boldsymbol{w}$ is a sample of noise $\text{d}\boldsymbol{w}\sim \mathcal{G}[\text{d}\boldsymbol{w}|\mathbf{0}, \mathbf{I}]$. This equation describes the infinitely many samples of noise along the diffusion time $t$ that perturb the data. The diffusion path, defined by the SDE, begins at $t=0$ and ends at $T=0$ where the resulting distribution is then a multivariate Gaussian with mean zero and covariance $\mathbf{I}$.
+where $f(\boldsymbol{x}_t, t)$ is a vector-valued function called the drift coefficient, $g(t)$ is the diffusion coefficient and $\text{d}\boldsymbol{w}_t$ is a sample of noise $\text{d}\boldsymbol{w}_t\sim \mathcal{G}[\text{d}\boldsymbol{w}_t|\mathbf{0}, \mathbf{I}_{\boldsymbol{x}_t}]$. This equation describes the infinitely many samples of noise along the diffusion time $t$ that perturb the data. The diffusion path, defined by the SDE, begins at $t=0$ and ends at $t=T$, where the resulting distribution is a multivariate Gaussian with mean zero and covariance $\mathbf{I}$. The code implements various SDEs known in the diffusion model literature.
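
To make the forward process concrete, the following is a minimal sketch of the drift and diffusion coefficients of a variance-preserving (VP) SDE in `jax`; the linear $\beta(t)$ schedule, its constants and the `drift`/`diffusion` names are illustrative assumptions, not `sbgm`'s exact API.

```python
import jax.numpy as jnp

# Linear noise schedule beta(t) = beta_min + t * (beta_max - beta_min);
# the constants are illustrative, not sbgm's defaults.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def drift(x, t):
    # VP drift coefficient f(x_t, t) = -0.5 * beta(t) * x_t
    return -0.5 * beta(t) * x

def diffusion(t):
    # VP diffusion coefficient g(t) = sqrt(beta(t))
    return jnp.sqrt(beta(t))
```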

-The reverse of the SDE, mapping from multivariate Gaussian samples $\boldsymbol{x}(T)$ to samples of data $\boldsymbol{x}(0)$, is of the form
+The reverse of the SDE, mapping from multivariate Gaussian samples $\boldsymbol{x}_T$ to samples of data $\boldsymbol{x}_0$, is of the form

$$
-\text{d}\boldsymbol{x} = [f(\boldsymbol{x}, t) - g^2(t)\nabla_{\boldsymbol{x}}\log p_t(\boldsymbol{x})]\text{d}t + g(t)\text{d}\boldsymbol{w},
+\text{d}\boldsymbol{x}_t = [f(\boldsymbol{x}_t, t) - g^2(t)\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)]\text{d}t + g(t)\text{d}\boldsymbol{w}_t,
$$

-where the score function $\nabla_{\boldsymbol{x}}\log p_t(\boldsymbol{x})$ is substituted with a neural network $\boldsymbol{s}_{\theta}(\boldsymbol{x}(t), t)$ for the sampling process. The network is fit by score-matching [@score_matching; @score_matching2] across the time span $[0, T]$. This network predicts the noise added to the image at time $t$ with the forward diffusion process, in accordance with the SDE, and removes it. With a data-dimensional sample of Gaussian noise from the prior $p_T(\boldsymbol{x})$ (see Figure \ref{fig:sde_ode}) one can reverse the diffusion process to generate data.
+where the score function $\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)$ is substituted with a neural network $\boldsymbol{s}_{\theta}(\boldsymbol{x}_t, t)$ for the sampling process. The network is fit by score-matching [@score_matching; @score_matching2] across the time span $[0, T]$. This network predicts the noise added to the data at time $t$ by the forward diffusion process, in accordance with the SDE, and removes it. With a data-dimensional sample of Gaussian noise from the prior $p_T(\boldsymbol{x})$ (see Figure \ref{fig:sde_ode}) one can reverse the diffusion process to generate data.
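
As a sketch of how such a network could be fit, the denoising score-matching objective below perturbs a data point with the VP perturbation kernel and regresses the network onto the conditional score; the schedule constants, the omission of time weighting and the `score_fn` signature are assumptions for illustration, not `sbgm`'s training loop.

```python
import jax
import jax.numpy as jnp

def dsm_loss(key, score_fn, x0, t):
    # Denoising score-matching loss at one (x0, t) pair, assuming the VP
    # kernel p_t(x_t | x_0) = N(mean(t) * x_0, sigma(t)^2 I) with a linear
    # beta schedule. Time weighting is omitted for brevity.
    int_beta = 0.1 * t + 0.5 * t**2 * (20.0 - 0.1)  # \int_0^t beta(s) ds
    mean = jnp.exp(-0.5 * int_beta) * x0
    sigma = jnp.sqrt(1.0 - jnp.exp(-int_beta))
    eps = jax.random.normal(key, x0.shape)
    xt = mean + sigma * eps
    target = -eps / sigma  # score of p_t(x_t | x_0)
    return jnp.mean((score_fn(xt, t) - target) ** 2)
```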

The reverse SDE may be solved with Euler-Maruyama sampling [@sde] (or other annealed Langevin sampling methods), which is featured in the code.
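
A single reverse-time Euler-Maruyama update might look like the following sketch, again assuming VP coefficients and a trained `score_fn`; this is not `sbgm`'s exact sampler.

```python
import jax
import jax.numpy as jnp

def reverse_step(key, x, t, dt, score_fn):
    # One Euler-Maruyama step of the reverse-time VP SDE, stepping from
    # t to t - dt (dt > 0). The inline coefficients mirror the sketch
    # above and are assumptions for illustration.
    beta = 0.1 + t * (20.0 - 0.1)
    f = -0.5 * beta * x                      # forward drift f(x_t, t)
    g = jnp.sqrt(beta)                       # diffusion g(t)
    rev_drift = f - g**2 * score_fn(x, t)    # reverse-time drift
    noise = jax.random.normal(key, x.shape)
    return x - rev_drift * dt + g * jnp.sqrt(dt) * noise
```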

# Likelihood calculations with diffusion models

-Many of the applications of generative models depend on being able to calculate the likelihood of data. @sde show that any SDE may be converted into an ordinary differential equation (ODE) without changing the distributions, defined by the SDE, from which the noise is sampled from in the diffusion process (denoted $p_t(x)$ and shown in grey in Figure \ref{fig:sde_ode}). This ODE is known as the probability flow ODE [@sde; @sde_ml] and is written
+Many of the applications of generative models depend on being able to calculate the likelihood of data. @sde show that any SDE may be converted into an ordinary differential equation (ODE) without changing the marginal distributions $p_t(\boldsymbol{x}_t)$, defined by the SDE, from which the noise is sampled in the diffusion process (shown in grey in Figure \ref{fig:sde_ode}). This ODE is known as the probability flow ODE [@sde; @sde_ml] and is written

$$
-\text{d}\boldsymbol{x} = [f(\boldsymbol{x}, t) - g^2(t)\nabla_{\boldsymbol{x}}\log p_t(\boldsymbol{x})]\text{d}t = f'(\boldsymbol{x}, t)\text{d}t.
+\text{d}\boldsymbol{x}_t = [f(\boldsymbol{x}_t, t) - \frac{1}{2}g^2(t)\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)]\text{d}t = f'(\boldsymbol{x}_t, t)\text{d}t.
$$

-This ODE can be solved with an initial-value problem. Starting with a data point $\boldsymbol{x}(0)\sim p(\boldsymbol{x})$, this point is mapped along the probability flow ODE path (see the right-hand side of Figure \ref{fig:sde_ode}) to a sample from the multivariate Gaussian prior. This inherits the formalism of continuous normalising flows [@neuralodes; @ffjord] without the expensive ODE simulations used to train these models - allowing for a likelihood estimate based on diffusion models [@sde_ml]. The initial value problem provides a solution $\boldsymbol{x}(T)$ and the change in probability along the path $\Delta=\log p(\boldsymbol{x}(0)) - \log p(\boldsymbol{x}(T))$ where $p(\boldsymbol{x}(T))$ is a simple multivariate Gaussian distribution.
+This ODE can be solved as an initial-value problem to sample new data or estimate its density. Starting with a data point $\boldsymbol{x}_0 \sim p(\boldsymbol{x})=p_0(\boldsymbol{x}_0)$, this point is mapped along the probability flow ODE path (see the right-hand side of Figure \ref{fig:sde_ode}) to a sample from the multivariate Gaussian prior $\boldsymbol{x}_T \sim p_T(\boldsymbol{x}_T)$. This inherits the formalism of continuous normalising flows [@neuralodes; @ffjord] without the expensive ODE simulations used to train these models, allowing for a likelihood estimate based on diffusion models [@sde_ml]. The initial value problem provides a solution $\boldsymbol{x}_T$ and the change in probability along the path $\Delta=\log p_0(\boldsymbol{x}_0) - \log p_T(\boldsymbol{x}_T)$, where $p_T(\boldsymbol{x}_T)$ is a simple multivariate Gaussian distribution. Various ODE solvers of different orders, provided by `diffrax` [@kidger], are available for a user to balance the speed and accuracy of sampling.
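
A minimal sketch of this initial-value problem with `diffrax`, mapping a data point to the Gaussian prior; the VP coefficients, the small starting time and the `push_to_prior`/`score_fn` names are hypothetical, not `sbgm`'s API.

```python
import diffrax
import jax.numpy as jnp

def push_to_prior(x0, score_fn):
    # Integrate the probability flow ODE from t ~ 0 to t = T = 1,
    # mapping a data point x0 to a latent x_T under the Gaussian prior.
    def vector_field(t, x, args):
        beta = 0.1 + t * (20.0 - 0.1)                # linear schedule
        f = -0.5 * beta * x                          # VP drift
        return f - 0.5 * beta * score_fn(x, t)       # f'(x_t, t), g(t)^2 = beta
    term = diffrax.ODETerm(vector_field)
    sol = diffrax.diffeqsolve(
        term, diffrax.Tsit5(), t0=1e-5, t1=1.0, dt0=0.01, y0=x0
    )
    return sol.ys[-1]
```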

-![A diagram showing a log-likelihood calculation over the support of a Gaussian mixture model with eight components. Data is drawn (shown in red) from this mixture to train the diffusion model that gives the likelihood in gray. The log-likelihood is calculated using the ODE and a trained diffusion model. \label{fig:8gauss}](8gauss.png){ width=50% }
+![A diagram showing a log-likelihood calculation over the support of a Gaussian mixture model with eight components. Data (shown in red) is drawn from this mixture to train the diffusion model; the log-likelihood defined by the trained model, calculated using the probability flow ODE, is shown in grey. \label{fig:8gauss}](8gauss.png){ width=50% }

The likelihood of data under a score-based diffusion model is estimated by solving the change-of-variables equation for continuous normalising flows

$$
-\frac{\partial}{\partial t} \log p(\boldsymbol{x}(t)) = \nabla_{\boldsymbol{x}} \cdot f(\boldsymbol{x}(t), t),
+\frac{\partial}{\partial t} \log p_t(\boldsymbol{x}_t) = -\nabla_{\boldsymbol{x}_t} \cdot f'(\boldsymbol{x}_t, t),
$$

-which gives the log-likelihood of a single datapoint $\boldsymbol{x}(0)$ as
+which gives the log-likelihood of a single datapoint $\boldsymbol{x}_0$ as

$$
-\log p(\boldsymbol{x}(0)) = \log p(\boldsymbol{x}(T)) + \int_{t=0}^{t=T}\text{d}t \; \nabla_{\boldsymbol{x}}\cdot f(\boldsymbol{x}, t).
+\log p_0(\boldsymbol{x}_0) = \log p_T(\boldsymbol{x}_T) + \int_{t=0}^{t=T}\text{d}t \; \nabla_{\boldsymbol{x}_t}\cdot f'(\boldsymbol{x}_t, t).
$$

The code also implements these calculations with the Hutchinson trace estimation method [@ffjord; @Hutchinson], which reduces the computational expense of the divergence estimate. Figure \ref{fig:8gauss} shows an example of a data-likelihood calculation using a trained diffusion model with the ODE associated with its SDE.
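
A single-probe Hutchinson estimate needs only one Jacobian-vector product per drift evaluation, which `jax.jvp` provides; the sketch below assumes `ode_drift` is the probability flow drift $f'$ and uses a Rademacher probe vector.

```python
import jax
import jax.numpy as jnp

def hutchinson_divergence(key, ode_drift, x, t):
    # Unbiased single-probe Hutchinson estimate of the divergence
    # (Jacobian trace) of ode_drift(., t) at x, using the identity
    # E_v[v^T (df'/dx) v] = tr(df'/dx) for Rademacher v.
    v = jax.random.rademacher(key, x.shape, dtype=x.dtype)
    _, jvp = jax.jvp(lambda xi: ode_drift(xi, t), (x,), (v,))
    return jnp.sum(jvp * v)
```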

# Implementations and future work

-Diffusion models are defined in `sbgm` via a score-network model $\boldsymbol{s}_{\theta}$ and an SDE. All the availble SDEs (variance exploding (VE), variance preserving (VP) and sub-variance preserving (SubVP) [@sde]) in the literature of score-based diffusion models are available. We provide implementations for UNet [@unet], Diffusion Transformers [@dit], MLP-Mixer [@mixer] and Residual Network [@resnet] models which are state-of-the-art for diffusion tasks. It is possible to fit score-based diffusion models to a conditional distribution $p(\boldsymbol{x}|\boldsymbol{\pi}, \boldsymbol{y})$ where in typical inverse problems $\boldsymbol{y}$ would be an image and $\boldsymbol{\pi}$ a set of parameters in a physical model for the data [@conditional_diffusion] (e.g. to solve inverse problems). The code is compatible with any model written in the `equinox` [@equinox] framework. We are extending the code to provide transformer-based [@dits] and latent diffusion models [@ldms].
+Diffusion models are defined in `sbgm` via a score-network model $\boldsymbol{s}_{\theta}$ and an SDE. All of the standard SDEs in the score-based diffusion model literature (variance exploding (VE), variance preserving (VP) and sub-variance preserving (SubVP) [@sde]) are available. We provide implementations for UNet [@unet], Diffusion Transformer [@dit], MLP-Mixer [@mixer] and Residual Network [@resnet] models, which are state-of-the-art for diffusion tasks. It is possible to fit score-based diffusion models to a conditional distribution $p(\boldsymbol{x}|\boldsymbol{\pi}, \boldsymbol{y})$, where in typical inverse problems $\boldsymbol{y}$ would be an image and $\boldsymbol{\pi}$ a set of parameters in a physical model for the data [@conditional_diffusion]. The code is compatible with any model written in the `equinox` [@equinox] framework, as sketched below. We recently extended the code to provide transformer-based diffusion models [@dits] and plan to extend to latent diffusion models [@ldms] and flow matching [@lipman2023flowmatchinggenerativemodeling].
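
As a sketch of that compatibility, a minimal conditional score network can be written as a plain `equinox` module; the architecture and names here are illustrative, not one of `sbgm`'s provided models.

```python
import equinox as eqx
import jax
import jax.numpy as jnp

class ConditionalScoreNet(eqx.Module):
    # A toy conditional score network s_theta(x_t, t, pi): an MLP over
    # the concatenated data, scalar time and conditioning parameters.
    mlp: eqx.nn.MLP

    def __init__(self, data_dim, param_dim, *, key):
        self.mlp = eqx.nn.MLP(
            in_size=data_dim + param_dim + 1,  # x, pi and scalar t
            out_size=data_dim,
            width_size=128,
            depth=3,
            key=key,
        )

    def __call__(self, x, t, pi):
        h = jnp.concatenate([x, pi, jnp.atleast_1d(t)])
        return self.mlp(h)

# Usage sketch:
# model = ConditionalScoreNet(data_dim=2, param_dim=3, key=jax.random.PRNGKey(0))
# score = model(jnp.zeros(2), 0.5, jnp.zeros(3))
```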

# Acknowledgements
