# Statement of need
Diffusion-based generative models [@diffusion; @ddpm] are a method for sampling from high-dimensional distributions. A sub-class of these models, score-based diffusion generative models (SBGMs, [@sde]), permits exact-likelihood estimation via a change-of-variables associated with the forward diffusion process [@sde_ml]. Diffusion models allow fitting generative models to high-dimensional data more efficiently than normalising flows, since only one neural network parameterises the diffusion process, as opposed to the sequence of neural networks in typical normalising flow architectures. Whilst existing diffusion models [@ddpm; @vdms] allow for sampling, they are limited to inaccurate variational inference approaches for density estimation, which limits their use for Bayesian inference. This code provides density estimation with diffusion models using GPU-enabled ODE solvers in `jax` [@jax] and `diffrax` [@kidger]. Similar codes (e.g. [@azula]) exist for diffusion models, but they do not implement log-likelihood calculations, a range of network architectures, or parallelised computations for optimisation and SDE/ODE sampling.
The software we present, `sbgm`, is designed to be used by researchers in machine learning and the natural sciences for fitting diffusion models with custom architectures for their research. These models can be fit easily with the multi-accelerator training and inference routines within the code (with demonstration examples provided). Typical use cases for these kinds of generative models are emulator approaches [@emulating], simulation-based inference [@sbi], field-level inference [@field_level_inference] and general inverse problems [@inverse_problem_medical; @Remy; @Feng2023; @Feng2024] (e.g. image inpainting [@sde] and denoising [@ambientdiffusion; @blinddiffusion]). This code allows for seamless integration of diffusion models into these applications by providing data-generating models with easy conditioning of the data on any modality (e.g. images, audio or model parameters). Furthermore, the implementation in `equinox` [@equinox] guarantees safe integration of `sbgm` with any other sampling libraries (e.g. BlackJAX [@blackjax]) or other `jax` [@jax] based codes.
Score-based diffusion models [@sde] model a forward diffusion process with Stochastic Differential Equations (SDEs) of the form

$$\text{d}\boldsymbol{x}_t = f(\boldsymbol{x}_t, t)\,\text{d}t + g(t)\,\text{d}\boldsymbol{w}_t$$

where $f(\boldsymbol{x}_t, t)$ is a vector-valued function called the drift coefficient, $g(t)$ is the diffusion coefficient and $\text{d}\boldsymbol{w}_t$ is a sample of noise $\text{d}\boldsymbol{w}_t \sim \mathcal{G}[\text{d}\boldsymbol{w}_t|\mathbf{0}, \mathbf{I}]$. This equation describes the infinitely many infinitesimal samples of noise that perturb the data along the diffusion time $t$. The diffusion path, defined by the SDE, begins at $t=0$ and ends at $t=T$, where the resulting distribution is a multivariate Gaussian with mean zero and covariance $\mathbf{I}$. The code implements the various SDEs known in the diffusion model literature.
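
As a concrete illustration, a minimal sketch of the variance preserving (VP) SDE coefficients in `jax` might look as follows; the linear $\beta(t)$ schedule and its endpoint values are conventional defaults from the literature, not necessarily those used in `sbgm`:

```python
import jax.numpy as jnp

# Linear beta(t) schedule; these endpoint values are conventional
# defaults from the literature, not necessarily sbgm's.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def drift(x, t):
    # f(x_t, t): shrinks the state towards zero as noise is added
    return -0.5 * beta(t) * x

def diffusion(t):
    # g(t): scale of the Wiener increment dw_t
    return jnp.sqrt(beta(t))
```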
The reverse of the SDE, mapping from multivariate Gaussian samples $\boldsymbol{x}_T$ to samples of data $\boldsymbol{x}_0$, is of the form

$$\text{d}\boldsymbol{x}_t = \left[f(\boldsymbol{x}_t, t) - g(t)^2\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)\right]\text{d}t + g(t)\,\text{d}\bar{\boldsymbol{w}}_t$$

where $\bar{\boldsymbol{w}}_t$ is a reverse-time noise process and the score function $\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)$ is substituted with a neural network $\boldsymbol{s}_{\theta}(\boldsymbol{x}_t, t)$ for the sampling process. The network is fit by score-matching [@score_matching; @score_matching2] across the time span $[0, T]$. This network predicts the noise added to the data at time $t$ by the forward diffusion process, in accordance with the SDE, and removes it. With a data-dimensional sample of Gaussian noise from the prior $p_T(\boldsymbol{x}_T)$ (see Figure \ref{fig:sde_ode}) one can reverse the diffusion process to generate data.
The reverse SDE may be solved with Euler-Maruyama sampling [@sde] (or other annealed Langevin sampling methods), which is featured in the code.
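
For illustration, a minimal Euler-Maruyama integration of the reverse SDE might be sketched as below; `score_fn`, `drift` and `diffusion` are the callables from the sketch above, and this function is illustrative rather than `sbgm`'s actual sampler:

```python
import jax.numpy as jnp
import jax.random as jr

def reverse_sde_sample(score_fn, drift, diffusion, x_T, key, n_steps=1000, T=1.0):
    # Integrate the reverse SDE from t = T down to t = 0 with
    # Euler-Maruyama steps of size dt < 0.
    dt = -T / n_steps
    x, t = x_T, T
    for key_t in jr.split(key, n_steps):
        g = diffusion(t)
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
        mu = drift(x, t) - g**2 * score_fn(x, t)
        x = x + mu * dt + g * jnp.sqrt(-dt) * jr.normal(key_t, x.shape)
        t = t + dt
    return x
```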
# Likelihood calculations with diffusion models
Many of the applications of generative models depend on being able to calculate the likelihood of data. @sde show that any SDE may be converted into an ordinary differential equation (ODE) without changing the marginal distributions $p_t(\boldsymbol{x}_t)$, defined by the SDE, from which the noise is sampled in the diffusion process (shown in grey in Figure \ref{fig:sde_ode}). This ODE is known as the probability flow ODE [@sde; @sde_ml] and is written

$$\text{d}\boldsymbol{x}_t = \left[f(\boldsymbol{x}_t, t) - \frac{1}{2}g(t)^2\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)\right]\text{d}t.$$

This ODE can be solved as an initial-value problem to sample new data or estimate its density. Starting with a data point $\boldsymbol{x}_0 \sim p(\boldsymbol{x}) = p_0(\boldsymbol{x}_0)$, this point is mapped along the probability flow ODE path (see the right-hand side of Figure \ref{fig:sde_ode}) to a sample from the multivariate Gaussian prior $\boldsymbol{x}_T \sim p_T(\boldsymbol{x}_T)$. This inherits the formalism of continuous normalising flows [@neuralodes; @ffjord] without the expensive ODE simulations used to train those models, allowing for a likelihood estimate based on diffusion models [@sde_ml]. The initial-value problem provides a solution $\boldsymbol{x}_T$ and the change in probability along the path, $\Delta = \log p_0(\boldsymbol{x}_0) - \log p_T(\boldsymbol{x}_T)$, where $p_T(\boldsymbol{x}_T)$ is a simple multivariate Gaussian distribution. Various ODE solvers of different orders, provided by `diffrax` [@kidger], are available so that users can balance the speed and accuracy of sampling.
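
A minimal sketch of this initial-value problem with `diffrax`, assuming the `drift`, `diffusion` and `score_fn` callables sketched above, might look as follows; it illustrates the idea with an exact divergence term rather than reproducing `sbgm`'s implementation:

```python
import diffrax
import jax
import jax.numpy as jnp

def log_likelihood(score_fn, drift, diffusion, x_0, T=1.0):
    # Probability flow ODE drift f~(x, t)
    def f_tilde(x, t):
        return drift(x, t) - 0.5 * diffusion(t) ** 2 * score_fn(x, t)

    def vector_field(t, state, args):
        x, _ = state
        # The log-density correction evolves with the divergence of f~,
        # here computed exactly via the full Jacobian
        div = jnp.trace(jax.jacfwd(f_tilde)(x, t))
        return f_tilde(x, t), div

    sol = diffrax.diffeqsolve(
        diffrax.ODETerm(vector_field),
        diffrax.Tsit5(),
        t0=0.0,
        t1=T,
        dt0=1e-2,
        y0=(x_0, 0.0),
    )
    xs, deltas = sol.ys
    x_T, delta = xs[0], deltas[0]
    # log p_0(x_0) = log N(x_T; 0, I) + integrated divergence
    d = x_0.size
    log_p_T = -0.5 * (d * jnp.log(2.0 * jnp.pi) + jnp.sum(x_T**2))
    return log_p_T + delta
```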
The likelihood of data under a score-based diffusion model is estimated by solving the change-of-variables equation for continuous normalising flows, which accumulates the trace of the Jacobian of the ODE's drift along the solution path. The code also implements the Hutchinson trace estimation method [@ffjord; @Hutchinson], which reduces the computational expense of the estimate. Figure \ref{fig:8gauss} shows an example of a data-likelihood calculation using a trained diffusion model with the ODE associated with an SDE.
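
A minimal sketch of Hutchinson's estimator in `jax`, using Gaussian probe vectors and Jacobian-vector products instead of the full Jacobian (the function names here are illustrative, not `sbgm`'s API):

```python
import jax
import jax.numpy as jnp
import jax.random as jr

def hutchinson_divergence(f, x, key, n_probes=8):
    # Estimates Tr(df/dx) as E_eps[eps^T (df/dx) eps] with Gaussian
    # probes; each probe costs one Jacobian-vector product.
    def probe(key_i):
        eps = jr.normal(key_i, x.shape)
        _, jvp_eps = jax.jvp(f, (x,), (eps,))
        return jnp.vdot(eps, jvp_eps)
    return jnp.mean(jax.vmap(probe)(jr.split(key, n_probes)))
```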
# Implementations and future work
Diffusion models are defined in `sbgm` via a score-network model $\boldsymbol{s}_{\theta}$ and an SDE. All of the SDEs common in the literature of score-based diffusion models are implemented: variance exploding (VE), variance preserving (VP) and sub-variance preserving (SubVP) [@sde]. We provide implementations of UNet [@unet], Diffusion Transformer [@dit], MLP-Mixer [@mixer] and Residual Network [@resnet] models, which are state-of-the-art for diffusion tasks. It is possible to fit score-based diffusion models to a conditional distribution $p(\boldsymbol{x}|\boldsymbol{\pi}, \boldsymbol{y})$, where in typical inverse problems $\boldsymbol{y}$ would be an image and $\boldsymbol{\pi}$ a set of parameters in a physical model for the data [@conditional_diffusion]. The code is compatible with any model written in the `equinox` [@equinox] framework. We recently extended the code to provide transformer-based diffusion models [@dits] and plan to extend to latent diffusion models [@ldms] and flow matching [@lipman2023flowmatchinggenerativemodeling].
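
To illustrate the intended pattern, a conditional score network can be written as an ordinary `equinox` module; the toy class below is a sketch of such a model and not one of `sbgm`'s built-in architectures:

```python
import equinox as eqx
import jax.numpy as jnp
import jax.random as jr

class ScoreNet(eqx.Module):
    # Toy conditional score network s_theta(x, t | pi); the class and
    # constructor names are illustrative, not sbgm's own.
    net: eqx.nn.MLP

    def __init__(self, x_dim, pi_dim, width, depth, *, key):
        # Input is the concatenation of data x, time t and conditioning pi
        self.net = eqx.nn.MLP(x_dim + 1 + pi_dim, x_dim, width, depth, key=key)

    def __call__(self, x, t, pi):
        return self.net(jnp.concatenate([x, jnp.atleast_1d(t), pi]))

model = ScoreNet(x_dim=2, pi_dim=1, width=64, depth=2, key=jr.PRNGKey(0))
score = model(jnp.zeros(2), 0.5, jnp.ones(1))  # shape (2,)
```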