Commit a8280b5

seabbs-bot and seabbs authored
Fix #534: Add marginal model to model outline document (#594)
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
1 parent 6b4f91f commit a8280b5

3 files changed

Lines changed: 35 additions & 1 deletion

README.Rmd

Lines changed: 3 additions & 1 deletion
@@ -142,9 +142,11 @@ citation("epidist")
 If using our methodology, or the methodology on which ours is based, please cite the relevant papers.
 This may include:
 
-* [Estimating epidemiological delay distributions for infectious diseases](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) by Park *et al.* (2024)
+* [Estimating epidemiological delay distributions for infectious diseases](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) by Park *et al.* (2024) -- if using the latent model
 * [Best practices for estimating and reporting epidemiological delay distributions of infectious diseases using public health surveillance and healthcare data](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012520) by Charniga *et al.* (2024)
 
+If using the marginal model, please also cite the [`primarycensored`](https://primarycensored.epinowcast.org/) package using `citation("primarycensored")`.
+
 ## Contributors
 
 

vignettes/model.Rmd

Lines changed: 25 additions & 0 deletions
@@ -154,6 +154,7 @@ This then allows the modelling of the continuous distribution, adjusted for the
 Whilst this is an approximation [@park2024estimating] showed good recovery of simulated distributions in a range of settings.
 However, the use of two latent variables per observed delay means that this approach may scale poorly with larger datasets.
 That being said this approach has been used successfully in multiple real-world outbreak settings ([@ward2022transmission]).
+If using the latent model, please cite @park2024estimating in addition to `epidist`.
 
 Mathematically this model is described as follows.
 We look at the conditional probability that the secondary event $S$ falls between $S_L$ and $S_R$, given that the primary event $P$ falls between $P_L$ and $P_R$ and that the secondary event $S$ occurs before the truncation time $T$:
@@ -177,4 +178,28 @@ y_i &\sim \text{Unif}(s_{L, i}, s_{R, i}) \\
 $$
 As before, $g_P(z \, | \, p_{L, i}, p_{R, i})$ represents the conditional distribution of the primary event given lower $P_L$ and upper $P_R$ bounds; this is equivalent to modelling the incidence in primary events.
 
+# The marginal model
+
+The marginal model corrects for the same biases as the latent model but integrates out the exact event times numerically, or analytically where closed-form solutions exist, rather than sampling latent variables.
+This approach uses the primary event censored distribution implemented in the [`primarycensored`](https://primarycensored.epinowcast.org/) package [@primarycensored].
+If using the marginal model, please cite `primarycensored` in addition to `epidist`.
+
+Under the assumption that the forward distribution does not change within the censoring interval (i.e. $f_x = f$ for $x \in [P_L, P_R]$), the double censoring probability from Section \@ref(interval-censoring) simplifies to
+$$
+\mathbb{P}(S_L < S < S_R \mid P_L < P < P_R) = \int_{P_L}^{P_R} g_P(x \mid P_L, P_R) \left[F(S_R - x) - F(S_L - x)\right] \text{d}x.
+$$
+For common delay and primary event distributions, such as gamma or lognormal delays with uniform primary events, `primarycensored` provides closed-form analytical solutions to this integral.
+For other combinations, numerical integration is used.
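
The integral added above can be illustrated with a minimal sketch. It assumes an exponential delay and a uniform primary event distribution, neither of which is prescribed by the vignette; they are chosen only because the closed form is short enough to check by hand. All function names are hypothetical and are not part of `primarycensored` or `epidist`. The sketch evaluates the double censoring probability once by midpoint-rule quadrature and once in closed form.

```python
import math


def exp_cdf(t, rate):
    """CDF F(t) of an exponential delay (assumed for illustration only)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def interval_prob_numeric(p_l, p_r, s_l, s_r, rate, n=2000):
    """Midpoint-rule approximation of
    integral_{p_l}^{p_r} g_P(x) [F(s_r - x) - F(s_l - x)] dx
    with a uniform primary density g_P(x) = 1 / (p_r - p_l)."""
    width = p_r - p_l
    h = width / n
    total = 0.0
    for k in range(n):
        x = p_l + (k + 0.5) * h
        total += exp_cdf(s_r - x, rate) - exp_cdf(s_l - x, rate)
    return total * h / width


def _integrated_cdf(u, rate):
    """Antiderivative of the exponential CDF: integral_0^u F(t) dt."""
    if u <= 0:
        return 0.0
    return u - (1.0 - math.exp(-rate * u)) / rate


def interval_prob_closed(p_l, p_r, s_l, s_r, rate):
    """Closed form of the same integral for this exponential/uniform pair."""
    width = p_r - p_l
    upper = _integrated_cdf(s_r - p_l, rate) - _integrated_cdf(s_r - p_r, rate)
    lower = _integrated_cdf(s_l - p_l, rate) - _integrated_cdf(s_l - p_r, rate)
    return (upper - lower) / width


if __name__ == "__main__":
    # Primary event on day 0, secondary event on day 3, daily censoring.
    print(interval_prob_numeric(0.0, 1.0, 3.0, 4.0, rate=0.5))
    print(interval_prob_closed(0.0, 1.0, 3.0, 4.0, rate=0.5))
```

The two functions should agree to within quadrature error. The closed form needs no loop over quadrature nodes, which is the saving the added text attributes to the analytical solutions.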
+
+Right truncation at time $T$ is handled by normalising the likelihood as in the latent model:
+$$
+\mathcal{L}(\mathbf{Y} \mid \mathbf{\theta}) = \prod_i \frac{\mathbb{P}(S_{L,i} < S_i < S_{R,i} \mid P_{L,i} < P_i < P_{R,i})}{\int_{P_{L,i}}^{P_{R,i}} g_P(z \mid p_{L,i}, p_{R,i}) F(T - z) \, \text{d}z}.
+$$
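
The truncation-normalised likelihood can be sketched in the same spirit. This is an illustration under stated assumptions, not the `epidist` or `primarycensored` implementation: the delay is taken to be exponential, the primary event uniform within its window, and every function name is hypothetical. Each observation is a tuple `(p_l, p_r, s_l, s_r, t)` with `s_r <= t`.

```python
import math


def _integrated_cdf(u, rate):
    """integral_0^u F(t) dt for an exponential CDF F (assumed delay)."""
    if u <= 0:
        return 0.0
    return u - (1.0 - math.exp(-rate * u)) / rate


def primary_censored_cdf(q, p_l, p_r, rate):
    """integral_{p_l}^{p_r} g_P(z) F(q - z) dz with uniform g_P."""
    width = p_r - p_l
    return (_integrated_cdf(q - p_l, rate) - _integrated_cdf(q - p_r, rate)) / width


def truncated_prob(p_l, p_r, s_l, s_r, t, rate):
    """One factor of the likelihood: the interval probability divided by
    the probability of observing the secondary event before time t."""
    numerator = (primary_censored_cdf(s_r, p_l, p_r, rate)
                 - primary_censored_cdf(s_l, p_l, p_r, rate))
    denominator = primary_censored_cdf(t, p_l, p_r, rate)
    return numerator / denominator


def log_likelihood(observations, rate):
    """Sum of log factors over observations (p_l, p_r, s_l, s_r, t)."""
    return sum(math.log(truncated_prob(*obs, rate)) for obs in observations)


if __name__ == "__main__":
    obs = [(0.0, 1.0, 2.0, 3.0, 5.0), (1.0, 2.0, 2.0, 3.0, 5.0)]
    print(log_likelihood(obs, rate=0.5))
```

A quick sanity check on the normalisation: for a fixed primary window and truncation time, the factors for all observable secondary intervals sum to one.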
+
+Removing the latent variables reduces the number of parameters that must be sampled, and where analytical solutions exist the likelihood can be evaluated without numerical integration.
+In addition, identical observations can be aggregated and the likelihood computed once per unique combination of delay, censoring windows, and covariates.
+Together these make the marginal model substantially more efficient than the latent model, particularly for larger datasets with daily-censored data where many observations share the same structure.
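
The aggregation step can be sketched generically. The snippet below is a hypothetical illustration rather than how `epidist` prepares data: rows are hashable tuples, `log_prob` stands in for any per-observation log-likelihood, and the toy `log_prob` used in the demo is a simple interval-censored exponential that ignores primary event censoring.

```python
import math
from collections import Counter


def log_lik_rowwise(rows, log_prob):
    """Evaluate the log-likelihood once per observation."""
    return sum(log_prob(row) for row in rows)


def log_lik_aggregated(rows, log_prob):
    """Evaluate the log-likelihood once per unique row, weighted by its count."""
    counts = Counter(rows)
    return sum(n * log_prob(row) for row, n in counts.items())


def toy_log_prob(row, rate=0.5):
    """Stand-in log-probability of a delay falling in [lower, upper)
    under an exponential delay (an assumption made for this demo)."""
    lower, upper = row
    return math.log(math.exp(-rate * lower) - math.exp(-rate * upper))


if __name__ == "__main__":
    # Six daily-censored delays, but only three unique values.
    rows = [(0, 1)] * 3 + [(1, 2)] * 2 + [(3, 4)]
    print(log_lik_rowwise(rows, toy_log_prob))
    print(log_lik_aggregated(rows, toy_log_prob))
```

Both functions return the same value up to floating-point rounding, but the aggregated version calls `log_prob` three times instead of six here. With daily-censored data the number of unique rows grows much more slowly than the number of observations.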
+
+For the mathematical details of primary event censored distributions, including the survival function derivation and closed-form solutions for specific distributions, see `vignette("why-it-works", package = "primarycensored")` and `vignette("analytic-solutions", package = "primarycensored")`.
+
 ## References {-}

vignettes/references.bib

Lines changed: 7 additions & 0 deletions
@@ -10,6 +10,13 @@ @article {park2024estimating
 journal = {medRxiv}
 }
 
+@Manual{primarycensored,
+title = {primarycensored: Primary Event Censored Distributions},
+author = {Sam Abbott and Sam Brand and James Mba Azam and Carl Pearson and Sebastian Funk and Kelly Charniga},
+year = {2025},
+doi = {10.5281/zenodo.13632839},
+}
+
 @article{charniga2024best,
 doi = {10.1371/journal.pcbi.1012520},
 author = {Charniga, Kelly and Park, Sang Woo and Akhmetzhanov, Andrei R. and Cori, Anne and Dushoff, Jonathan and Funk, Sebastian and Gostic, Katelyn M. and Linton, Natalie M. and Lison, Adrian and Overton, Christopher E. and Pulliam, Juliet R. C. and Ward, Thomas and Cauchemez, Simon and Abbott, Sam},
