-\end{aligned}\]</p><p>where <span>$L_{\textrm{BCE}}(y, c) \equiv -y\log(c) - (1 - y) \log(1 - c)$</span>. It can be shown (e.g., <a href="https://proceedings.mlr.press/v119/hermans20a.html">Hermans et al., 2020</a>, App. B) that the Bayes classifier is given by </p><p class="math-container">\[c^*(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})}, \quad \boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta,\]</p><p>and, hence,</p><p class="math-container">\[r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c^*(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c^*(\boldsymbol{Z}, \boldsymbol{\theta})}, \quad \boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta.\]</p><p>This connection links the likelihood-to-evidence ratio to the average-risk-optimal solution of a standard binary classification problem, and consequently provides a foundation for approximating the ratio using neural networks. Specifically, let <span>$c_{\boldsymbol{\gamma}}: \mathcal{Z} \times \Theta \to (0, 1)$</span> denote a neural network parametrised by <span>$\boldsymbol{\gamma}$</span>. Then the Bayes classifier may be approximated by <span>$c_{\boldsymbol{\gamma}^*}(\cdot, \cdot)$</span>, where </p><p class="math-container">\[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} -\sum_{k=1}^K \Big[\log\{c_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(k)}, \boldsymbol{\theta}^{(k)})\} + \log\{1 - c_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(\sigma(k))}, \boldsymbol{\theta}^{(k)})\} \Big],\]</p><p>with each <span>$\boldsymbol{\theta}^{(k)}$</span> sampled independently from a "proposal" distribution <span>$p(\boldsymbol{\theta})$</span>, <span>$\boldsymbol{Z}^{(k)} \sim p(\boldsymbol{Z} \mid \boldsymbol{\theta}^{(k)})$</span>, and <span>$\sigma(\cdot)$</span> a random permutation of <span>$\{1, \dots, K\}$</span>. The proposal distribution <span>$p(\boldsymbol{\theta})$</span> need not coincide with the prior distribution <span>$\pi(\boldsymbol{\theta})$</span>, which is instead specified in the downstream inference algorithm (see below). In theory, any <span>$p(\boldsymbol{\theta})$</span> with support over <span>$\Theta$</span> can be used. However, with finite training data, the choice of <span>$p(\boldsymbol{\theta})$</span> is important, as it determines where the parameters <span>$\{\boldsymbol{\theta}^{(k)}\}$</span> are most densely sampled and, hence, where the neural network <span>$c_{\boldsymbol{\gamma}^*}(\cdot, \cdot)$</span> best approximates the Bayes classifier. Further, since neural networks are reliable only within the support of their training samples, a <span>$p(\boldsymbol{\theta})$</span> lacking full support over <span>$\Theta$</span> essentially acts as a "soft prior": the resulting ratio estimator cannot be trusted for parameter values outside the region covered by the training samples. </p><p>Once the neural network is trained, <span>$r_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv c_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta})\{1 - c_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta})\}^{-1}$</span>, <span>$\boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta$</span>, may be used to quickly approximate the likelihood-to-evidence ratio, and is therefore called a <em>neural ratio estimator</em>.</p>
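<p>To illustrate the training step, the following is a minimal, self-contained sketch written directly in Flux.jl (it does not use the NeuralEstimators interface). The simulator <code>simulate</code>, the proposal <code>sample_proposal</code>, and the network architecture are hypothetical placeholders; the sketch only shows how dependent pairs <span>$(\boldsymbol{Z}^{(k)}, \boldsymbol{\theta}^{(k)})$</span> and independent pairs <span>$(\boldsymbol{Z}^{(\sigma(k))}, \boldsymbol{\theta}^{(k)})$</span> enter the objective above.</p><pre><code class="language-julia"># Minimal illustrative sketch (plain Flux.jl, not the NeuralEstimators interface):
# training a classifier c_γ(Z, θ) to approximate the Bayes classifier.
# The simulator, proposal distribution, and architecture below are placeholders.
using Flux, Random

d_Z, d_θ, K = 10, 2, 1000                 # data dimension, parameter dimension, number of samples

sample_proposal(K) = randn(Float32, d_θ, K)   # placeholder proposal p(θ)
simulate(θ) = θ[1:1, :] .+ exp.(θ[2:2, :]) .* randn(Float32, d_Z, size(θ, 2))  # placeholder model p(Z | θ)

θ    = sample_proposal(K)                 # θ⁽ᵏ⁾ ~ p(θ)
Z    = simulate(θ)                        # Z⁽ᵏ⁾ ~ p(Z | θ⁽ᵏ⁾)
perm = randperm(K)                        # random permutation σ(·) of {1, …, K}

# Classifier c_γ: maps a concatenated pair (Z, θ) to (0, 1)
c = Chain(Dense(d_Z + d_θ => 64, relu), Dense(64 => 1, sigmoid))

# Objective from the text: dependent pairs (Z⁽ᵏ⁾, θ⁽ᵏ⁾) are labelled 1,
# independent pairs (Z⁽σ(k)⁾, θ⁽ᵏ⁾) are labelled 0
function loss(c, Z, θ, perm)
    dependent   = vec(c(vcat(Z, θ)))
    independent = vec(c(vcat(Z[:, perm], θ)))
    -sum(log.(dependent) .+ log.(1 .- independent))
end

opt = Flux.setup(Adam(1e-3), c)
for epoch in 1:100
    grads = Flux.gradient(m -> loss(m, Z, θ, perm), c)
    Flux.update!(opt, c, grads[1])
end</code></pre>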
<p>Inference based on a neural ratio estimator may proceed in a frequentist setting via maximum likelihood and likelihood-ratio methods (e.g., <a href="https://doi.org/10.1016/j.spasta.2024.100848">Walchessen et al., 2024</a>), and in a Bayesian setting by facilitating the computation of transition probabilities in Markov chain Monte Carlo (MCMC) algorithms, such as Hamiltonian Monte Carlo (e.g., <a href="https://proceedings.mlr.press/v119/hermans20a.html">Hermans et al., 2020</a>). Further, an approximate posterior distribution can be obtained via the identity <span>$p(\boldsymbol{\theta} \mid \boldsymbol{Z}) = \pi(\boldsymbol{\theta}) r(\boldsymbol{Z}, \boldsymbol{\theta})$</span>, from which samples can be drawn using standard sampling techniques (e.g., <a href="https://doi.org/10.1214/20-BA1238">Thomas et al., 2022</a>).</p>
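<p>As a rough illustration of this last point, the sketch below continues from the training example above (reusing the trained classifier <code>c</code>), forms the ratio estimator, and draws approximate posterior samples by sampling-importance-resampling. The observed data set and the choice of prior, here taken equal to the training proposal, are placeholder assumptions.</p><pre><code class="language-julia"># Continuing the sketch above: the neural ratio estimator r_γ*(Z, θ) = c/(1 - c),
# and approximate posterior sampling via sampling-importance-resampling.
# Here the prior π(θ) is taken equal to the training proposal, so the
# importance weights for prior draws reduce to the estimated ratio itself.
using StatsBase   # provides Weights and the weighted sample() method

r(Z, θ) = (p = vec(c(vcat(Z, θ))); p ./ (1 .- p))

Z_obs   = simulate(sample_proposal(1))       # stand-in for an observed data set
θ_prior = sample_proposal(10_000)            # draws from π(θ) (= proposal here)

# p(θ | Z_obs) = π(θ) r(Z_obs, θ): with draws from π, resample with weights ∝ r
w   = r(repeat(Z_obs, 1, size(θ_prior, 2)), θ_prior)
idx = sample(1:size(θ_prior, 2), Weights(w), 1_000; replace = true)
θ_posterior = θ_prior[:, idx]                # approximate posterior draws</code></pre>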