Background
PR #62 introduced Nested Rejection Sampling (NRS) in BlackJAX as an importance-weighted rejection sampler over the constrained prior at likelihood level $\ell(x) > \ell_0$.
The current inner procedure (`nrs.py`):
- Fit a proposal $q(x)$ to the current live points (by default a Gaussian).
- Draw batches of proposals $x_i \sim q(\cdot)$, keep only those with $\ell(x_i) > \ell_0$.
- Compute importance weights $w(x) = \pi(x)/q(x)$.
- Maintain a running maximum weight $w_{\max}$, accept survivors using stored uniforms: $u < w(x)/w_{\max}$.
- If a new $w_{\max}$ is discovered, retroactively re-test all stored proposals against the updated maximum.
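The loop above can be sketched as follows. This is a toy single-proposal NumPy illustration, not the actual `nrs.py` implementation (which is batched JAX code); all function and argument names here are placeholders:

```python
import numpy as np

def weighted_rejection(log_pi, log_q, sample_q, loglik, ell0, n_accept, rng):
    # Toy sketch of the NRS inner loop: keep proposals above the likelihood
    # threshold, track a running max importance weight, and re-test stored
    # proposals against the current maximum.
    stored_x, stored_logw, stored_u = [], [], []
    log_wmax = -np.inf
    while True:
        x = sample_q(rng)
        if loglik(x) <= ell0:           # hard likelihood constraint
            continue
        logw = log_pi(x) - log_q(x)     # importance weight, in log space
        stored_x.append(x)
        stored_logw.append(logw)
        stored_u.append(rng.uniform())  # stored uniform for this proposal
        log_wmax = max(log_wmax, logw)  # running maximum weight
        # Re-test all stored proposals against the current maximum:
        # u < w(x)/w_max  <=>  log u < log w(x) - log w_max.
        accepted = [xi for xi, lw, u in zip(stored_x, stored_logw, stored_u)
                    if np.log(u) < lw - log_wmax]
        if len(accepted) >= n_accept:
            return accepted
```

Note how a late increase in `log_wmax` can flip earlier survivors back to rejected, which is exactly the retroactive-rejection behavior listed under limitations.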
This works well but has known limitations:
- Efficiency depends on how well the empirical $w_{\max}$ approximates $\sup_x w(x)$.
- When $w_{\max}$ increases, previously accepted samples can be retroactively rejected.
- The empirical $w_{\max}$ is only an approximation (resolution $\sim 1/n_{\text{live}}$), and in challenging regimes the true supremum may be effectively unobserved.
As the likelihood constraint tightens, the target can become heavier-tailed relative to the fitted proposal $q$, causing unbounded importance weights.
The paper by Atchadé & Wang analyzes precisely this regime for both importance sampling and independence Metropolis-Hastings (IMH), establishing polynomial (sub-geometric) convergence rates when weights are unbounded but have finite moments under $q$.
Proposal
Add an alternative NRS inner kernel based on Independence Metropolis-Hastings (IMH) that removes any dependence on $w_{\max}$.
We target the constrained density
$$\pi_{\ell_0}(x) \propto \pi(x)\,\mathbf{1}\{\ell(x) > \ell_0\},$$
using $q(\cdot)$ (Gaussian by default) as an independence proposal.
Given a current state $x$ with importance weight $w(x)=\pi(x)/q(x)$, one IMH step:
- Propose $x^\star \sim q(\cdot)$.
- If $\ell(x^\star)\le \ell_0$, reject immediately (hard constraint).
- Otherwise compute $w(x^\star)=\pi(x^\star)/q(x^\star)$ and accept with probability
$$\alpha(x, x^\star) = \min\left(1,\frac{w(x^\star)}{w(x)}\right).$$
- Set the next state to $x^\star$ if accepted, else stay at $x$.
To generate `num_delete` replacement particles, run `num_delete` IMH chains for some fixed number of steps or until some acceptance/ESS criterion is met.
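A minimal sketch of this inner kernel, again in plain NumPy for illustration (the proposed BlackJAX version would be vectorized JAX; all names below are placeholders):

```python
import numpy as np

def imh_step(x, logw_x, rng, sample_q, log_pi, log_q, loglik, ell0):
    # One IMH step targeting pi restricted to ell(x) > ell0.
    x_star = sample_q(rng)
    if loglik(x_star) <= ell0:                  # hard constraint: reject
        return x, logw_x
    logw_star = log_pi(x_star) - log_q(x_star)
    # alpha = min(1, w(x*)/w(x)), evaluated in log space
    if np.log(rng.uniform()) < logw_star - logw_x:
        return x_star, logw_star
    return x, logw_x

def replace_particles(starts, n_steps, rng, **kernel):
    # One chain per replacement particle with a fixed step budget; each
    # start must already satisfy the constraint (e.g. an existing live point).
    out = []
    for x0 in starts:
        x = x0
        logw = kernel["log_pi"](x0) - kernel["log_q"](x0)
        for _ in range(n_steps):
            x, logw = imh_step(x, logw, rng, **kernel)
        out.append(x)
    return out
```

Because a proposal below the threshold is rejected outright, the chain never leaves the constrained region once started inside it.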
Implementation suggestion: add a new module `from_imh.py` alongside the existing rejection-based path (`from_gaussian.py` / current default). Reuse the same `proposal_fn` interface `(key, n, **params) -> (samples, log_proposal)` and the same `update_inner_kernel_params_fn` so the outer NS adaptation stays unchanged.
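A rough sketch of what such a module could expose, assuming the `(key, n, **params) -> (samples, log_proposal)` convention; `build_imh_kernel` and its arguments are hypothetical names, and `key` is modeled as a NumPy `Generator` purely for readability (BlackJAX uses JAX PRNG keys):

```python
import numpy as np

def build_imh_kernel(proposal_fn, logprior_fn, loglikelihood_fn):
    # `proposal_fn` follows the (key, n, **params) -> (samples, log_proposal)
    # interface from the existing rejection path; `key` is a numpy Generator
    # here for illustration only.
    def step(key, state, ell0, **params):
        x, logw = state
        samples, logq = proposal_fn(key, 1, **params)
        x_star, logq_star = samples[0], logq[0]
        if loglikelihood_fn(x_star) <= ell0:   # hard-constraint rejection
            return x, logw
        logw_star = logprior_fn(x_star) - logq_star
        if np.log(key.uniform()) < logw_star - logw:
            return x_star, logw_star           # MH acceptance
        return x, logw
    return step
```

Since `step` only consumes `proposal_fn` and the two log-density callables, the outer NS loop and its proposal adaptation would not need to change.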
Advantages
- Eliminates the need for $w_{\max}$: no running maximum, no retroactive rejection, no dependence on approximating $\sup w$.
- Well-studied in the unbounded-weight regime: the paper proves polynomial convergence rates for IMH under moment conditions even when $w$ is unbounded.
- More stable accept/reject logic: acceptance depends only on the ratio $w(x^\star)/w(x)$, not on a global bound.
- Natural path to heavy-tailed proposals: IMH can benefit from Student-t or mixture proposals (also suggested by the paper) when the constrained target becomes heavy-tailed relative to a Gaussian fit.
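For instance, a location-scale Student-t proposal could slot into the same `(key, n, **params) -> (samples, log_proposal)` interface. This is an illustrative NumPy sketch (parameter names are assumptions, and `key` again stands in for a JAX PRNG key):

```python
import numpy as np
from math import lgamma, log, pi  # `pi` here is the constant, not the prior

def student_t_proposal_fn(key, n, mu=0.0, sigma=1.0, df=3.0):
    # Location-scale Student-t proposal; heavier tails than a Gaussian fit,
    # so the importance ratio pi/q is bounded in more regimes.
    z = key.standard_t(df, size=n)
    xs = mu + sigma * z
    logq = (lgamma((df + 1.0) / 2.0) - lgamma(df / 2.0)
            - 0.5 * log(df * pi) - log(sigma)
            - (df + 1.0) / 2.0 * np.log1p(z ** 2 / df))
    return xs, logq
```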
Disadvantages
- Produces a Markov chain, not independent samples: the current NRS rejection approach preserves independence of survivors within the stored buffer. IMH introduces correlation between successive states, changing the theoretical properties of the inner step and requiring careful thought about how many IMH steps are needed per NS iteration.
- Mixing can be slow when $q$ is a poor global approximation: even with polynomial convergence guarantees, the practical number of steps needed could be large in hard regimes.
- More tuning surface: need to decide a per-NS-step IMH budget (number of IMH iterations per replacement particle, number of parallel chains, burn-in, etc.).
- Acceptance requires a valid current state: the chain must be initialized at a state satisfying $\ell(x)>\ell_0$ (e.g., start from an existing live point).
References