YuminosukeSato
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 18 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 6 additions & 1 deletion
diff --git a/docs/api.md b/docs/api.md
Lines changed: 64 additions & 2 deletions
diff --git a/docs/theory.md b/docs/theory.md
Lines changed: 161 additions & 0 deletions
@@ -4,6 +4,24 @@ All notable changes to this project will be documented in this file.
 
 The format is based on Keep a Changelog.
 
+## [Unreleased]
+
+### Added
+
+- Horseshoe prior as an alternative to spike-and-slab via `ModelOptions(prior_type='horseshoe')`
+  (Kohns & Bhattacharjee 2022, arXiv:2011.00938). Recommended for dense DGP settings
+  where many covariates have true effects.
+- `posterior_shrinkage` property: mean shrinkage factor kappa_j per covariate (horseshoe only).
+- `kappa_shrinkage` field in Rust sampler output for per-iteration shrinkage diagnostics.
+
+### Fixed
+
+- `sample_inv_gamma` no longer panics on non-finite parameters (e.g. extreme-scale
+  inputs with `standardize_data=False`). It now returns a small positive fallback (1e-30) instead.
+- `_normalize_model_args` now rejects unknown dict keys, so a typo such as `prior_typee`
+  no longer falls back silently to `spike_slab`.
+- `kappa()` diagnostic now uses the same floor as the precision diagonal for consistency.
+
 ## [1.6.0] - 2026-03-25
 
 ### Added
 
@@ -100,6 +100,7 @@ Posterior prob. of a causal effect: 99.90%
 | Algorithm | Gibbs (bsts/C++) | Gibbs (Rust) | TFP-based | VI default / HMC | MLE (statsmodels) |
 | Dependencies | R, bsts | numpy, pandas, matplotlib | TF, TFP (3 GB+) | TF, TFP (3 GB+) | statsmodels |
 | Spike-and-slab | Yes | Yes | Unknown | No | No |
+| Horseshoe prior | No | Yes (`prior_type='horseshoe'`) | No | No | No |
 | Seasonal component | Yes | Yes (`nseasons`, `season_duration`) | Unknown | Yes (TFP STS) | No |
 | Dynamic regression | Yes | Yes (`dynamic_regression=True`) | Unknown | No | No |
 | R numerical test | Reference | ±1% CI-enforced + TOST/ROPE | Not published | Visual comparison (~8% diff) | Not tested |
@@ -205,6 +206,7 @@ Evidence per implementation (all verified from source code, not documentation cl
 | DATE decomposition | Extended | Decomposes effects into spot/persistent/trend (arXiv:2602.00836) |
 | Retrospective mode | Extended | Treatment indicators as covariates; effects from beta posteriors (arXiv:2602.00836) |
 | Placebo test | Extended | Null distribution from pre-period splits |
+| Horseshoe prior | Extended | Continuous shrinkage alternative to spike-and-slab (Kohns & Bhattacharjee 2022) |
 | Conformal inference | Extended | Distribution-free prediction intervals |
 | DTW control selection | Extended | Automatic covariate selection via Dynamic Time Warping |
 
@@ -223,6 +225,7 @@ Features that go beyond R's CausalImpact. These have no R equivalent.
 | Placebo test | `ci.run_placebo_test()` | Validates effect against null distribution from pre-period splits | |
 | Conformal inference | `ci.run_conformal_analysis()` | Distribution-free prediction intervals | Vovk et al. (2005) |
 | DTW control selection | `select_controls()` | Automatic covariate selection via Dynamic Time Warping | Sakoe & Chiba (1978) |
+| Horseshoe prior | `ModelOptions(prior_type='horseshoe')` | Continuous shrinkage alternative to spike-and-slab for dense DGP | Kohns & Bhattacharjee (2022), arXiv:2011.00938 |
 
 ## API
 
@@ -251,6 +254,7 @@ Features that go beyond R's CausalImpact. These have no R equivalent.
 | `season_duration` | `None` | Optional duration of each seasonal block; defaults to `1` when `nseasons` is set |
 | `dynamic_regression` | `False` | Enable time-varying regression coefficients (random-walk beta) |
 | `state_model` | `"local_level"` | `"local_level"` or `"local_linear_trend"` |
+| `prior_type` | `"spike_slab"` | `"spike_slab"` or `"horseshoe"` (continuous shrinkage for dense DGP) |
 | `mode` | `"forward"` | `"forward"` (counterfactual prediction) or `"retrospective"` (treatment indicators as covariates) |
 
 #### Methods and Properties
@@ -262,7 +266,8 @@ Features that go beyond R's CausalImpact. These have no R equivalent.
 | `plot(metrics=None)` | `Figure` | Matplotlib figure with original/pointwise/cumulative panels |
 | `inferences` | `DataFrame` | Per-timestep actuals, predictions, prediction s.d., and effect intervals |
 | `summary_stats` | `dict` | Aggregate statistics (effect mean, CI, p-value, etc.) |
-| `posterior_inclusion_probs` | `ndarray \| None` | Posterior inclusion probability per covariate |
+| `posterior_inclusion_probs` | `ndarray \| None` | Posterior inclusion probability per covariate (spike-and-slab only) |
+| `posterior_shrinkage` | `ndarray \| None` | Mean shrinkage factor per covariate (horseshoe only) |
 | `decompose(alpha=None)` | `DateDecomposition` | DATE decomposition into spot/persistent/trend components |
 | `run_placebo_test(...)` | `PlaceboTestResults` | Placebo test for effect validation |
 | `run_conformal_analysis(...)` | `ConformalResults` | Distribution-free conformal prediction intervals |
 
@@ -35,7 +35,8 @@ ci = CausalImpact(data, pre_period, post_period, model_args=None, alpha=0.05)
 |---|---|---|
 | `inferences` | `DataFrame` | Per-timestep actuals, predictions, prediction s.d., and effect intervals |
 | `summary_stats` | `dict` | Aggregate statistics (effect mean, CI, p-value, etc.) |
-| `posterior_inclusion_probs` | `ndarray \| None` | Posterior inclusion probability per covariate (requires covariates) |
+| `posterior_inclusion_probs` | `ndarray \| None` | Posterior inclusion probability per covariate (spike-and-slab only; returns `None` for horseshoe) |
+| `posterior_shrinkage` | `ndarray \| None` | Mean shrinkage factor kappa_j per covariate (horseshoe only; returns `None` for spike-and-slab). Values near 0 indicate weak shrinkage (covariate effectively included); values near 1 indicate strong shrinkage (covariate effectively excluded). |
 
 ## `ModelOptions`
 
@@ -58,11 +59,24 @@ ci = CausalImpact(data, pre_period, post_period, model_args=opts)
 | `standardize_data` | `bool` | `True` | Standardize data before fitting |
 | `expected_model_size` | `int` | 2 | Expected number of active covariates for spike-and-slab prior |
 | `dynamic_regression` | `bool` | `False` | Enable time-varying regression coefficients |
+| `prior_type` | `str` | `"spike_slab"` | `"spike_slab"` (discrete variable selection) or `"horseshoe"` (continuous shrinkage). Horseshoe is recommended for dense DGP settings. |
 | `state_model` | `str` | `"local_level"` | `"local_level"` or `"local_linear_trend"` |
-| `mode` | `str` | `"forward"` | `"forward"` (counterfactual prediction) or `"retrospective"` (treatment indicators as covariates). Retrospective mode adds spot/persistent/trend columns to X and fits on the entire series. Effects are extracted from beta posteriors. |
 | `nseasons` | `int \| None` | `None` | Seasonal cycle count. `nseasons=1` is equivalent to no seasonal component. |
 | `season_duration` | `int \| None` | `None` | Duration of each seasonal block; defaults to 1 when `nseasons` is set. Requires `nseasons` to be set. |
 
+### Analysis Mode
+
+`mode` selects forward or retrospective analysis. Pass it via the `model_args` dict, not `ModelOptions`.
+
+| Value | Description |
+|---|---|
+| `"forward"` (default) | Counterfactual prediction: fit on pre-period, predict post-period |
+| `"retrospective"` | Treatment indicators as covariates: fit on entire series |
+
+```python
+ci = CausalImpact(data, pre, post, model_args={"mode": "retrospective"})
+```
+
 ## `CausalImpactResults`
 
 Returned by `ci._results`. A frozen dataclass containing all computed quantities.
@@ -86,6 +100,54 @@ Returned by `ci._results`. A frozen dataclass containing all computed quantities
 | `predictions_lower` | `ndarray` | Lower CI on counterfactual |
 | `predictions_upper` | `ndarray` | Upper CI on counterfactual |
 
+## Horseshoe Prior (alternative to spike-and-slab)
+
+CausalImpact supports the horseshoe prior (Carvalho, Polson & Scott 2010)
+applied to BSTS regression, following the formulation of
+Kohns & Bhattacharjee (2022) (arXiv:2011.00938).
+
+### When to use horseshoe
+
+| Scenario | Recommended prior |
+|---|---|
+| Few true covariates (sparse DGP) | `spike_slab` (default) |
+| Many true covariates (dense DGP) | `horseshoe` |
+
+### Usage
+
+```python
+from causal_impact import CausalImpact, ModelOptions
+
+ci = CausalImpact(
+    data, pre_period, post_period,
+    model_args=ModelOptions(prior_type='horseshoe'),
+)
+print(ci.posterior_shrinkage)   # E[kappa_j] per covariate: near 0 = included, near 1 = shrunk
+# ci.posterior_inclusion_probs is None for horseshoe (spike-slab only)
+```
+
+### Shrinkage diagnostics
+
+| Property | prior_type | Meaning |
+|---|---|---|
+| `posterior_inclusion_probs` | `spike_slab` | E[gamma_j] — discrete inclusion probability |
+| `posterior_inclusion_probs` | `horseshoe` | `None` (not applicable) |
+| `posterior_shrinkage` | `horseshoe` | E[kappa_j] — continuous shrinkage factor kappa_j = 1/(1+lambda_j^2 * tau^2). Values close to 0 indicate the covariate is weakly shrunk (effectively included). |
+| `posterior_shrinkage` | `spike_slab` | `None` (not applicable) |
+
+### Incompatible combinations
+
+- `prior_type='horseshoe'` + `dynamic_regression=True` raises `ValueError`
+- `prior_type='horseshoe'` + `mode='retrospective'` raises `ValueError`
+
+### References
+
+- Kohns, D. & Bhattacharjee, A. (2022). Horseshoe Prior for Sparse Bayesian Structural Time Series. arXiv:2011.00938.
+- Makalic, E. & Schmidt, D.F. (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1), 179-182.
+- Carvalho, C.M., Polson, N.G. & Scott, J.G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465-480.
+
+---
+
 ## Beyond R Extensions
 
 ### Retrospective Mode
 
@@ -10,6 +10,7 @@ Five additional capabilities extend the analysis beyond what R offers.
 | Placebo test | `ci.run_placebo_test()` | Validate effects against null distribution |
 | Conformal inference | `ci.run_conformal_analysis()` | Distribution-free prediction intervals |
 | DTW control selection | `select_controls()` | Automatic covariate selection |
+| Horseshoe prior | `ModelOptions(prior_type='horseshoe')` | Continuous shrinkage for dense DGP |
 
 ---
 
@@ -245,6 +246,166 @@ for spoken word recognition."
 
 ---
 
+## Horseshoe Prior
+
+### What it does
+
+The horseshoe prior (Carvalho, Polson & Scott 2010) is a continuous shrinkage
+alternative to spike-and-slab variable selection. While spike-and-slab performs
+discrete inclusion/exclusion of covariates (gamma_j in {0,1}), the horseshoe
+applies adaptive shrinkage that can handle dense DGP settings where many
+covariates have true effects.
+
+Reference: Kohns & Bhattacharjee (2022), "Horseshoe Prior for Sparse Bayesian
+Structural Time Series" (arXiv:2011.00938).
+
+### Hierarchical model
+
+The horseshoe hierarchy places Half-Cauchy priors on lambda_j and tau. Each
+Half-Cauchy is rewritten as a pair of InvGamma variables by introducing the
+auxiliary variables nu_j and xi (Makalic & Schmidt 2015):
+
+```
+beta_j | lambda_j, tau, sigma2_obs  ~ N(0, lambda_j^2 * tau^2 * sigma2_obs)
+lambda_j^2 | nu_j                   ~ InvGamma(1/2, 1/nu_j)
+nu_j                                ~ InvGamma(1/2, 1)
+tau^2 | xi                          ~ InvGamma(1/2, 1/xi)
+xi                                  ~ InvGamma(1/2, 1)
+```
+
+The conditional posteriors used in the Gibbs sampler:
+
+```
+lambda_j^2 | .  ~ InvGamma(1,       1/nu_j + beta_j^2 / (2 * tau^2 * sigma2_obs))
+nu_j       | .  ~ InvGamma(1,       1 + 1/lambda_j^2)
+tau^2      | .  ~ InvGamma((k+1)/2, 1/xi + sum_j(beta_j^2 / (2 * lambda_j^2 * sigma2_obs)))
+xi         | .  ~ InvGamma(1,       1 + 1/tau^2)
+```
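The four conditionals above can be sketched in NumPy as a single sweep. This is an illustrative sketch, not the Rust sampler: the helper names `inv_gamma` and `update_scales` are hypothetical, and it assumes the shape/scale parameterization InvGamma(a, b), drawn as b / Gamma(a, 1).

```python
import numpy as np


def inv_gamma(rng, shape, scale):
    """Draw from InvGamma(shape, scale) as scale / Gamma(shape, 1)."""
    # np.shape(scale) is () for scalars; `or None` then requests a single draw.
    return scale / rng.gamma(shape, 1.0, size=np.shape(scale) or None)


def update_scales(rng, beta, lam2, nu, tau2, xi, sigma2_obs):
    """One sweep over lambda_j^2, nu_j, tau^2 and xi (hypothetical helper)."""
    k = beta.size
    # Local scales: one draw per covariate.
    lam2 = inv_gamma(rng, 1.0, 1.0 / nu + beta**2 / (2.0 * tau2 * sigma2_obs))
    nu = inv_gamma(rng, 1.0, 1.0 + 1.0 / lam2)
    # Global scale: a single draw that pools all k coefficients.
    tau2 = inv_gamma(
        rng,
        (k + 1) / 2.0,
        1.0 / xi + np.sum(beta**2 / (2.0 * lam2 * sigma2_obs)),
    )
    xi = inv_gamma(rng, 1.0, 1.0 + 1.0 / tau2)
    return lam2, nu, tau2, xi
```

Each update conditions on the value drawn just before it, so the order of the four lines matters.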
+
+### Beta joint update
+
+Unlike spike-and-slab, which updates beta coordinate-wise, the horseshoe sampler draws all coefficients jointly:
+
+```
+A = X'X + diag(1 / (lambda_j^2 * tau^2))    (precision matrix)
+b = X'(y - state - seasonal)                 (right-hand side)
+beta ~ N(A^{-1} b, sigma2_obs * A^{-1})     (sampled via Cholesky)
+```
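A minimal NumPy sketch of the joint draw, assuming a dense design matrix and no numerical clamping. The function name `sample_beta` and the argument `resid` (standing for y - state - seasonal) are illustrative and not taken from the Rust code.

```python
import numpy as np


def sample_beta(rng, X, resid, lam2, tau2, sigma2_obs):
    """Draw beta ~ N(A^{-1} b, sigma2_obs * A^{-1}) using one Cholesky factor."""
    A = X.T @ X + np.diag(1.0 / (lam2 * tau2))   # precision matrix
    b = X.T @ resid                              # right-hand side
    L = np.linalg.cholesky(A)                    # A = L L'
    # Posterior mean: solve L L' m = b with two triangular systems.
    mean = np.linalg.solve(L.T, np.linalg.solve(L, b))
    # If z ~ N(0, I), then L'^{-1} z has covariance (L L')^{-1} = A^{-1}.
    z = rng.standard_normal(b.size)
    return mean + np.sqrt(sigma2_obs) * np.linalg.solve(L.T, z)
```

Reusing the same factor `L` for both the mean and the noise term avoids forming A^{-1} explicitly.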
+
+### Shrinkage factor
+
+The shrinkage factor kappa_j measures how much each covariate is shrunk:
+
+```
+kappa_j = 1 / (1 + lambda_j^2 * tau^2)
+```
+
+- kappa_j close to 1: strong shrinkage (covariate effectively excluded)
+- kappa_j close to 0: weak shrinkage (covariate effectively included)
+
+The `posterior_shrinkage` property returns E[kappa_j] averaged over post-warmup
+MCMC iterations.
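As a sketch of how E[kappa_j] could be computed from stored draws: the function below is a hypothetical stand-in, not the library's `posterior_shrinkage` property. It assumes `lam2_draws` has shape (n_iter, k) and `tau2_draws` has shape (n_iter,), and it omits the numerical floor applied in the actual diagnostic.

```python
import numpy as np


def mean_shrinkage(lam2_draws, tau2_draws, warmup):
    """Average kappa_j = 1 / (1 + lambda_j^2 * tau^2) over post-warmup draws."""
    lam2 = np.asarray(lam2_draws, dtype=float)[warmup:]          # (n_keep, k)
    tau2 = np.asarray(tau2_draws, dtype=float)[warmup:, None]    # (n_keep, 1)
    kappa = 1.0 / (1.0 + lam2 * tau2)
    return kappa.mean(axis=0)                                    # one value per covariate
```

For example, with two retained draws where lambda_1^2 * tau^2 is 1 and then 3, kappa_1 is 0.5 and then 0.25, giving a mean of 0.375.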
+
+### When to use
+
+| Scenario | Recommended prior |
+|---|---|
+| Few true covariates among many candidates (sparse DGP) | `spike_slab` (default) |
+| Many covariates with true effects (dense DGP) | `horseshoe` |
+| Time-varying coefficients | `spike_slab` (horseshoe + dynamic_regression not supported) |
+
+### Usage
+
+```python
+from causal_impact import CausalImpact, ModelOptions
+
+ci = CausalImpact(
+    data, pre_period, post_period,
+    model_args=ModelOptions(prior_type='horseshoe', niter=2000, seed=42),
+)
+
+# Shrinkage diagnostics
+print(ci.posterior_shrinkage)   # E[kappa_j] per covariate
+# posterior_inclusion_probs is None for horseshoe
+```
+
+### Implementation decisions not specified in the papers
+
+The following design choices are not prescribed by the reference papers.
+Each choice is documented here with its rationale so that reviewers can
+evaluate them independently.
+
+#### tau0 initialization
+
+The global shrinkage parameter tau^2 requires an initial value for the
+Gibbs sampler.  None of the three reference papers specifies a concrete
+formula.  This implementation uses a data-adaptive heuristic:
+
+```
+y_norm = ||y_pre||_2 / sqrt(T_pre)
+tau0   = y_sd / (sqrt(k) * y_norm)
+tau^2_init = tau0^2
+```
+
+Rationale: after standardization y_sd is approximately 1.  Dividing by
+sqrt(k) prevents the global scale from growing with the number of
+covariates.  Dividing by y_norm anchors the prior scale to the signal
+magnitude.  Because tau^2 is resampled at every Gibbs iteration, the
+chain forgets the initial value within the warmup period.  If y_norm
+is near zero (constant y), tau0 falls back to 1.0 so that the prior
+remains diffuse.
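The heuristic can be written out as a short function. This is a sketch under assumptions: the name `tau0_init` and the threshold `eps` used to detect a near-zero y_norm are illustrative, since the text above does not state the exact cutoff.

```python
import numpy as np


def tau0_init(y_pre, k, y_sd=1.0, eps=1e-12):
    """Initial global scale tau0 = y_sd / (sqrt(k) * y_norm), with a fallback of 1.0."""
    y_pre = np.asarray(y_pre, dtype=float)
    # Root-mean-square magnitude of the pre-period response.
    y_norm = np.linalg.norm(y_pre) / np.sqrt(y_pre.size)
    if not np.isfinite(y_norm) or y_norm < eps:   # eps is an assumed cutoff
        return 1.0
    return y_sd / (np.sqrt(k) * y_norm)


tau2_init = tau0_init([2.0, 2.0, 2.0, 2.0], k=4) ** 2   # tau0 = 1 / (2 * 2) = 0.25
```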
+
+#### Numerical clamping on derived precision (not on raw draws)
+
+Raw InvGamma draws for lambda_j^2 and tau^2 receive no floor.  Clamping
+raw draws would distort the posterior distribution.  Instead, the derived
+precision diagonal entry is clamped:
+
+```
+lambda_tau_prod = max(lambda_j^2 * tau^2, 1e-30)   -- prevents division by zero
+prior_prec      = min(1 / lambda_tau_prod, 1e12)    -- prevents inf diagonal
+```
+
+This approach keeps the posterior intact while protecting the Cholesky
+decomposition from numerical failure.  The kappa() diagnostic uses the
+same 1e-30 floor so that shrinkage values stay consistent with the
+precision matrix actually used in the beta update.
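A NumPy sketch of the two clamps and of a kappa computation that shares the same floor. The function and constant names are illustrative; only the values 1e-30 and 1e12 come from the text above.

```python
import numpy as np

LAMBDA_TAU_FLOOR = 1e-30   # floor on lambda_j^2 * tau^2
PRIOR_PREC_CAP = 1e12      # cap on the precision diagonal


def prior_precision(lam2, tau2):
    """Diagonal prior precision 1 / (lambda_j^2 * tau^2), clamped on both sides."""
    prod = np.maximum(np.asarray(lam2, dtype=float) * tau2, LAMBDA_TAU_FLOOR)
    return np.minimum(1.0 / prod, PRIOR_PREC_CAP)


def kappa(lam2, tau2):
    """Shrinkage factor using the same floor as the precision diagonal."""
    prod = np.maximum(np.asarray(lam2, dtype=float) * tau2, LAMBDA_TAU_FLOOR)
    return 1.0 / (1.0 + prod)
```

A raw draw of exactly zero for lambda_j^2 therefore yields a precision of 1e12 and a kappa of 1.0, rather than a division by zero.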
+
+The fallback in sample_inv_gamma (returning 1e-30 for non-finite
+parameters) serves as a last-resort guard.  It triggers only under
+extreme-scale inputs (e.g. standardize_data=False with y of order 1e200)
+where the scale parameter overflows to infinity.
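The guard can be illustrated with a pure-Python stand-in for the Rust routine. This sketch reuses the name `sample_inv_gamma` for readability only; treating non-positive parameters the same way as non-finite ones is an assumption made here so the example never raises.

```python
import math
import random

FALLBACK = 1e-30   # small positive value returned instead of panicking


def sample_inv_gamma(shape, scale, rng=random):
    """Draw from InvGamma(shape, scale); return FALLBACK for unusable parameters."""
    finite = math.isfinite(shape) and math.isfinite(scale)
    if not finite or shape <= 0 or scale <= 0:   # non-positive check is an assumption
        return FALLBACK
    # If G ~ Gamma(shape, 1), then scale / G ~ InvGamma(shape, scale).
    return scale / rng.gammavariate(shape, 1.0)
```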
+
+#### Gibbs sampling order
+
+Horseshoe and spike-and-slab use different orderings within the Gibbs loop:
+
+```
+Horseshoe:   state -> beta (joint)   -> lambda2/nu -> tau2/xi -> sigma2_obs
+Spike-slab:  state -> sigma2_obs     -> beta (coordinate-wise)
+```
+
+Horseshoe samples beta jointly via a precision matrix conditioned on the
+current sigma2_obs.  After the joint beta update, sigma2_obs is resampled
+conditioned on the updated residual.  This follows Algorithm 1 of Makalic
+& Schmidt (2015) where the regression step precedes the variance update.
+
+Spike-and-slab samples sigma2_obs first because its coordinate-wise
+variable selection (gamma_j) is sensitive to a cold start: sampling beta
+with an uninformative sigma2_obs on the first iteration can cause the
+sampler to exclude all covariates.  Sampling sigma2_obs first gives beta
+a reasonable scale to condition on.
+
+### References
+
+- Carvalho, C.M., Polson, N.G. & Scott, J.G. (2010). The horseshoe estimator
+  for sparse signals. Biometrika, 97(2), 465-480.
+- Kohns, D. & Bhattacharjee, A. (2022). Horseshoe Prior for Sparse Bayesian
+  Structural Time Series. arXiv:2011.00938.
+- Makalic, E. & Schmidt, D.F. (2015). A simple sampler for the horseshoe
+  estimator. IEEE Signal Processing Letters, 23(1), 179-182.
+
+---
+
 ## Citation
 
 ```bibtex