# bsts-causalimpact

Bayesian structural time series for causal inference in Python.
A faithful port of Google's [CausalImpact](https://google.github.io/CausalImpact/) R package, with the Gibbs sampler implemented in Rust via PyO3.

No TensorFlow required. Roughly 10-30x faster than the R implementation.
## Installation

```bash
pip install bsts-causalimpact
```

Requires Python 3.10+. A Rust toolchain is needed only when building from source.
## Example: Measuring the Effect of an Intervention

This walkthrough mirrors the [R CausalImpact tutorial](https://google.github.io/CausalImpact/CausalImpact.html).

### 1. Create Example Data

Construct a synthetic dataset: a response variable `y` driven by a covariate `x`,
with a known intervention effect of +5 units injected after time point 100.

```python
import numpy as np
import pandas as pd
from causal_impact import CausalImpact

rng = np.random.default_rng(42)

n_pre, n_post = 100, 30
n = n_pre + n_post

x = rng.normal(0, 1, size=n).cumsum() + 100
y = 1.2 * x + rng.normal(0, 1, size=n)
y[n_pre:] += 5.0  # inject intervention effect

dates = pd.date_range("2020-01-01", periods=n, freq="D")
data = pd.DataFrame({"y": y, "x": x}, index=dates)

pre_period = ["2020-01-01", "2020-04-09"]
post_period = ["2020-04-10", "2020-05-09"]
```

The first column (`y`) is the response variable. Remaining columns are covariates
that the model uses to build a counterfactual prediction.
### 2. Run the Analysis

```python
ci = CausalImpact(data, pre_period, post_period, model_args={"seed": 42})
```

`CausalImpact` fits a Bayesian structural time series model on the pre-intervention
data, then generates counterfactual predictions for the post-intervention period.
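
Under the hood, the default model combines a local-level trend with a static regression on the covariates. A sketch of the standard CausalImpact state-space formulation (this package's exact parameterization may differ):

```
y_t     = mu_t + beta' x_t + eps_t,   eps_t ~ N(0, sigma_eps^2)   # observation
mu_{t+1} = mu_t + eta_t,              eta_t ~ N(0, sigma_eta^2)   # local level (random walk)
```

The counterfactual for the post-period is the posterior predictive distribution of `y_t`, with the state propagated forward as if no intervention had occurred.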

### 3. Visualize the Results

```python
fig = ci.plot()
fig.savefig("causal_impact_plot.png", dpi=150, bbox_inches="tight")
```

The plot has three panels:

- Top panel: observed data (solid) vs. counterfactual prediction (dashed) with 95% credible interval
- Middle panel: pointwise causal effect (observed minus predicted)
- Bottom panel: cumulative causal effect over the post-intervention period
### 4. Summary Statistics

```python
print(ci.summary())
```

```
Posterior inference {CausalImpact}

                        Average           Cumulative
Actual                  117.11            3513.26
Prediction (s.d.)       112.66 (0.49)     3379.75 (14.79)
95% CI                  [111.63, 113.61]  [3348.81, 3408.33]

Absolute effect (s.d.)  4.45 (0.49)       133.51 (14.79)
95% CI                  [3.50, 5.48]      [104.93, 164.45]

Relative effect (s.d.)  3.95% (0.46%)     3.95% (0.46%)
95% CI                  [3.08%, 4.91%]    [3.08%, 4.91%]

Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.90%
```

The summary table shows the average and cumulative causal effect, along with
credible intervals and a Bayesian p-value.
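The Bayesian p-value is a posterior tail-area probability: the share of posterior draws of the effect that land on the opposite side of zero from the posterior mean. A minimal illustration with simulated draws (the normal stand-in for the sampler's output and all variable names here are assumptions, not this package's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for posterior draws of the cumulative effect; mean and s.d.
# roughly match the summary above
cum_effect_draws = rng.normal(133.5, 14.8, size=1000)

# Tail-area probability: fraction of draws on the "wrong" side of zero,
# with a +1 correction so p is never reported as exactly zero
sign = np.sign(cum_effect_draws.mean())
p = (np.sum(sign * cum_effect_draws <= 0) + 1) / (len(cum_effect_draws) + 1)
print(f"posterior tail-area p ~ {p:.3f}")
```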
### 5. Narrative Report

```python
print(ci.report())
```

```
Analysis report {CausalImpact}

During the post-intervention period, the response variable showed an increase
compared to what would have been expected without the intervention.

The average causal effect was 4.45 (95% CI [3.50, 5.48]).

The cumulative effect over the entire post-period was 133.51.

The relative effect was 4.0%.

This effect is statistically significant (p = 0.0010). The probability of
obtaining an effect of this magnitude by chance is very small, so the effect
is unlikely to be spurious.
```

### 6. Access Raw Inferences

```python
# Per-timestep effects, predictions, and credible intervals
df = ci.inferences
print(df.head())

# Aggregate statistics as a dict
stats = ci.summary_stats
print(stats["point_effect_mean"])
print(stats["p_value"])
```

## Working with Covariates

The model treats the first column as the response and all remaining columns
as covariates. Covariates must not be affected by the intervention.

```python
data = pd.DataFrame({
    "y": response,
    "x1": covariate_1,
    "x2": covariate_2,
}, index=dates)

ci = CausalImpact(data, pre_period, post_period)
```

When covariates are present, the model uses spike-and-slab variable selection
to determine which covariates are informative. Check posterior inclusion
probabilities:

```python
print(ci.posterior_inclusion_probs)
# array([0.98, 0.12]) — x1 is strongly included, x2 is not
```

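Conceptually, a posterior inclusion probability is just the fraction of MCMC draws in which a covariate's spike-and-slab indicator is switched on. A toy illustration with simulated indicator draws (not this package's internal code):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical indicator draws (n_draws x n_covariates); True means the
# coefficient was included in the model at that iteration
indicators = rng.random((500, 2)) < np.array([0.98, 0.12])

# Posterior inclusion probability = mean of the indicator over draws
inclusion_probs = indicators.mean(axis=0)
print(inclusion_probs)
```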
## Model Parameters

| Parameter | Default | Description |
|---|---|---|
| `niter` | 1000 | Total MCMC iterations |
| `nwarmup` | 500 | Burn-in iterations to discard |
| `nchains` | 1 | Number of MCMC chains |
| `seed` | 0 | Random seed for reproducibility |
| `prior_level_sd` | 0.01 | Prior standard deviation for the local level |
| `standardize_data` | `True` | Standardize data before fitting |
| `expected_model_size` | 2 | Expected number of active covariates (spike-and-slab prior) |
| `nseasons` | `None` | Number of seasons in one seasonal cycle (e.g. 7 for a weekly pattern in daily data) |
| `season_duration` | `None` | Duration of each seasonal block (defaults to 1 when `nseasons` is set) |

Pass parameters via `model_args`:

```python
ci = CausalImpact(
    data, pre_period, post_period,
    model_args={"niter": 5000, "seed": 123, "prior_level_sd": 0.05}
)
```
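
For data with a seasonal pattern, setting `nseasons` adds a seasonal state component. For example, a weekly cycle in daily data (a usage sketch based on the parameters in the table above, reusing the walkthrough's `data`):

```python
# Weekly seasonality for daily data: a cycle of 7 seasons, 1 day each
ci = CausalImpact(
    data, pre_period, post_period,
    model_args={"nseasons": 7, "season_duration": 1, "seed": 123},
)
```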

## When This Method Is Valid

This method produces reliable estimates only when all of the following hold:

- Control series are not contaminated by the intervention
- The relationship between treated and control series is stable across the pre- and post-intervention periods
- The pre-intervention period is sufficiently long (rule of thumb: at least 3x the post-intervention period)

If any of these assumptions are violated, consider a difference-in-differences or
synthetic control approach instead.
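
One practical way to probe these assumptions is a placebo analysis: rerun the model with a fake intervention date placed inside the pre-period, where no effect should be detected. A sketch reusing the walkthrough's synthetic `data` (the placebo dates here are arbitrary):

```python
# Placebo check: pretend the intervention happened on 2020-03-01, using
# only genuinely pre-intervention data. A 95% interval that excludes zero
# here suggests the model finds effects where none exist.
placebo = CausalImpact(
    data.loc[:"2020-04-09"],
    ["2020-01-01", "2020-02-29"],
    ["2020-03-01", "2020-04-09"],
    model_args={"seed": 42},
)
print(placebo.summary())
```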