---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# causatr

<!-- badges: start -->
[![R-CMD-check](https://github.com/etverse/causatr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/etverse/causatr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/etverse/causatr/graph/badge.svg)](https://app.codecov.io/gh/etverse/causatr)
<!-- badges: end -->

**causatr** provides a unified interface for causal effect estimation via five
complementary methods: g-computation (parametric g-formula + ICE), inverse
probability weighting (IPW with a self-contained density-ratio engine),
augmented IPW (AIPW — doubly robust), structural nested mean models (SNM —
g-estimation for time-varying effect modification), and propensity score
matching (via [MatchIt](https://kosukeimai.github.io/MatchIt/)).
When methods that rest on different modelling assumptions agree, you can be more
confident in your findings. This is called **methodological triangulation**.

The package implements the methods described in Hernán & Robins (2025)
*Causal Inference: What If* with a simple two-step API:

1. **Fit** the causal model with `causat()`
2. **Contrast** interventions with `contrast()`

## Installation

Install the development version from GitHub:

```r
# install.packages("pak")
pak::pak("etverse/causatr")
```

## Quick example

Estimate the average causal effect of quitting smoking on weight gain
using the NHEFS dataset from Hernán & Robins (2025):

```{r example}
library(causatr)
data("nhefs")

# Step 1: Fit the outcome model via g-computation
fit <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
    factor(exercise) + factor(active) + wt71 + I(wt71^2) +
    qsmk:smokeintensity,
  censoring = "censored"
)

# Step 2: Contrast interventions
result <- contrast(
  fit,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue"
)
result
```
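
To make the two steps above concrete, here is a minimal base-R sketch of what g-computation does conceptually: fit an outcome model, predict every unit's outcome under each intervention, and average. It runs on simulated data with a single confounder, and all variable names are illustrative assumptions. It is not a description of causatr's internals.

```r
# Illustrative simulation; none of these names come from causatr.
set.seed(1)
n <- 5000
l <- rnorm(n)                          # a single confounder
a <- rbinom(n, 1, plogis(0.5 * l))     # treatment depends on the confounder
y <- 2 * a + 1.5 * l + rnorm(n)        # true average effect is 2 by construction
dat <- data.frame(y = y, a = a, l = l)

# 1. Fit an outcome model conditional on treatment and confounder.
outcome_fit <- glm(y ~ a + l, data = dat)

# 2. Predict each unit's outcome with treatment set to 1, then to 0.
y_quit     <- predict(outcome_fit, newdata = transform(dat, a = 1))
y_continue <- predict(outcome_fit, newdata = transform(dat, a = 0))

# 3. Average the predictions and take the difference.
gcomp_ate <- mean(y_quit) - mean(y_continue)

# Unadjusted comparison, biased here because l affects both a and y.
naive_diff <- mean(y[a == 1]) - mean(y[a == 0])

round(c(gcomp = gcomp_ate, naive = naive_diff), 2)
```

In this simulation the standardised estimate should land near the true value of 2, while the unadjusted difference is pulled upward by confounding.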

## Methodological triangulation

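Compare g-computation, IPW, AIPW, and matching on the same data. The matching
fit sets `estimand = "ATT"`, so keep its target population in mind when reading
it alongside the other estimates: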

```{r triangulation}
conf <- ~ sex + age + race + smokeintensity + smokeyrs +
  factor(exercise) + factor(active) + wt71

# G-computation (outcome model)
fit_gc <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
  confounders = conf, censoring = "censored")

# IPW (treatment model)
fit_ipw <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
  confounders = conf, estimator = "ipw")

# AIPW (doubly robust)
fit_aipw <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
  confounders = conf, estimator = "aipw", censoring = "censored")

# Matching (propensity score)
fit_m <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
  confounders = conf, estimator = "matching", estimand = "ATT")

# All four estimates
intv <- list(quit = static(1), cont = static(0))
rbind(
  data.frame(estimator = "gcomp", contrast(fit_gc,
    intv, reference = "cont")$contrasts),
  data.frame(estimator = "ipw", contrast(fit_ipw,
    intv, reference = "cont")$contrasts),
  data.frame(estimator = "aipw", contrast(fit_aipw,
    intv, reference = "cont")$contrasts),
  data.frame(estimator = "matching", contrast(fit_m,
    intv, reference = "cont")$contrasts)
)
```
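
The idea behind the comparison above can be sketched in base R. The example below computes g-computation, IPW, and AIPW estimates by hand on simulated data using textbook formulas for a binary treatment. It is an illustration under stated assumptions (one confounder, correctly specified models, made-up variable names) and does not reproduce causatr's density-ratio engine or MatchIt.

```r
# Illustrative simulation; the true average effect is 2 by construction.
set.seed(2)
n <- 5000
l <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * l))
y <- 2 * a + 1.5 * l + rnorm(n)
dat <- data.frame(y = y, a = a, l = l)

# Treatment (propensity) model and outcome model.
ps <- fitted(glm(a ~ l, family = binomial(), data = dat))
out_fit <- glm(y ~ a + l, data = dat)
m1 <- predict(out_fit, newdata = transform(dat, a = 1))
m0 <- predict(out_fit, newdata = transform(dat, a = 0))

# G-computation: average the outcome-model predictions.
est_gcomp <- mean(m1 - m0)

# IPW (Hajek form): weighted means with inverse-probability weights.
w1 <- a / ps
w0 <- (1 - a) / (1 - ps)
est_ipw <- sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)

# AIPW: outcome-model prediction plus a weighted residual correction.
est_aipw <- mean(m1 + w1 * (y - m1)) - mean(m0 + w0 * (y - m0))

round(c(gcomp = est_gcomp, ipw = est_ipw, aipw = est_aipw), 2)
```

Because both models are correctly specified here, the three estimates should be close to each other. With real data, disagreement between them is a prompt to revisit the model specifications.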

## Intervention types

Beyond static interventions, causatr supports modified treatment policies (MTPs)
and stochastic interventions:

```{r interventions}
fit_cont <- causat(nhefs, outcome = "wt82_71",
  treatment = "smokeintensity",
  confounders = ~ sex + age + race + wt71,
  censoring = "censored")

contrast(fit_cont,
  interventions = list(
    reduce10 = shift(-10),
    halved = scale_by(0.5),
    cap20 = threshold(0, 20),
    observed = NULL
  ),
  reference = "observed"
)
```
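
As a rough illustration of what these policies do to a continuous treatment, the sketch below applies each one to a few made-up values. The helper functions are hypothetical and written only for this example. The semantics shown (add a constant, multiply by a factor, clamp to a range) are an assumption based on the intervention names, so check the package documentation for the exact definitions.

```r
# Hypothetical helpers (not causatr functions): each maps observed
# treatment values to the values they would take under the policy.
apply_shift     <- function(a, delta) a + delta
apply_scale     <- function(a, factor) a * factor
apply_threshold <- function(a, lower, upper) pmin(pmax(a, lower), upper)

observed <- c(5, 20, 40)   # cigarettes per day, made-up values

reduce10 <- apply_shift(observed, -10)        # subtract 10 from everyone
halved   <- apply_scale(observed, 0.5)        # halve everyone's value
cap20    <- apply_threshold(observed, 0, 20)  # clamp to the range [0, 20]

data.frame(observed, reduce10, halved, cap20)
```

Note that a plain shift can push some units outside the plausible range (a negative intensity for the lightest smoker here), which is one reason to check positivity before interpreting such contrasts.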

## Diagnostics

Check covariate balance and positivity after fitting:

```{r diagnostics, eval = FALSE}
diag <- diagnose(fit_ipw)
diag          # positivity + balance summary
plot(diag)    # Love plot (requires cobalt)
```
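
For intuition about what a balance check measures, here is a small base-R sketch that computes a standardized mean difference (SMD) for one confounder before and after inverse-probability weighting. The `smd()` helper and the simulated data are assumptions made for this example. It is not how `diagnose()` or cobalt compute their summaries.

```r
# Illustrative simulation with one confounder that drives treatment.
set.seed(3)
n <- 5000
l <- rnorm(n)
a <- rbinom(n, 1, plogis(0.8 * l))

# Estimated propensity scores and inverse-probability weights.
ps <- fitted(glm(a ~ l, family = binomial()))
w <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))

# Hypothetical helper: difference in (weighted) group means divided by
# the pooled unweighted standard deviation.
smd <- function(x, a, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[a == 1], w[a == 1])
  m0 <- weighted.mean(x[a == 0], w[a == 0])
  pooled_sd <- sqrt((var(x[a == 1]) + var(x[a == 0])) / 2)
  (m1 - m0) / pooled_sd
}

smd_raw      <- smd(l, a)      # imbalance before weighting
smd_weighted <- smd(l, a, w)   # imbalance after weighting

# Positivity: propensity scores should stay away from 0 and 1.
range(ps)
```

A weighted SMD close to zero suggests the weights have balanced the confounder across treatment groups. Propensity scores near 0 or 1 would signal a positivity problem.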

## Features

- **Five estimation methods**: g-computation (parametric g-formula),
  IPW (self-contained density-ratio engine — no runtime dependency on
  WeightIt), AIPW (doubly robust — consistent if either outcome or
  treatment model is correct), SNM (structural nested mean models —
  g-estimation for time-varying effect modification via blip
  parameters), and matching (via
  [MatchIt](https://kosukeimai.github.io/MatchIt/)). Matching is
  binary-only; continuous, categorical, count, and multivariate
  treatments use g-comp, IPW, AIPW, or SNM.
- **Longitudinal support**: ICE g-computation (Zivich et al. 2024),
  longitudinal IPW, longitudinal AIPW, and longitudinal SNM
  (backward-sequential g-estimation) for time-varying treatments.
  Sandwich variance via stacked estimating equations, plus parallel
  bootstrap via `boot::boot()` (with optional
  [future](https://future.futureverse.org/) backend).
- **Flexible interventions**: `static()`, `shift()`, `scale_by()`,
  `threshold()`, `dynamic()`, `ipsi()` (incremental propensity score),
  and `stochastic()` (user-defined randomised rules with Monte Carlo
  integration). Which interventions are available depends on the
  estimator — see the
  [interventions vignette](https://etverse.github.io/causatr/articles/interventions.html).
- **Treatment types**: binary, continuous, categorical (k > 2), count
  (Poisson / negative binomial propensity via `propensity_family =`),
  and multivariate (joint) treatments. Multivariate IPW uses sequential
  MTP factorisation (Díaz et al. 2023) with optional stabilised weights
  (`stabilize = "marginal"`).
- **Any outcome family**: gaussian, binomial (logit / probit /
  cloglog), Poisson, quasibinomial (fractional), Gamma, negative
  binomial (`MASS::glm.nb`), beta regression (`betareg::betareg`), plus
  any family you pass through `model_fn`.
- **Pluggable models**: `stats::glm`, `mgcv::gam`, splines via `ns()`
  / `bs()`, or any fit function with signature `(formula, data, family,
  weights, ...)`. A two-tier numeric-variance fallback handles model
  classes without a `sandwich::estfun` method.
- **Robust inference**: analytic sandwich SE (default, via a unified
  influence-function engine) or nonparametric bootstrap with percentile
  CIs. Cluster-robust sandwich via `cluster =`; survey designs
  (`survey::svydesign`) auto-extract weights and PSU.
- **Built-in IPCW**: for outcome censoring that is missing at random
  (MAR), `ipcw = TRUE` fits an internal censoring model and computes
  stabilised IPCW weights. This adds doubly robust protection under
  g-comp and is essential for IPW under MAR censoring. Custom censoring
  models via `censoring_model_fn =`.
- **Contrast types**: risk difference, risk ratio, and odds ratio. CIs
  for the risk ratio and odds ratio are computed on the log scale.
- **Estimands**: ATE, ATT, ATC, or custom subgroups via `subset =` /
  `by =`.
- **Effect modification**: `by =` in `contrast()` for subgroup-specific
  effects. Under IPW and matching the modifier must be a baseline
  variable.
- **Transportability / generalizability**: transport causal estimates
  from a study sample to a target population with `target =`. causatr
  fits a sampling model P(S=1|L) and reweights (gcomp, IPW) or
  augments (AIPW) the estimator to recover the target-population
  estimand. Diagnostics include sampling-score overlap and weight
  summaries.
- **Built-in diagnostics**: positivity checks, covariate balance via
  [cobalt](https://ngreifer.github.io/cobalt/), weight summaries,
  censoring model diagnostics, sampling model diagnostics, Love plots.
- **Tidy integration**: `tidy()` / `glance()` / `confint()` / `coef()`
  / `vcov()` / `plot()` (forest plot via
  [forrest](https://github.com/etverse/forrest)) / broom-compatible
  output.
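
The log-scale CIs mentioned in the list above can be illustrated with a short delta-method sketch. The marginal risks and their variance-covariance matrix below are made-up numbers, and the calculation is a generic textbook construction, not necessarily the exact one causatr uses.

```r
# Made-up marginal risks under two interventions and their
# variance-covariance matrix (illustrative values only).
risk <- c(quit = 0.30, continue = 0.20)
V <- matrix(c(4e-4, 1e-4,
              1e-4, 3e-4),
            nrow = 2, dimnames = list(names(risk), names(risk)))

rr <- risk[["quit"]] / risk[["continue"]]

# Delta method: gradient of log(risk_quit) - log(risk_continue).
grad <- c(1 / risk[["quit"]], -1 / risk[["continue"]])
se_log_rr <- sqrt(drop(t(grad) %*% V %*% grad))

# Build the interval on the log scale, then exponentiate.
z <- qnorm(0.975)
ci_rr <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

c(estimate = rr, lower = ci_rr[1], upper = ci_rr[2])
```

Working on the log scale keeps the interval strictly positive and makes it symmetric around the estimate in multiplicative terms, which suits ratio measures better than a symmetric interval on the raw scale.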

## References

Hernán MA, Robins JM (2025). *Causal Inference: What If*. Chapman & Hall/CRC.

## Acknowledgements

This package was built with the contribution of [Claude](https://claude.ai),
Anthropic's AI assistant.