Data included in package

This package contains several example datasets that can be used to explore its functionality. The datasets and simulation scripts used to generate them are contained in their own dedicated branch, and described in detail below.

Once the data have been simulated, they are still incomplete: we must then estimate the individual treatment effects (ITEs) that are used to group observations. The description below also explains how the ITEs are estimated and how subgroups are determined.

Simple simulated data

The table below defines the possible underlying correlation structures for the covariates in the simple simulated datasets.

Y ~ N(mu, 1) is the continuous outcome, with the mean given in the table below.
trt ~ Bern(p) is the binary treatment, with the mean given in the table below.
X5 ~ N(0,1) is a covariate associated with the treatment only (i.e., an instrument).
X6 ~ N(0,1) is a covariate associated with the outcome only (i.e., a prognostic variable).
X1, X2, X3, X4 ~i.i.d. N(0,1) are confounders of the effect of the treatment on the outcome.
E1, E2, and E3 ~i.i.d. Bern(0.5) are binary effect modifiers that define eight subgroups within the data.
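As a rough illustration of the covariate structure listed above, the draws can be sketched in base R. This is not the package's datagen() function; the seed, the object names (X, E, covariates), and the subgroup column are choices made for this sketch only.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n <- 1500

# X1-X4 (confounders), X5 (instrument) and X6 (prognostic variable):
# independent standard normal draws
X <- matrix(rnorm(n * 6), nrow = n, ncol = 6,
            dimnames = list(NULL, paste0("X", 1:6)))

# E1-E3: independent Bernoulli(0.5) effect modifiers
E <- matrix(rbinom(n * 3, size = 1, prob = 0.5), nrow = n, ncol = 3,
            dimnames = list(NULL, paste0("E", 1:3)))

covariates <- data.frame(X, E)

# Three binary modifiers define 2^3 = 8 possible subgroups
covariates$subgroup <- interaction(covariates$E1, covariates$E2, covariates$E3)
nlevels(covariates$subgroup)
```

The role each covariate plays (confounder, instrument, prognostic variable) comes from how it enters the treatment and outcome means, not from how it is drawn.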

Simple Simulation Scenarios

One dataset of 1500 observations has been generated under each simulation scenario and is included with the hetviz installation. Users can generate their own data under this general covariate structure with the function datagen(), defined in simpleData.R. The function draws a new sample either under the population parameters given in the figure above or under user-specified coefficient vectors:

# Coefficients for the treatment mean: a vector of length 9, ordered as
#   (intercept, X1, X2, X3, X4, X5, E1, E2, E3).
# The included data were generated using
alpha <- c(0, 0.1, -0.1, 1.1, -1.1, 0.4, -0.1, 1.1, -4)

# Coefficients for the outcome mean: a vector of length 13, ordered as
#   (intercept, trt, X1, X2, X3, X4, X6, E1, E2, E3, TE1, TE2, TE3).
# The included data were generated using
beta <- c(-3.85, 5, 0.5, -2, -0.5, 2, 1, -1, 0, -2, 1, 4, -4)
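To make the coefficient ordering concrete, the sketch below shows one way such vectors could map onto the treatment and outcome means. It rests on assumptions that this page does not confirm: a logistic link for the treatment probability, a linear outcome mean, and TE1-TE3 read as treatment-by-modifier interaction coefficients. It is not the body of datagen(), and it ignores the effMod, confound, and confoundEMs switches.

```r
# Coefficient vectors as listed on this page
alpha <- c(0, 0.1, -0.1, 1.1, -1.1, 0.4, -0.1, 1.1, -4)
beta  <- c(-3.85, 5, 0.5, -2, -0.5, 2, 1, -1, 0, -2, 1, 4, -4)

set.seed(2)  # arbitrary seed for this sketch
n <- 1500
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("X", 1:6)))
E <- matrix(rbinom(n * 3, 1, 0.5), n, 3, dimnames = list(NULL, paste0("E", 1:3)))

# Treatment model. ASSUMPTION: logistic link for p.
# Columns follow the alpha ordering: intercept, X1-X5, E1-E3.
design_trt <- cbind(1, X[, 1:5], E)
p   <- plogis(drop(design_trt %*% alpha))
trt <- rbinom(n, 1, p)

# Outcome model. ASSUMPTION: linear mean, with TE1-TE3 as trt-by-E interactions.
# Columns follow the beta ordering: intercept, trt, X1-X4, X6, E1-E3, TE1-TE3.
design_y <- cbind(1, trt, X[, c(1:4, 6)], E, trt * E)
mu <- drop(design_y %*% beta)
Y  <- rnorm(n, mean = mu, sd = 1)

# Under these assumptions, each unit's true treatment effect is the trt
# coefficient plus the interaction terms for its modifiers.
true_ite <- beta[2] + drop(E %*% beta[11:13])
```

Under this reading, X5 enters only the treatment model and X6 only the outcome model, which matches their descriptions as an instrument and a prognostic variable.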

Confounding and no effect modification (A)

Contained in simpleDataA.csv and generated by calling

datagen(n = 1500, effMod = FALSE, confound = TRUE, confoundEMs = FALSE)

Effect modification and no confounding (B)

Contained in simpleDataB.csv and generated by calling

datagen(n = 1500, effMod = TRUE, confound = FALSE, confoundEMs = FALSE)

Effect modification and confounding (C)

Contained in simpleDataC.csv and generated by calling

datagen(n = 1500, effMod = TRUE, confound = TRUE, confoundEMs = FALSE)

Effect modification and confounding, with additional confounding by effect modifiers (D)

Contained in simpleDataD.csv and generated by calling

datagen(n = 1500, effMod = TRUE, confound = TRUE, confoundEMs = TRUE)
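The four scenarios differ only in the three logical flags passed to datagen(), so they can be summarized in a small lookup table. The sketch below builds that table and, only if datagen() has already been sourced from simpleData.R, regenerates all four datasets in one pass. The object names scenarios and sims are made up for this example, and regenerated draws may differ from the shipped CSV files.

```r
# Flag settings for each scenario, as listed above
scenarios <- data.frame(
  scenario    = c("A", "B", "C", "D"),
  file        = paste0("simpleData", c("A", "B", "C", "D"), ".csv"),
  effMod      = c(FALSE, TRUE, TRUE, TRUE),
  confound    = c(TRUE, FALSE, TRUE, TRUE),
  confoundEMs = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Regenerate every scenario, skipping this step when datagen() is unavailable
if (exists("datagen", mode = "function")) {
  sims <- Map(
    function(em, cf, cem) {
      datagen(n = 1500, effMod = em, confound = cf, confoundEMs = cem)
    },
    scenarios$effMod, scenarios$confound, scenarios$confoundEMs
  )
  names(sims) <- scenarios$scenario
}

scenarios
```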

Estimation of ITEs and subgroups

BayesTree::bart() is used to estimate ITEs. After estimating the ITEs, their quantiles are used to partition the data into subgroups.

An example of how to do this is provided in simpleData.R.
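The two steps described above can be sketched as follows. This is not the code in simpleData.R. The helper names estimate_ite() and assign_subgroups() are hypothetical, the ITE is taken here as the difference between BART posterior mean predictions with trt set to 1 and to 0, and the default of four quantile groups is an arbitrary choice for illustration.

```r
# Step 1 (sketch): estimate ITEs with BART. Requires the BayesTree package;
# the function is only defined here, not called.
estimate_ite <- function(y, trt, covars) {
  covars  <- as.matrix(covars)
  n       <- length(y)
  x_train <- cbind(trt = trt, covars)
  # Predict every unit's outcome under treatment (rows 1..n) and
  # under control (rows n+1..2n)
  x_test  <- rbind(cbind(trt = 1, covars), cbind(trt = 0, covars))
  fit <- BayesTree::bart(x.train = x_train, y.train = y,
                         x.test = x_test, verbose = FALSE)
  fit$yhat.test.mean[1:n] - fit$yhat.test.mean[(n + 1):(2 * n)]
}

# Step 2 (sketch): partition observations by quantiles of the estimated ITEs.
# n_groups = 4 is an assumption, not a value taken from the package.
assign_subgroups <- function(ite, n_groups = 4) {
  breaks <- unique(quantile(ite, probs = seq(0, 1, length.out = n_groups + 1)))
  cut(ite, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Usage of step 2 on placeholder values standing in for estimated ITEs
ite_demo <- seq(-1, 1, length.out = 100)
groups   <- assign_subgroups(ite_demo, n_groups = 4)
table(groups)
```

With real data, the output of estimate_ite() would take the place of ite_demo. Calling unique() on the breaks guards against tied quantiles, at the cost of returning fewer groups than requested when ties occur.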

More complex data

Medicare data exploring the effect of stent type on revascularization rate

📬 Suggestions? Create an issue.

📧 General comments? Contact the author.
