
Commit de64cf6

version 0.6.0 (#7)
* version 0.6.0
* `datasets` module with toy datasets for causal analysis
* `contrib` module for new state-of-the-art outside contributions
* New implementation for MarginalOutcomeEstimator (formerly UncorrectedEstimator) using WeightEstimator API
* Additional Jupyter Notebook examples
* Additional bug fix and documentation
1 parent e5ec1af commit de64cf6

70 files changed (+59294, -397 lines)

.travis.yml (+3)

@@ -1,6 +1,8 @@
 language: python
 python:
 - "3.6"
+- "3.7"
+- "3.8"
 cache: pip
 before_script:
 - curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
@@ -12,6 +14,7 @@ install:
 script:
 - pip install -e . # test that install is running properly
 - pip freeze
+- pytest causallib/contrib/tests
 - pytest --cov-report= --cov=causallib causallib/tests
 after_success:
 - coverage xml

README.md (+78 -15)

@@ -2,36 +2,99 @@
 [![Test Coverage](https://api.codeclimate.com/v1/badges/db2562e44c4a9f7280dc/test_coverage)](https://codeclimate.com/github/IBM/causallib/test_coverage)
 [![PyPI version](https://badge.fury.io/py/causallib.svg)](https://badge.fury.io/py/causallib)
 [![Documentation Status](https://readthedocs.org/projects/causallib/badge/?version=latest)](https://causallib.readthedocs.io/en/latest/)
-# IBM Causal Inference Library
-A Python package for computational inference of causal effect.
+# Causal Inference 360
+A Python package for inferring causal effects from observational data.
 
 ## Description
-Causal inference analysis allows estimating of the effect of intervention
-on some outcome from observational data.
-It deals with the selection bias that is inherent to such data.
+Causal inference analysis enables estimating the causal effect of
+an intervention on some outcome from real-world non-experimental observational data.
 
-This python package allows creating modular causal inference models
-that internally utilize machine learning models of choice,
-and can estimate either individual or average outcome given an intervention.
-The package also provides the means to evaluate the performance of the
-machine learning models and their predictions.
+This package provides a suite of causal methods
+under a unified scikit-learn-inspired API.
+It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models.
+This modular approach supports highly flexible causal modelling.
+The fit-and-predict-like API makes it possible to train on one set of examples
+and estimate an effect on another (out-of-bag),
+which allows for a more "honest"<sup>1</sup> effect estimation.
+
+The package also includes an evaluation suite.
+Since most causal models utilize machine learning models internally,
+we can diagnose poorly performing models by re-interpreting known ML evaluations from a causal perspective.
+See [arXiv:1906.00442](https://arxiv.org/abs/1906.00442) for more details on how.
+
+
+-------------
+<sup>1</sup> Borrowing the terminology of [Wager & Athey](https://arxiv.org/abs/1510.04342) for avoiding overfitting.
 
-The machine learning models must comply with scikit-learn's api
-and contain `fit()` and `predict()` functions.
-Categorical models must also implement `predict_proba()`.
 
 ## Installation
 ```bash
 pip install causallib
 ```
 
 ## Usage
-In general, the package is imported using the name `causallib`.
-For example, use
+In general, the package is imported using the name `causallib`.
+Every causal model requires an internal machine-learning model.
+`causallib` supports any model that has a sklearn-like fit-predict API
+(note that some models might require a `predict_proba` implementation).
+
+For example:
 ```Python
 from sklearn.linear_model import LogisticRegression
 from causallib.estimation import IPW
+from causallib.datasets import load_nhefs
+
+data = load_nhefs()
 ipw = IPW(LogisticRegression())
+ipw.fit(data.X, data.a)
+potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
+effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
 ```
 Comprehensive Jupyter Notebooks examples can be found in the [examples directory](examples).
 
+### Approach to causal-inference
+Some key points on how we address causal-inference estimation:
+
+##### 1. Emphasis on potential outcome prediction
+Causal effect may be the desired outcome.
+However, every effect is defined by two potential (counterfactual) outcomes.
+We adopt this two-step approach by separating the effect-estimating step
+from the potential-outcome-prediction step.
+A beneficial consequence of this approach is that it better supports
+multi-treatment problems where "effect" is not well-defined.
+
+##### 2. Stratified average treatment effect
+The causal inference literature devotes special attention to the population
+on which the effect is estimated.
+For example, ATE (average treatment effect on the entire sample),
+ATT (average treatment effect on the treated), etc.
+By allowing out-of-bag estimation, we leave this specification to the user.
+For example, ATE is achieved by `model.estimate_population_outcome(X, a)`
+and ATT is done by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`.
+
+##### 3. Families of causal inference models
+We distinguish between two types of models:
+* *Weight models*: weight the data to balance between the treatment and control groups,
+and then estimate the potential outcome by using a weighted average of the observed outcome.
+Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such models.
+* *Direct outcome models*: use the covariates (features) and treatment assignment to build a
+model that predicts the outcome directly. The model can then be used to predict the outcome
+under any assignment of treatment values, specifically the potential outcome under assignment of
+all controls or all treated.
+These models are usually known as *Standardization* models, and it should be noted that, currently,
+they are the only ones able to generate *individual effect estimation* (otherwise known as CATE).
+
+##### 4. Confounders and DAGs
+One of the most important steps in causal inference analysis is to have
+proper selection on both dimensions of the data to avoid introducing bias:
+* On rows: thoughtfully choosing the right inclusion/exclusion criteria
+for individuals in the data.
+* On columns: thoughtfully choosing what covariates (features) act as confounders
+and should be included in the analysis.
+
+This is a place where domain expert knowledge is required and cannot be fully and truly automated
+by algorithms.
+This package assumes that the data provided to the model fit the criteria.
+However, filtering can be applied in real-time using a scikit-learn pipeline estimator
+that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.
+
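
The "Approach to causal-inference" section of the new README describes two model families (weight models such as IPW, and direct outcome or *Standardization* models), an effect computed in a separate step from two potential outcomes, and ATT obtained by averaging over the treated only. The sketch below illustrates those ideas without `causallib`, under deliberately simplified assumptions: a single binary confounder `x`, a binary treatment `a`, positivity in every stratum (each value of `x` has both treated and control units), and propensities and outcome means estimated by within-stratum frequencies instead of a scikit-learn learner. All function names are hypothetical; this is not the library's implementation.

```python
from collections import defaultdict
from statistics import mean


def fit_outcome_model(x, a, y):
    """Saturated outcome model: mean observed outcome in each (covariate, treatment) cell."""
    cells = defaultdict(list)
    for xi, ai, yi in zip(x, a, y):
        cells[(xi, ai)].append(yi)
    return {cell: mean(ys) for cell, ys in cells.items()}


def fit_propensity(x, a):
    """P(a=1 | x), estimated as the treated fraction within each covariate stratum."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for xi, ai in zip(x, a):
        counts[xi][0] += ai
        counts[xi][1] += 1
    return {xi: n_treated / n_total for xi, (n_treated, n_total) in counts.items()}


def ipw_population_outcome(x, a, y):
    """Weight model: normalized inverse-propensity-weighted mean outcome in each arm."""
    e = fit_propensity(x, a)
    outcomes = {}
    for t in (0, 1):
        num = den = 0.0
        for xi, ai, yi in zip(x, a, y):
            if ai != t:
                continue
            w = 1 / e[xi] if t == 1 else 1 / (1 - e[xi])
            num += w * yi
            den += w
        outcomes[t] = num / den
    return outcomes


def standardization_population_outcome(model, population_x):
    """Direct outcome model: average predictions with everyone set to control, then to treated."""
    return {t: mean(model[(xi, t)] for xi in population_x) for t in (0, 1)}


def compute_effect(outcome1, outcome0, kind="diff"):
    """Separate second step: combine two potential outcomes into an effect."""
    return outcome1 - outcome0 if kind == "diff" else outcome1 / outcome0


# Toy data: binary confounder x; the true effect is +2 within each stratum of x.
x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 1, 1, 3, 4, 6, 6, 6]

ipw_po = ipw_population_outcome(x, a, y)
model = fit_outcome_model(x, a, y)
ate_po = standardization_population_outcome(model, x)          # average over everyone
treated_x = [xi for xi, ai in zip(x, a) if ai == 1]
att_po = standardization_population_outcome(model, treated_x)  # average over the treated only

print("IPW ATE:", compute_effect(ipw_po[1], ipw_po[0]))
print("Standardization ATE:", compute_effect(ate_po[1], ate_po[0]))
print("Standardization ATT:", compute_effect(att_po[1], att_po[0]))
```

On this toy data both families recover the +2 effect built into each stratum (IPW up to floating-point rounding), whereas the naive difference in observed means is 3.5 because treated units are concentrated in the high-outcome stratum. The `population_x` argument mirrors the README's point that a model fit on one sample can be averaged over another, such as the treated only for ATT.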

causallib/README.md (+4 -1)

@@ -47,7 +47,10 @@ These can be used within a pipeline framework together with the models.
 ### `datasets`
 Several datasets are provided within the package in the `datasets` module:
 * NHEFS study data on the effect of smoking cessation on weight gain.
-* simulation module allows creating simulated data based on a causal graph
+Adapted from [Hernán and Robins' Causal Inference Book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)
+* A handful of simulation sets from the [2016 Atlantic Causal Inference
+Conference (ACIC) data challenge](https://jenniferhill7.wixsite.com/acic-2016/competition).
+* Simulation module allows creating simulated data based on a causal graph
 depicting the connection between covariates, treatment assignment and outcomes.
 
 ### Additional folders
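
The `datasets` README describes a simulation module that generates data from a causal graph connecting covariates, treatment assignment and outcomes. Such data are useful because the true effect is known by construction. The stdlib-only sketch below shows the idea for the smallest confounded graph, X → A, X → Y, A → Y. It does not use `causallib`'s simulator, whose API is not shown in this commit; the helper names, coefficients and logistic/Gaussian noise model are arbitrary assumptions, and the adjustment step is plain linear regression via the Frisch-Waugh-Lovell residual trick, not a `causallib` estimator.

```python
import math
import random


def simulate_confounded(n, effect=2.0, seed=0):
    """Hypothetical helper: sample (x, a, y) rows from the DAG X -> A, X -> Y, A -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                         # confounder
        p = 1.0 / (1.0 + math.exp(-1.5 * x))            # X -> A: logistic treatment assignment
        a = 1 if rng.random() < p else 0
        y = effect * a + 3.0 * x + rng.gauss(0.0, 1.0)  # A -> Y and X -> Y, plus noise
        rows.append((x, a, y))
    return rows


def avg(values):
    return sum(values) / len(values)


def residualize(v, x):
    """Residuals of v after a simple linear regression of v on x (with intercept)."""
    mx, mv = avg(x), avg(v)
    beta = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sum((xi - mx) ** 2 for xi in x)
    return [vi - mv - beta * (xi - mx) for xi, vi in zip(x, v)]


rows = simulate_confounded(20_000)
xs, treat, ys = (list(col) for col in zip(*rows))

# Naive contrast: biased, because treated units tend to have larger x.
naive = avg([y for t, y in zip(treat, ys) if t == 1]) - avg([y for t, y in zip(treat, ys) if t == 0])

# Adjusting for x: the OLS coefficient of a in y ~ 1 + a + x, via Frisch-Waugh-Lovell.
r_a, r_y = residualize(treat, xs), residualize(ys, xs)
adjusted = sum(ra * ry for ra, ry in zip(r_a, r_y)) / sum(ra * ra for ra in r_a)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true effect: 2.00")
```

With these settings the naive difference in means lands well above the true effect of 2, while adjusting for the confounder brings the estimate close to 2. Swapping in a different graph only changes the sampling lines inside `simulate_confounded`.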

causallib/__init__.py (+1)

@@ -0,0 +1 @@
+__version__ = "0.6.0"

causallib/contrib/README.md (+29)

@@ -0,0 +1,29 @@
+# Module `causallib.contrib`
+This module currently includes additional causal methods contributed to the package
+by causal inference researchers other than `causallib`'s core developers.
+
+The causal models in this module can be slightly more novel than the ones in the `estimation` module.
+However, they should largely adhere to the `causallib` API
+(e.g., `IndividualOutcomeEstimator` or `WeightEstimator`).
+Since code here is more experimental,
+models might also require additional (and less trivial) package dependencies,
+or have less test coverage.
+Well-integrated models could be transferred into the main `estimation` module in the future.
+
+## Contributed Methods
+Currently contributed methods are:
+
+1. Adversarial Balancing: implementing the algorithm described in
+[Adversarial Balancing for Causal Inference](https://arxiv.org/abs/1810.07406).
+```python
+from causallib.contrib.adversarial_balancing import AdversarialBalancing
+```
+## Dependencies
+Each model might have slightly different requirements.
+Refer to the documentation of each model for the additional packages it requires.
+
+Requirements for `contrib` models will be concentrated in `contrib/requirements.txt` and should be
+automatically installed using the extra-requirements `contrib` flag:
+```shell script
+pip install causallib[contrib]
+```

causallib/contrib/__init__.py

Whitespace-only changes.

@@ -0,0 +1 @@
+from .adversarial_balancing import AdversarialBalancing
