@@ -13,17 +13,22 @@ Causal inference analysis enables estimating the causal effect of
13
13
an intervention on some outcome from real-world non-experimental observational data.
14
14
15
15
This package provides a suite of causal methods,
16
- under a unified scikit-learn-inspired API.
16
+ under a unified scikit-learn-inspired API.
17
17
It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models.
18
- This modular approach supports highly-flexible causal modelling.
19
- The fit-and-predict-like API makes it possible to train on one set of examples
18
+ This modular approach supports highly-flexible causal modelling.
19
+ The fit-and-predict-like
20
+ API makes it possible to train on one set of examples
20
21
and estimate an effect on the other (out-of-bag),
21
22
which allows for a more "honest"<sup>1</sup> effect estimation.
22
23
23
24
The package also includes an evaluation suite.
24
25
Since most causal-models utilize machine learning models internally,
25
26
we can diagnose poor-performing models by re-interpreting known ML evaluations from a causal perspective.
26
- If you use it in scientific context, please consider citing [Shimoni et al., 2019](https://arxiv.org/abs/1906.00442):
27
+
28
+ If you use the package, please consider citing [Shimoni et al., 2019](https://arxiv.org/abs/1906.00442):
29
+ <details>
30
+ <summary>Reference</summary>
31
+
27
32
``` bibtex
28
33
@article{causalevaluations,
29
34
title={An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference},
@@ -34,20 +39,28 @@ If you use it in scientific context, please consider citing [Shimoni et al., 201
34
39
```
35
40
36
41
-------------
42
+ </details>
43
+
37
44
<sup>1</sup> Borrowing [Wager & Athey](https://arxiv.org/abs/1510.04342) terminology of avoiding overfit.
38
45
39
46
40
47
## Installation
41
48
``` bash
42
- pip install causallib
49
+ pip install git+ssh://[email protected]/CausalDev/CausalInference.git
50
+ ```
51
+ To install a specific branch use:
52
+ ``` bash
53
+ pip install git+ssh://[email protected]/CausalDev/CausalInference.git@{branch-name}#egg=causallib
43
54
```
44
55
56
+ If installing for development purposes then installation should be performed
57
+ with the `-e` flag.
58
+
45
59
## Usage
46
- In general, the package is imported using the name `causallib`.
47
- Every causal model requires an internal machine-learning model.
60
+ The package is imported using the name `causallib`.
61
+ Each causal model requires an internal machine-learning model.
48
62
`causallib` supports any model that has a sklearn-like fit-predict API
49
- (note some models might require a `predict_proba` implementation).
50
-
63
+ (note some models might require a `predict_proba` implementation).
51
64
For example:
52
65
``` Python
53
66
from sklearn.linear_model import LogisticRegression
@@ -63,7 +76,7 @@ effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
63
76
Comprehensive Jupyter Notebooks examples can be found in the [examples directory](examples).
64
77
65
78
### Community support
66
- We use the Slack workspace at [causallib.slack.com](https://causallib.slack.com/) for informal communication.
79
+ We use the Slack workspace at [causallib.slack.com](https://causallib.slack.com/) for informal communication.
67
80
We encourage you to ask questions regarding causal-inference modelling or
68
81
usage of causallib that don't necessarily merit opening an issue on GitHub.
69
82
@@ -74,25 +87,25 @@ Some key points on how we address causal-inference estimation
74
87
75
88
##### 1. Emphasis on potential outcome prediction
76
89
Causal effect may be the desired outcome.
77
- However, every effect is defined by two potential (counterfactual) outcomes.
90
+ However, every effect is defined by two potential (counterfactual) outcomes.
78
91
We adopt this two-step approach by separating the effect-estimating step
79
- from the potential-outcome-prediction step.
92
+ from the potential-outcome-prediction step.
80
93
A beneficial consequence to this approach is that it better supports
81
94
multi-treatment problems where "effect" is not well-defined.
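As a rough illustration of the two-step idea, the sketch below treats an effect as a contrast between two already-estimated potential outcomes. The treatment names, the outcome values, and the `contrast` helper are hypothetical and are not part of causallib's API; the point is only that with more than two treatments there are several pairwise contrasts rather than a single "effect".

```python
from itertools import combinations

# Hypothetical population-level potential outcomes, one per treatment arm.
potential_outcomes = {"placebo": 0.5, "drug_a": 0.25, "drug_b": 0.125}


def contrast(outcome_treated, outcome_reference, kind="diff"):
    """Illustrative helper (not causallib's API): contrast two potential outcomes."""
    if kind == "diff":
        return outcome_treated - outcome_reference
    if kind == "ratio":
        return outcome_treated / outcome_reference
    raise ValueError(f"unknown contrast kind: {kind}")


# Step 1 (predicting potential outcomes) is assumed done above.
# Step 2: derive every pairwise effect from the same set of potential outcomes.
pairwise_effects = {
    (treated, reference): contrast(
        potential_outcomes[treated], potential_outcomes[reference]
    )
    for reference, treated in combinations(potential_outcomes, 2)
}
```

Keeping the potential outcomes as the primary result means the choice of contrast (difference, ratio, or which pair of treatments) can be made afterwards without refitting anything.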
82
95
83
96
##### 2. Stratified average treatment effect
84
97
The causal inference literature devotes special attention to the population
85
98
on which the effect is estimated.
86
99
For example, ATE (average treatment effect on the entire sample),
87
- ATT (average treatment effect on the treated), etc.
100
+ ATT (average treatment effect on the treated), etc.
88
101
By allowing out-of-bag estimation, we leave this specification to the user.
89
102
For example, ATE is achieved by `model.estimate_population_outcome(X, a)`
90
103
and ATT is done by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`
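To make the distinction concrete, here is a minimal standard-library sketch of the same idea. It does not use causallib: the toy data are made up, and `fit_cell_means` and `estimate_population_outcome` are stand-in functions assumed for illustration (a simple per-cell mean serves as the outcome model). The model is fit once on all rows, and only the population passed at estimation time changes.

```python
from collections import defaultdict

# Toy observational data: (covariate x, treatment a, outcome y). Values are made up.
data = [
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 1, 2.0),
    (1, 0, 3.0), (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0),
]


def fit_cell_means(rows):
    """Fit a nonparametric outcome model: the mean outcome per (x, a) cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for x, a, y in rows:
        totals[(x, a)][0] += y
        totals[(x, a)][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


def estimate_population_outcome(model, rows):
    """Average the predicted outcome under each treatment over the given rows."""
    return {
        t: sum(model[(x, t)] for x, _, _ in rows) / len(rows)
        for t in (0, 1)
    }


model = fit_cell_means(data)

# ATE: average over the entire sample.
ate_outcomes = estimate_population_outcome(model, data)
ate = ate_outcomes[1] - ate_outcomes[0]

# ATT: same fitted model, but averaged over the treated rows only.
treated = [row for row in data if row[1] == 1]
att_outcomes = estimate_population_outcome(model, treated)
att = att_outcomes[1] - att_outcomes[0]
```

In this toy data the treated group has more `x == 1` rows, where the effect is larger, so the ATT comes out higher than the ATE even though the fitted model is identical.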
91
104
92
105
##### 3. Families of causal inference models
93
106
We distinguish between two types of models:
94
107
* *Weight models*: weight the data to balance between the treatment and control groups,
95
- and then estimates the potential outcome by using a weighted average of the observed outcome.
108
+ and then estimates the potential outcome by using a weighted average of the observed outcome.
96
109
Inverse Probability of Treatment Weighting (IPW or IPTW) is the most known example of such models.
97
110
* *Direct outcome models*: uses the covariates (features) and treatment assignment to build a
98
111
model that predicts the outcome directly. The model can then be used to predict the outcome
@@ -111,7 +124,7 @@ proper selection on both dimensions of the data to avoid introducing bias:
111
124
112
125
This is a place where domain expert knowledge is required and cannot be fully and truly automated
113
126
by algorithms.
114
- This package assumes that the data provided to the model fit the criteria.
127
+ This package assumes that the data provided to the model fit the criteria.
115
128
However, filtering can be applied in real-time using a scikit-learn pipeline estimator
116
129
that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.
117
130