Merge pull request #20 from IBM/version-0.7.1

ehudkr · yishaishimoni · web-flow · commit b29b4aa7fc05 · 2021-10-05T19:02:03.000+03:00
version 0.7.1

Co-authored-by: Yishai Shimoni <yishais@il.ibm.com>
diff --git a/README.md b/README.md
@@ -13,17 +13,22 @@ Causal inference analysis enables estimating the causal effect of
 an intervention on some outcome from real-world non-experimental observational data.  
 
 This package provides a suite of causal methods, 
-under a unified scikit-learn-inspired API.  
+under a unified scikit-learn-inspired API.
 It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models. 
-This modular approach supports highly-flexible causal modelling.    
-The fit-and-predict-like API makes it possible to train on one set of examples 
+This modular approach supports highly-flexible causal modelling.
+The fit-and-predict-like 
+API makes it possible to train on one set of examples 
 and estimate an effect on the other (out-of-bag),
 which allows for a more "honest"<sup>1</sup> effect estimation.
 
 The package also includes an evaluation suite. 
 Since most causal-models utilize machine learning models internally, 
 we can diagnose poor-performing models by re-interpreting known ML evaluations from  a causal perspective.
-If you use it in scientific context, please consider citing [Shimoni et al., 2019](https://arxiv.org/abs/1906.00442):
+
+If you use the package, please consider citing [Shimoni et al., 2019](https://arxiv.org/abs/1906.00442):
+<details>
+  <summary>Reference</summary>
+  
 ```bibtex
 @article{causalevaluations,
   title={An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference},
@@ -34,20 +39,28 @@ If you use it in scientific context, please consider citing [Shimoni et al., 201
 ```
 
 -------------
+</details>
+
 <sup>1</sup> Borrowing [Wager & Athey](https://arxiv.org/abs/1510.04342) terminology of avoiding overfit.  
 
 
 ## Installation
 ```bash
-pip install causallib
+pip install git+ssh://git@github.ibm.com/CausalDev/CausalInference.git
+```
+To install a specific branch use:
+```bash
+pip install git+ssh://git@github.ibm.com/CausalDev/CausalInference.git@{branch-name}#egg=causallib
 ```
 
+If installing for development purposes then installation should be performed
+with the `-e` flag. 
+
 ## Usage
-In general, the package is imported using the name `causallib`.  
-Every causal model requires an internal machine-learning model. 
+The package is imported using the name `causallib`.
+Each causal model requires an internal machine-learning model.
 `causallib` supports any model that has a sklearn-like fit-predict API
-(note some models might require a `predict_proba` implementation).  
-
+(note some models might require a `predict_proba` implementation).
 For example:
 ```Python
 from sklearn.linear_model import LogisticRegression
@@ -63,7 +76,7 @@ effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
 Comprehensive Jupyter Notebooks examples can be found in the [examples directory](examples).
 
 ### Community support
-We use the Slack workspace at [causallib.slack.com](https://causallib.slack.com/) for informal communication.  
+We use the Slack workspace at [causallib.slack.com](https://causallib.slack.com/) for informal communication.
 We encourage you to ask questions regarding causal-inference modelling or 
 usage of causallib that don't necessarily merit opening an issue on Github.  
 
@@ -74,25 +87,25 @@ Some key points on how we address causal-inference estimation
 
 ##### 1. Emphasis on potential outcome prediction  
 Causal effect may be the desired outcome. 
-However, every effect is defined by two potential (counterfactual) outcomes.  
+However, every effect is defined by two potential (counterfactual) outcomes. 
 We adopt this two-step approach by separating the effect-estimating step 
-from the potential-outcome-prediction step.  
+from the potential-outcome-prediction step. 
 A beneficial consequence to this approach is that it better supports 
 multi-treatment problems where "effect" is not well-defined.
 
 ##### 2. Stratified average treatment effect
 The causal inference literature devotes special attention to the population 
 on which the effect is estimated.
 For example, ATE (average treatment effect on the entire sample),
-ATT (average treatment effect on the treated), etc.  
+ATT (average treatment effect on the treated), etc. 
 By allowing out-of-bag estimation, we leave this specification to the user.
 For example, ATE is achieved by `model.estimate_population_outcome(X, a)`
 and ATT is done by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`
 
 ##### 3. Families of causal inference models
 We distinguish between two types of models:
 * *Weight models*: weight the data to balance between the treatment and control groups, 
-   and then estimates the potential outcome by using a weighted average of the observed outcome.  
+   and then estimates the potential outcome by using a weighted average of the observed outcome. 
    Inverse Probability of Treatment Weighting (IPW or IPTW) is the most known example of such models. 
 * *Direct outcome models*: uses the covariates (features) and treatment assignment to build a
    model that predicts the outcome directly. The model can then be used to predict the outcome
@@ -111,7 +124,7 @@ proper selection on both dimensions of the data to avoid introducing bias:
 
 This is a place where domain expert knowledge is required and cannot be fully and truly automated
 by algorithms. 
-This package assumes that the data provided to the model fit the criteria.   
+This package assumes that the data provided to the model fit the criteria. 
 However, filtering can be applied in real-time using a scikit-learn pipeline estimator
 that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.
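
Review note: the README text in this diff describes weight models as estimating a potential outcome through a weighted average of the observed outcome, and then defines the effect from two potential outcomes. The following is a minimal stdlib-only sketch of that idea, not causallib's implementation. The helper `ipw_potential_outcome`, the toy data, and the assumption that the propensities P(A=1 | X) are already known are all hypothetical and chosen only for illustration.

```python
def ipw_potential_outcome(treatments, outcomes, propensities, treatment_value):
    """Weighted average of observed outcomes among units that received `treatment_value`.

    `propensities` holds P(A=1 | X) per unit; a binary treatment is assumed.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, y, p in zip(treatments, outcomes, propensities):
        if a != treatment_value:
            continue
        # Probability of the treatment the unit actually received.
        prob_of_received = p if a == 1 else 1.0 - p
        weight = 1.0 / prob_of_received  # inverse probability weight
        weighted_sum += weight * y
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy data: treatment, observed outcome, propensity P(A=1 | X).
a = [1, 1, 0, 0]
y = [10.0, 13.0, 4.0, 7.0]
propensity = [0.5, 0.25, 0.5, 0.75]

# Step 1: predict both potential outcomes. Step 2: derive the effect from them.
potential_outcomes = {
    value: ipw_potential_outcome(a, y, propensity, value) for value in (0, 1)
}
effect = potential_outcomes[1] - potential_outcomes[0]
```

Keeping the two steps separate mirrors the README's point that an effect is defined by two potential outcomes, so the same outcomes could feed a difference, a ratio, or a multi-treatment comparison.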
 
diff --git a/causallib/__init__.py b/causallib/__init__.py
@@ -1 +1 @@
-__version__ = "0.7.0"
+__version__ = "0.7.1"
diff --git a/causallib/evaluation/evaluator.py b/causallib/evaluation/evaluator.py
@@ -122,7 +122,7 @@ def score_binary_prediction(self, y_true, y_pred_proba=None, y_pred=None, sample
                 warnings.warn(str(v))
                 scores[metric_name] = np.nan
 
-        dtype = np.float if all([np.isscalar(score) for score in scores.values()]) else np.dtype(object)
+        dtype = float if all([np.isscalar(score) for score in scores.values()]) else np.dtype(object)
         return pd.Series(scores, dtype=dtype)
 
     def score_regression_prediction(self, y_true, y_pred, sample_weight=None, metrics_to_evaluate=None):
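
Review note: `np.float` was only an alias for the builtin `float`. The alias was deprecated in NumPy 1.20 and removed in later releases, so the one-line change above should not alter behavior. The sketch below mirrors the dtype decision in plain Python, without NumPy or pandas. The helper name `choose_series_dtype` and the sample score dictionaries are hypothetical, and `numbers.Number` is only an approximate stand-in for `np.isscalar` (which also treats strings as scalars).

```python
from numbers import Number


def choose_series_dtype(scores):
    """Return builtin float when every score is a plain number, else object."""
    all_scalar = all(isinstance(score, Number) for score in scores.values())
    return float if all_scalar else object


# Hypothetical metric results: NaN marks a metric that failed to evaluate.
scalar_scores = {"accuracy": 0.91, "roc_auc": float("nan")}
# A non-scalar entry (e.g. a matrix-like metric) forces the object dtype.
mixed_scores = {"accuracy": 0.91, "confusion_matrix": [[5, 1], [2, 7]]}

scalar_dtype = choose_series_dtype(scalar_scores)
mixed_dtype = choose_series_dtype(mixed_scores)
```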
diff --git a/causallib/evaluation/plots.py b/causallib/evaluation/plots.py
@@ -478,25 +478,26 @@ def plot_propensity_score_distribution(propensity, treatment, reflect=True, kde=
     ax = ax or plt.gca()
     if kde and not norm_hist:
         warnings.warn("kde=True and norm_hist=False is not supported. Forcing norm_hist from False to True.")
-        norm_hist=True
+        norm_hist = True
     bins = np.histogram(propensity, bins="auto")[1]
-    plot_params = dict(bins=bins,  density=norm_hist, alpha=0.5, cumulative=cumulative)
+    plot_params = dict(bins=bins, density=norm_hist, alpha=0.5, cumulative=cumulative)
 
     unique_treatments = np.sort(np.unique(treatment))
-    for treatment_value in unique_treatments:
+    for treatment_number, treatment_value in enumerate(unique_treatments):
         cur_propensity = propensity.loc[treatment == treatment_value]
-        cur_color = "C{}".format(treatment_value)
-        ax.hist(cur_propensity, label = "treatment = {}".format(treatment_value), color=cur_color,**plot_params)
+        cur_color = "C{}".format(treatment_number)
+        ax.hist(cur_propensity, label="treatment = {}".format(treatment_value),
+                color=[cur_color], **plot_params)
         if kde:
             cur_kde = gaussian_kde(cur_propensity)
-            min_support = max(0,cur_propensity.values.min() - cur_kde.factor)
-            max_support = min(1, cur_propensity.values.max() +  cur_kde.factor)
-            X_plot = np.linspace(min_support,max_support,200)
+            min_support = max(0, cur_propensity.values.min() - cur_kde.factor)
+            max_support = min(1, cur_propensity.values.max() + cur_kde.factor)
+            X_plot = np.linspace(min_support, max_support, 200)
             if cumulative:
                 density = np.array([cur_kde.integrate_box_1d(X_plot[0], x_i) for x_i in X_plot])
-                ax.plot(X_plot,density,color=cur_color,)
-            else:    
-                ax.plot(X_plot,cur_kde.pdf(X_plot),color=cur_color,)
+                ax.plot(X_plot, density, color=cur_color, )
+            else:
+                ax.plot(X_plot, cur_kde.pdf(X_plot), color=cur_color, )
     if reflect:
         if len(unique_treatments) != 2:
             raise ValueError("Reflecting density across X axis can only be done for two groups. "
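
Review note: the switch from `treatment_value` to `treatment_number` matters because Matplotlib's `"CN"` color-cycle notation expects an integer index after the `C`. Formatting a float (or string) treatment value into that template yields a string such as `"C1.0"`, which is not a valid cycle color. The sketch below reproduces only the string formatting, with no Matplotlib dependency. The helper `cycle_color_by_position` and the sample treatment values are hypothetical.

```python
def cycle_color_by_position(unique_treatments):
    """Map each treatment value to a "CN" color string based on its position."""
    return {
        value: "C{}".format(index)
        for index, value in enumerate(unique_treatments)
    }


# Treatment coded as floats, as in the new test's `astype(float)` case.
float_treatments = [0.0, 1.0]

# Old approach: the treatment value itself is formatted into the color name.
colors_by_value = ["C{}".format(value) for value in float_treatments]

# New approach: the enumeration index is used, independent of the value's type.
colors_by_position = cycle_color_by_position(float_treatments)
```

Indexing by position also keeps colors stable when treatment values are not consecutive integers starting at zero.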
diff --git a/causallib/tests/test_plots.py b/causallib/tests/test_plots.py
@@ -0,0 +1,77 @@
+# (C) Copyright 2020 IBM Corp.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Created on Nov 12, 2020
+
+from sklearn.linear_model import LogisticRegression, LinearRegression
+from causallib.evaluation import PropensityEvaluator, OutcomeEvaluator
+from causallib.estimation import DoublyRobustVanilla, IPW, StratifiedStandardization
+from causallib.datasets import load_nhefs
+import unittest
+import pandas as pd
+from sklearn.utils import Bunch
+import matplotlib
+matplotlib.use('Agg')
+
+
+class TestPlots(unittest.TestCase):
+    @classmethod
+    def setUpClass(self):
+        self.data = load_nhefs()
+        ipw = IPW(LogisticRegression(solver="liblinear"), truncate_eps=0.05)
+        std = StratifiedStandardization(LinearRegression())
+        self.dr = DoublyRobustVanilla(std, ipw)
+        self.dr.fit(self.data.X, self.data.a, self.data.y)
+        self.prp_evaluator = PropensityEvaluator(self.dr.weight_model)
+        self.out_evaluator = OutcomeEvaluator(self.dr.outcome_model)
+
+    def propensity_plot_by_name(self, test_names, alternate_a=None):
+        a = self.data.a if alternate_a is None else alternate_a
+        nhefs_plots = self.prp_evaluator.evaluate_simple(
+            self.data.X, a, self.data.y, plots=test_names)
+        [self.assertIsNotNone(x) for x in nhefs_plots.plots.values()]
+        return True
+
+    def outcome_plot_by_name(self, test_names):
+        nhefs_plots = self.out_evaluator.evaluate_simple(
+            self.data.X, self.data.a, self.data.y, plots=test_names)
+        [self.assertIsNotNone(x) for x in nhefs_plots.plots.values()]
+        return True
+
+    def propensity_plot_multiple_a(self, test_names):
+        self.assertTrue(self.propensity_plot_by_name(test_names, alternate_a=self.data.a.astype(int)))
+        self.assertTrue(self.propensity_plot_by_name(test_names, alternate_a=self.data.a.astype(float)))
+        # self.assertTrue(self.propensity_plot_by_name(test_names, alternate_a=self.data.a.astype(str).factorize()))
+
+
+    def test_weight_distribution_plot(self):
+        self.propensity_plot_multiple_a(["weight_distribution"])
+
+    def test_propensity_roc_plots(self):
+        self.propensity_plot_multiple_a(['roc_curve'])
+
+    def test_precision_plots(self):
+        self.propensity_plot_multiple_a(['precision'])
+
+    def test_covariate_balance_love_plot(self):
+        self.propensity_plot_multiple_a(['covariate_balance_love'])
+
+    def test_propensity_multiple_plots(self):
+        self.propensity_plot_multiple_a(['roc_curve', 'covariate_balance_love'])
+
+    def test_accuracy_plot(self):
+        self.assertTrue(self.outcome_plot_by_name(
+            ["common_support", "continuous_accuracy"]))
+
+# todo: add more tests (including ones that raise exceptions). No point in doing this right now since a major refactoring for the plots is ongoing
