
Commit 5aa51fa

committed
Updated documentation
1 parent 8ce7208 commit 5aa51fa

File tree

2 files changed: +86, -60 lines

README.md

+40-33
@@ -50,34 +50,38 @@ code.
 
 <h2>Extreme Learning Machines and Causal Inference</h2>
 <p>
-In some cases we would like to know the causal effect of some intervention but we do not
-have the counterfactual, making conventional methods of statistical analysis infeasible.
-However, it may still be possible to get an unbiased estimate of the causal effect (ATE,
-ATE, or ITT) by predicting the counterfactual and comparing it to the observed outcomes if
-we are able to observe all potential confounders. This is the approach CausalELM takes to
-conduct interrupted time series analysis, G-Computation, double machine learning, and
-metalearning via S-Learners, T-Learners, X-Learners, R-learners, and doubly robust
-estimation. In interrupted time series analysis, we want to estimate the effect of some
-intervention on the outcome of a single unit that we observe during multiple time periods.
-For example, we might want to know how the announcement of a merger affected the price of
-Stock A. To do this, we need to know what the price of stock A would have been if the merger
-had not been announced, which we can predict with machine learning methods. Then, we can
-compare this predicted counterfactual to the observed price data to estimate the effect of
-the merger announcement. In another case, we might want to know the effect of medicine X on
-disease Y but the administration of X was not random and it might have also been
-administered at mulitiple time periods, which would produce biased estimates. To overcome
-this, G-computation models the observed data, uses the model to predict the outcomes if all
-patients recieved the treatment, and compares it to the predictions of the outcomes if none
-of the patients recieved the treatment. Double machine learning (DML) takes a similar
-approach but also models the treatment mechanism and uses it to adjust the initial estimates.
-This approach has three advantages. First, it is more efficient with high dimensional data
-than conventional methods. Metalearners take a similar approach to estimate the CATE. While
-all of these models are different, they have one thing in common: how well they perform
-depends on the underlying model they fit to the data. To that end, CausalELMs use bagged
-ensembles of extreme learning machines because they are simple yet flexible enough to be
-universal function approximators with lower varaince than single extreme learning machines.
+CausalELM provides easy-to-use implementations of modern causal inference methods. While
+CausalELM implements a variety of estimators, they all have one thing in common—the use of
+machine learning models to flexibly estimate causal effects. This is where the ELM in
+CausalELM comes from—the machine learning model underlying all the estimators is an extreme
+learning machine (ELM). ELMs are simple neural networks that use randomized weights and
+offer a good tradeoff between learning non-linear dependencies and simplicity. Furthermore,
+CausalELM implements bagged ensembles of ELMs to reduce the variance resulting from the
+randomized weights.
 </p>
 
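The new overview above describes the core design: hidden weights drawn at random and never trained, an output layer fit by ordinary least squares, and bagging to reduce the variance the random weights introduce. Here is a minimal Julia sketch of that idea, with illustrative helper names rather than CausalELM's internals:

```julia
act(x) = tanh(x)  # any nonlinear activation works here

# Fit one ELM: the hidden weights W are random and never trained; only the
# output weights are fit, by a single least-squares solve.
function elm_fit(X, y; neurons = 32)
    W = randn(size(X, 2), neurons)
    β = act.(X * W) \ y
    return (W = W, β = β)
end

elm_predict(m, X) = act.(X * m.W) * m.β

# Bagging: fit several ELMs on bootstrap resamples and average their
# predictions to damp the variance coming from the random hidden weights.
function bagged_predict(X, y, Xnew; b = 10)
    n = size(X, 1)
    preds = map(1:b) do _
        idx = rand(1:n, n)  # bootstrap resample with replacement
        elm_predict(elm_fit(X[idx, :], y[idx]), Xnew)
    end
    return sum(preds) / b
end
```

Because only the output layer is trained, fitting reduces to one linear solve per network, which is what makes ELMs cheap enough to bag.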
+<h2>Estimators</h2>
+<p>
+CausalELM implements estimators for aggregate, e.g. average treatment effect (ATE), and
+individualized, e.g. conditional average treatment effect (CATE), quantities of interest (both defined below).
+</p>
+
+<h3>Estimators for Aggregate Effects</h3>
+<ul>
+<li>Interrupted Time Series Estimator</li>
+<li>G-computation</li>
+<li>Double Machine Learning</li>
+</ul>
+
+<h3>Individualized Treatment Effect (CATE) Estimators</h3>
+<ul>
+<li>S-learner</li>
+<li>T-learner</li>
+<li>X-learner</li>
+<li>R-learner</li>
+<li>Doubly Robust Estimator</li>
+</ul>
+
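For reference, the aggregate and individualized quantities named above have the standard potential-outcomes definitions, where `Y(1)` and `Y(0)` are a unit's outcomes with and without treatment and `X` its covariates:

```math
\mathrm{ATE} = \mathbb{E}[Y(1) - Y(0)], \qquad
\mathrm{CATE}(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]
```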
 <h2>CausalELM Features</h2>
 <ul>
 <li>Estimate a causal effect, get a summary, and validate assumptions in just four lines of code</li>
@@ -87,25 +91,28 @@ universal function approximators with lower varaince than single extreme learnin
 <li>Most inference and validation tests do not assume functional or distributional forms</li>
 <li>Implements the latest techniques from statistics, econometrics, and biostatistics</li>
 <li>Works out of the box with arrays or any data structure that implements the Tables.jl interface</li>
+<li>Works out of the box with AbstractArrays or any data structure that implements the Tables.jl interface</li>
+<li>Works with CuArrays, ROCArrays, and any other GPU-specific arrays that are AbstractArrays</li>
+<li>CausalELM is lightweight—its only dependency is Tables.jl</li>
 <li>Codebase is high-quality, well tested, and regularly updated</li>
 </ul>
 
 <h2>What's New?</h2>
 <ul>
 <li>See the JuliaCon 2024 CausalELM demonstration <a href="https://www.youtube.com/watch?v=hh_cyj8feu8&t=26s">here</a>.</li>
-<li>Model summaries include confidence intervals and marginal effects<li>
-<li>Now includes doubly robust estimator for CATE estimation</li>
-<li>All estimators now implement bagging to reduce predictive performance and reduce variance</li>
-<li>Counterfactual consistency validation simulates more realistic violations of the counterfactual consistency assumption</li>
+<li>Includes support for GPU-specific arrays and data structures that implement the Tables.jl API</li>
+<li>Only performs randomization inference when the inference argument is set to true in summarize methods</li>
+<li>Summaries support calculating marginal effects and confidence intervals</li>
+<li>Randomization inference now uses multithreading (see the sketch after this list)</li>
+<li>Refactored code to be easier to extend and understand</li>
 <li>Uses a simple heuristic to choose the number of neurons, which reduces training time and still works well in practice</li>
 <li>Probability clipping for classifier predictions and residuals is no longer necessary due to the bagging procedure</li>
 </ul>
 
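Two items in the list above concern randomization inference: it runs only when requested, and the permutation loop is multithreaded. The following is a generic sketch of that technique for an ATE, not CausalELM's implementation, and `randomization_pvalue` is an invented name:

```julia
using Random: shuffle

# Approximate randomization inference: repeatedly re-randomize the treatment
# labels and ask how extreme the observed effect is under that null.
function randomization_pvalue(y, t::BitVector; n_perm = 1_000)
    ate(labels) = sum(y[labels]) / count(labels) - sum(y[.!labels]) / count(.!labels)
    observed = ate(t)
    null = zeros(n_perm)
    Threads.@threads for i in 1:n_perm  # each permutation is independent
        null[i] = ate(shuffle(t))
    end
    return count(>=(abs(observed)), abs.(null)) / n_perm
end
```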
 <h2>What's Next?</h2>
 <p>
-Newer versions of CausalELM will hopefully support using GPUs and provide interpretations of
-the results of calling validate on a model that has been estimated. We may also add some other
-features depending on demand and feedback.
+Efforts for the next version of CausalELM will focus on providing interpretations for the
+results of calling validate, as well as fixing any bugs and eliciting feedback.
 </p>
 
 <h2>Disclaimer</h2>

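The README above claims estimation, summarization, and validation in four lines of code. A sketch of that workflow on toy data: `validate`, `summarize`, and its `inference` argument are named in the text above, while `DoubleMachineLearning` and `estimate_causal_effect!` are my reading of the package's public API, so verify the exact names against the docs:

```julia
using CausalELM

# Toy data: covariates, a binary treatment, and an outcome.
X, T, Y = rand(1000, 5), rand(0:1, 1000), rand(1000)

m = DoubleMachineLearning(X, T, Y)
estimate_causal_effect!(m)
summarize(m, inference = true)  # randomization inference only when asked for
validate(m)                     # probe the identifying assumptions
```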
docs/src/index.md

+46-27
@@ -13,41 +13,51 @@ CurrentModule = CausalELM
 
 # Overview
 
-CausalELM leverages new techniques in machine learning and statistics to estimate individual
-and aggregate treatment effects in situations where traditional methods are unsatisfactory
-or infeasible. To enable this, CausalELM provides a simple API to initialize a model,
-estimate a causal effect, get a summary of the model, and test its robustness. CausalELM
-includes estimators for interupted time series analysis, G-Computation, double machine
-learning, S-Learning, T-Learning, X-Learning, R-learning, and doubly robust estimation.
-Underlying all these estimators are bagged extreme learning machines. Extreme learning
-machines are a single layer feedfoward neural network that relies on randomized weights and
-least squares optimization, making them expressive, simple, and computationally
-efficient. Combining them with bagging reduces the variance caused by the randomization of
-weights and provides a form of regularization that does not have to be tuned through cross
-validation. These attributes make CausalELM a very simple and powerful package for
-estimating treatment effects.
+CausalELM provides easy-to-use implementations of modern causal inference methods. While
+CausalELM implements a variety of estimators, they all have one thing in common—the use of
+machine learning models to flexibly estimate causal effects. This is where the ELM in
+CausalELM comes from—the machine learning model underlying all the estimators is an extreme
+learning machine (ELM). ELMs are simple neural networks that use randomized weights and
+offer a good tradeoff between learning non-linear dependencies and simplicity. Furthermore,
+CausalELM implements bagged ensembles of ELMs to reduce the variance resulting from the
+randomized weights.
 
-For a more interactive overview, see our JuliaCon 2024 talk[here](https://www.youtube.com/watch?v=hh_cyj8feu8&t=26s)
+## Estimators
+CausalELM implements estimators for aggregate, e.g. average treatment effect (ATE), and
+individualized, e.g. conditional average treatment effect (CATE), quantities of interest.
+
+### Estimators for Aggregate Effects
+* Interrupted Time Series Estimator
+* G-computation
+* Double Machine Learning
 
-### Features
+### Individualized Treatment Effect (CATE) Estimators
+* S-learner
+* T-learner
+* X-learner
+* R-learner
+* Doubly Robust Estimator
+
+## Features
 * Estimate a causal effect, get a summary, and validate assumptions in just four lines of code
-* Bagging improves performance and reduces variance without the need to tune a regularization parameter
 * Enables using the same structs for regression and classification
 * Includes 13 activation functions and allows user-defined activation functions
 * Most inference and validation tests do not assume functional or distributional forms
 * Implements the latest techniques from statistics, econometrics, and biostatistics
-* Works out of the box with arrays or any data structure that implements the Tables.jl interface
+* Works out of the box with AbstractArrays or any data structure that implements the Tables.jl interface
+* Works with CuArrays, ROCArrays, and any other GPU-specific arrays that are AbstractArrays
+* CausalELM is lightweight—its only dependency is Tables.jl
 * Codebase is high-quality, well tested, and regularly updated
 
-### What's New?
-* Model summaries contain confidence intervals and marginal effects
-* Now includes doubly robust estimator for CATE estimation
-* All estimators now implement bagging to reduce predictive performance and reduce variance
-* Counterfactual consistency validation simulates more realistic violations of the counterfactual consistency assumption
-* Uses a simple heuristic to choose the number of neurons, which reduces training time and still works well in practice
-* Probability clipping for classifier predictions and residuals is no longer necessary due to the bagging procedure
+## What's New?
+* Includes support for GPU-specific arrays and data structures that implement the Tables.jl API
+* Only performs randomization inference when the inference argument is set to true in summarize methods
+* Summaries support calculating marginal effects and confidence intervals
+* Randomization inference now uses multithreading
+* CausalELM was presented at JuliaCon 2024 in Eindhoven
+* Refactored code to be easier to extend and understand
 
-### What makes CausalELM different?
+## What makes CausalELM different?
 Other packages, mainly EconML, DoWhy, CausalAI, and CausalML, have similar functionality.
 Besides being written in Julia rather than Python, the main differences between CausalELM and
 these libraries are:
@@ -69,15 +79,24 @@ these libraries are:
 estimators provide p-values and standard errors generated via approximate randomization
 inference.
 * CausalELM strives to be lightweight while still being powerful and therefore does not
-have external dependencies: all the functions it uses are in the Julia standard library.
+have external dependencies: all the functions it uses are in the Julia standard library,
+with the exception of model constructors, which use Tables.matrix to ensure integration
+with a wide variety of data structures.
 * The other packages and many others mostly use techniques from one field. Instead,
 CausalELM incorporates a hodgepodge of ideas from statistics, machine learning,
 econometrics, and biostatistics.
+* CausalELM doesn't use any unnecessary abstractions. The only structs are the actual
+models. Estimated effects are returned as arrays, summaries are returned in a dictionary,
+and the results of validating an estimator are returned as tuples. This is in contrast
+to other packages that utilize separate structs (classes) for summaries and inference
+results.
 
-### Installation
+## Installation
 CausalELM requires Julia version 1.8 or greater and can be installed from the REPL as shown
 below.
 ```julia
 using Pkg
 Pkg.add("CausalELM")
 ```
+
+## More Information
+For a more interactive overview, see our JuliaCon 2024 talk [here](https://www.youtube.com/watch?v=hh_cyj8feu8&t=26s).
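The lightweight-dependency bullet above notes that the model constructors call Tables.matrix, so any Tables.jl-compatible source should slot in. A small sketch, with arbitrary column names:

```julia
using Tables

# A NamedTuple of vectors is the simplest Tables.jl-compatible table.
tbl = (x1 = rand(100), x2 = rand(100))
X = Tables.matrix(tbl)  # 100×2 Matrix{Float64}, the representation the constructors consume
```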
