---
title: Difference in Difference in Python
format: gfm
---
The **csdid** package contains tools for computing average treatment
effect parameters in a Difference-in-Differences setup that allows for:

- More than two time periods
- Variation in treatment timing (i.e., units can become treated at
  different points in time)
- Treatment effect heterogeneity (i.e., the effect of participating in
  the treatment can vary across units and exhibit potentially complex
  dynamics, selection into treatment, or time effects)
- The parallel trends assumption holding only after conditioning on
  covariates

The main parameters are **group-time average treatment effects**: the
average treatment effect for a particular group (groups are defined by
treatment timing) in a particular time period. These parameters
generalize the average treatment effect on the treated (ATT), which is
identified in the textbook case with two periods and two groups, to the
case with multiple periods.

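
For reference, in the notation of Callaway and Sant’Anna (2021), let $G_g = 1$ for units first treated in period $g$, $C = 1$ for never-treated units, and $Y_t(g)$ and $Y_t(0)$ be the potential outcomes at time $t$ with treatment starting in $g$ and without treatment. The group-time average treatment effect is defined below, together with its identification for post-treatment periods $t \ge g$ under unconditional parallel trends and no anticipation, using never-treated units as the comparison group (the covariate-adjusted estimators in the package generalize the second expression):

$$
ATT(g,t) = \mathbb{E}\left[Y_t(g) - Y_t(0) \mid G_g = 1\right]
$$

$$
ATT(g,t) = \mathbb{E}\left[Y_t - Y_{g-1} \mid G_g = 1\right] - \mathbb{E}\left[Y_t - Y_{g-1} \mid C = 1\right], \qquad t \ge g
$$
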
Group-time average treatment effects are also natural building blocks
for more aggregated treatment effect parameters such as overall
treatment effects or event-study-type estimands.
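
As a toy illustration of this aggregation step, the sketch below combines hypothetical ATT(g, t) values (made-up numbers, not csdid output) in two ways: an overall effect that weights every post-treatment cell by its group's size, and an event-study effect for each event time e = t - g. The group-size weighting roughly follows the "simple" and event-study aggregations described by Callaway and Sant’Anna (2021), but inference is omitted, and the functions `simple_att` and `event_study` are hypothetical, not part of csdid's API.

```python
from collections import defaultdict

# Made-up ATT(g, t) values for post-treatment cells (t >= g), keyed by
# (group, period). These are NOT csdid estimates.
att_gt = {
    (2004, 2004): -0.02, (2004, 2005): -0.04, (2004, 2006): -0.06, (2004, 2007): -0.08,
    (2006, 2006): -0.01, (2006, 2007): -0.03,
    (2007, 2007): -0.02,
}
# Made-up number of units in each group, used as weights.
group_size = {2004: 10, 2006: 20, 2007: 30}


def weighted_mean(pairs):
    """Weighted average of (value, weight) pairs."""
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def simple_att(att_gt, group_size):
    """Overall effect: every post-treatment ATT(g, t) weighted by its group's size."""
    return weighted_mean([(att, group_size[g])
                          for (g, t), att in att_gt.items() if t >= g])


def event_study(att_gt, group_size):
    """Effect at each event time e = t - g, weighting groups by their size."""
    by_e = defaultdict(list)
    for (g, t), att in att_gt.items():
        by_e[t - g].append((att, group_size[g]))
    return {e: weighted_mean(pairs) for e, pairs in sorted(by_e.items())}


print(simple_att(att_gt, group_size))   # about -0.0309 (= -3.4 / 110)
print(event_study(att_gt, group_size))  # one entry per event time, 0 to 3
```

Note how early-treated groups contribute to more post-treatment cells, so they carry more total weight in the overall average; the event-study view separates effects by exposure length instead.
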

## Getting Started

Recent work has extended DiD to settings with multiple time periods.
**csdid** implements the framework put forward in

- [Callaway, Brantly and Pedro H.C. Sant’Anna.
  "Difference-in-Differences with Multiple Time Periods." Journal of
  Econometrics, Vol. 225, No. 2,
  pp. 200-230, 2021.](https://doi.org/10.1016/j.jeconom.2020.12.001)
  or [arXiv](https://arxiv.org/abs/1803.09015)

This project is based on the original [did R package](https://github.com/bcallaway11/did).

## Installation

You can install **csdid** from PyPI with:

```
pip install csdid
```

or from GitHub:

```
pip install git+https://github.com/d2cml-ai/csdid/
```

### Dependencies

I have also created a companion library, `drdid`, which can be installed
from GitHub:

```
pip install git+https://github.com/d2cml-ai/DRDID
```

## Basic Example

The following simplified example, taken from [Callaway and Sant’Anna
(2021)](https://authors.elsevier.com/a/1cFzc15Dji4pnC), estimates the
effect of states increasing their minimum wages on county-level teen
employment rates.

- [More detailed examples (for the R **did** package) are
  available](https://bcallaway11.github.io/did/articles/did-basics.html)

A subset of the data is hosted in the csdid GitHub repository and can be
loaded with:

```{python}
from csdid.att_gt import ATTgt
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/d2cml-ai/csdid/function-aggte/data/mpdta.csv")
```

The dataset contains county-level teen employment rates for 500 counties
observed annually from 2003 to 2007 (2,500 county-year observations).
Some states are first treated in 2004, some in 2006, and some in 2007,
while the remaining states are not treated during the sample period (see
the paper for more details). The important variables in the dataset are:

- **lemp**: the log of county-level teen employment; this is the outcome
  variable.
- **first.treat**: the period when a state first increases its minimum
  wage. It is 2004, 2006, or 2007 for treated counties and 0 for counties
  in never-treated states; it defines the *group* in this application.
- **year**: the year; this is the *time* variable.
- **countyreal**: an id number for each county; it is the individual
  identifier in this panel data context.

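
To make the definition concrete, here is a minimal sketch (not csdid's implementation) of the unconditional estimator of ATT(g, t) for a post-treatment period t >= g: the average change in the outcome from period g - 1 to t for group g, minus the same change for never-treated units (`first.treat == 0`). It assumes a balanced panel and no covariates; the helper `att_gt_unconditional` is hypothetical, and the toy panel uses made-up numbers.

```python
import pandas as pd


def att_gt_unconditional(df: pd.DataFrame, g: int, t: int,
                         yname: str = "lemp", gname: str = "first.treat",
                         tname: str = "year", idname: str = "countyreal") -> float:
    """Hypothetical helper: unconditional DiD estimate of ATT(g, t) for t >= g.

    Compares the outcome change from the base period g - 1 to period t for
    units in group g with the same change for never-treated units
    (gname == 0). Assumes a balanced panel and no covariates.
    """
    base = g - 1
    # One row per unit, one column per period.
    wide = df.pivot(index=idname, columns=tname, values=yname)
    # Group membership (first treatment period) of each unit.
    group = df.groupby(idname)[gname].first()
    # Outcome change from the base period to period t for every unit.
    change = wide[t] - wide[base]
    treated_change = change[group == g].mean()
    never_treated_change = change[group == 0].mean()
    return float(treated_change - never_treated_change)


# Toy panel (made-up numbers): counties 1-2 first treated in 2004, 3-4 never treated.
toy = pd.DataFrame({
    "countyreal": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "year": [2003, 2004, 2005] * 4,
    "first.treat": [2004] * 6 + [0] * 6,
    "lemp": [1.0, 1.5, 2.0, 2.0, 2.3, 2.8, 1.0, 1.1, 1.2, 3.0, 3.1, 3.2],
})
print(att_gt_unconditional(toy, g=2004, t=2004))  # approximately 0.3 (0.4 - 0.1)
```

Applied to the `data` loaded above, e.g. `att_gt_unconditional(data, 2004, 2006)`, it gives a point estimate you can compare with the corresponding group-time effect reported by csdid.
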
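
To estimate group-time average treatment effects, construct an `ATTgt`
object and call its `fit()` method:

```{python}
out = ATTgt(yname = "lemp",         # outcome variable
            gname = "first.treat",  # group: period of first treatment
            idname = "countyreal",  # unit identifier
            tname = "year",         # time variable
            xformla = "lemp~1",     # intercept only, i.e. no covariates
            data = data,
            ).fit(est_method = 'dr')  # doubly robust estimation
```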

To display a summary table of the group-time effects:

```{python, eval = False}
out.summ_attgt().summary2
```

In notebooks and Quarto documents, end each plotting call with a
semicolon `;` so that only the figure is displayed, without the printed
return value.

```{python}
out.plot_attgt();
```
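
To aggregate the group-time effects by calendar time, call `aggte()` with
`typec='calendar'`:

```{python}
out.aggte(typec='calendar');
```

The aggregated effects can then be plotted:

```{python}
out.plot_aggte();
```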
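
For intuition about what a calendar-time aggregation computes, here is a minimal sketch following the definition in Callaway and Sant’Anna (2021), assuming group-size weights: for each period t, average ATT(g, t) over the groups already treated by t (g <= t), then average those per-period effects for an overall number. The functions `calendar_effects` and `overall_calendar_effect` are hypothetical and are not csdid's code, and the inputs are made-up numbers.

```python
# Made-up ATT(g, t) values keyed by (group, period); (2006, 2005) is a
# pre-treatment cell and is excluded by the g <= t filter below.
att_gt = {
    (2004, 2004): -0.02, (2004, 2005): -0.04, (2004, 2006): -0.06,
    (2006, 2005): 0.005, (2006, 2006): -0.01,
}
group_size = {2004: 10, 2006: 30}  # made-up group sizes used as weights


def calendar_effects(att_gt, group_size):
    """Per-period effect: group-size-weighted mean of ATT(g, t) over groups with g <= t."""
    by_t = {}
    for (g, t), att in att_gt.items():
        if g <= t:
            by_t.setdefault(t, []).append((att, group_size[g]))
    return {
        t: sum(a * w for a, w in pairs) / sum(w for _, w in pairs)
        for t, pairs in sorted(by_t.items())
    }


def overall_calendar_effect(att_gt, group_size):
    """Simple (unweighted) average of the per-period calendar-time effects."""
    effects = calendar_effects(att_gt, group_size)
    return sum(effects.values()) / len(effects)


print(calendar_effects(att_gt, group_size))         # roughly {2004: -0.02, 2005: -0.04, 2006: -0.0225}
print(overall_calendar_effect(att_gt, group_size))  # roughly -0.0275
```
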