---
title: Difference in Difference in Python
format: gfm
---

The **csdid** package contains tools for computing average treatment
effect parameters in a Difference-in-Differences setup allowing for

  - More than two time periods

  - Variation in treatment timing (i.e., units can become treated at
    different points in time)

  - Treatment effect heterogeneity (i.e., the effect of participating in
    the treatment can vary across units and exhibit potentially complex
    dynamics, selection into treatment, or time effects)

  - A parallel trends assumption that holds only after conditioning on
    covariates

The main parameters are **group-time average treatment effects**: the
average treatment effect for a particular group (where a group is
defined by treatment timing) in a particular time period. These
parameters naturally generalize the average treatment effect on the
treated (ATT), which is identified in the textbook case with two periods
and two groups, to the case with multiple periods.
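
To make the definition concrete, the sketch below computes an unconditional ATT(g, t) by hand on a made-up four-unit panel. It assumes never-treated units serve as the comparison group and are coded as group 0, and that the base period is g - 1. The data and the helper name `att_gt_2x2` are illustrative assumptions; this is not the package's implementation, which also handles covariates and inference.

```python
import pandas as pd

# Hypothetical toy panel: units 1 and 2 are first treated in period 3 (group 3),
# units 3 and 4 are never treated (coded as group 0 here). Outcomes are made up.
toy = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "year":  [1, 2, 3] * 4,
    "group": [3] * 6 + [0] * 6,
    "y":     [1.0, 2.0, 5.0, 2.0, 3.0, 6.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0],
})

def att_gt_2x2(df, g, t):
    """Unconditional 2x2 DiD for group g at time t, with base period g - 1."""
    wide = df.pivot(index="id", columns="year", values="y")
    change = wide[t] - wide[g - 1]              # outcome change for each unit
    group = df.groupby("id")["group"].first()   # one group label per unit
    treated = change[group == g].mean()
    control = change[group == 0].mean()         # never-treated comparison units
    return treated - control

att_33 = att_gt_2x2(toy, g=3, t=3)
print(att_33)
```

Here the treated units' outcomes rise by 3 between periods 2 and 3 while the comparison units' rise by 1, so the difference of the two changes is the estimate for group 3 in period 3.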

Group-time average treatment effects are also natural building blocks
for more aggregated treatment effect parameters such as overall
treatment effects or event-study-type estimands.
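
As a rough illustration of that aggregation step, the snippet below averages hypothetical group-time estimates by event time e = t - g, weighting each group by its number of units. The estimates, the group sizes, and the function name `event_study` are invented for illustration, and the sketch omits standard errors entirely.

```python
from collections import defaultdict

# Hypothetical ATT(g, t) estimates keyed by (group, time); values are made up.
att = {
    (2004, 2004): -0.01, (2004, 2005): -0.03,
    (2006, 2006): -0.02, (2006, 2007): -0.04,
}
# Hypothetical number of units in each group, used as weights.
group_size = {2004: 20, 2006: 40}

def event_study(att, group_size):
    """Group-size-weighted average of ATT(g, t) at each event time e = t - g."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), value in att.items():
        e = t - g
        num[e] += group_size[g] * value
        den[e] += group_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}

dynamic = event_study(att, group_size)
print(dynamic)
```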

## Getting Started

There has been some recent work on DiD with multiple time periods. The
**csdid** package implements the framework put forward in

- [Callaway, Brantly and Pedro H.C. Sant’Anna.
  "Difference-in-Differences with Multiple Time Periods." Journal of
  Econometrics, Vol. 225, No. 2,
  pp. 200-230, 2021.](https://doi.org/10.1016/j.jeconom.2020.12.001)
  or [arXiv](https://arxiv.org/abs/1803.09015)

This project is based on the original [did R package](https://github.com/bcallaway11/did).

## Installation

You can install **csdid** from PyPI with:

```
pip install csdid
```

or from GitHub:

```
pip install git+https://github.com/d2cml-ai/csdid/
```

### Dependencies

**csdid** also relies on a companion library, `drdid`, which can be installed from GitHub:

```
pip install git+https://github.com/d2cml-ai/DRDID
```

## Basic Example


The following simplified example, taken from [Callaway and Sant’Anna
(2021)](https://authors.elsevier.com/a/1cFzc15Dji4pnC), estimates the
effect of state minimum wage increases on county-level teen employment
rates.

  - [More detailed examples are also
    available](https://bcallaway11.github.io/did/articles/did-basics.html)

A subset of the data is available in the package repository and can be loaded with:

```{python}
from csdid.att_gt import ATTgt
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/d2cml-ai/csdid/function-aggte/data/mpdta.csv")
```

The dataset contains 500 observations of county-level teen employment
rates from 2003-2007. Some states are first treated in 2004, some in
2006, and some in 2007 (see the paper for more details). The important
variables in the dataset are

  - **lemp** This is the log of county-level teen employment. It is the
    outcome variable

  - **first.treat** This is the period when a state first increases its
    minimum wage. It can be 2004, 2006, or 2007. It is the variable that
    defines *group* in this application

  - **year** This is the year and is the *time* variable

  - **countyreal** This is an id number for each county and provides the
    individual identifier in this panel data context
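
Before estimating anything, it can help to confirm that the data have the expected panel structure. The sketch below does this on a small made-up frame that reuses the column names listed above. The values, the coding of never-treated units as 0, and the helper `describe_panel` are illustrative assumptions; the same function can be pointed at `data`.

```python
import pandas as pd

# Made-up rows that mimic the column layout described above.
# Never-treated units are assumed to be coded as first.treat == 0.
toy = pd.DataFrame({
    "countyreal":  [1, 1, 2, 2, 3, 3],
    "year":        [2003, 2004, 2003, 2004, 2003, 2004],
    "first.treat": [2004, 2004, 0, 0, 0, 0],
    "lemp":        [5.1, 5.0, 4.2, 4.3, 6.0, 6.1],
})

def describe_panel(df, idname, tname, gname):
    """Return units per group and whether every unit appears in every period."""
    units_per_group = df.groupby(gname)[idname].nunique().to_dict()
    periods_per_unit = df.groupby(idname)[tname].nunique()
    balanced = bool((periods_per_unit == df[tname].nunique()).all())
    return units_per_group, balanced

units_per_group, balanced = describe_panel(toy, "countyreal", "year", "first.treat")
print(units_per_group, balanced)
```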

To estimate group-time average treatment effects, create an **ATTgt**
object and call its **fit()** method:

```{python}
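# Set up the estimator: outcome, group (first treatment period), unit id, and time
out = ATTgt(
    yname="lemp",
    gname="first.treat",
    idname="countyreal",
    tname="year",
    xformla="lemp~1",  # intercept only, i.e. no covariates
    data=data,
).fit(est_method="dr")  # "dr" selects the doubly robust estimator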
```


To display the summary table:

```{python, eval = False}
out.summ_attgt().summary2
```

When plotting, end the call with a semicolon `;` to keep the notebook from printing the object's class and graph information alongside the figure.

```{python}
out.plot_attgt();
```


```{python}
out.aggte(typec='calendar');
```


```{python}
out.plot_aggte();
```