* follow up on open issue on the same topic
* removed two files from this PR
Signed-off-by: Amit Sharma <[email protected]>
* added docstring to datasets
* updated docstring to avoid format error
Signed-off-by: Amit Sharma <[email protected]>
* updated black error
Signed-off-by: Amit Sharma <[email protected]>
Co-authored-by: Amit Sharma <[email protected]>
dowhy/datasets.py
@@ -92,6 +92,157 @@ def linear_dataset(
     stddev_outcome_noise=0.01,
     one_hot_encode=False,
 ):
+    """
+    Generate a synthetic dataset with a known effect size.
+
+    This function generates a pandas DataFrame with num_samples records. Variable names follow a convention in which the first letter indicates the variable's role in the causal graph and is followed by a sequence number.
+
+    :param beta: Coefficient of the treatment(s) ('v?') in the generating equation of the outcome ('y').
+    :type beta: int or list/ndarray of length num_treatments of type int
+    :param num_common_causes: Number of variables affecting both the treatment and the outcome [w -> v; w -> y]
+    :type num_common_causes: int
+    :param num_samples: Number of records to generate
+    :type num_samples: int
+    :param num_instruments: Number of instrumental variables [z -> v], defaults to 0
+    :type num_instruments: int
+    :param num_effect_modifiers: Number of effect modifiers, variables affecting only the outcome [x -> y], defaults to 0
+    :type num_effect_modifiers: int
+    :param num_treatments: Number of treatment variables [v], defaults to 1
+    :type num_treatments: int
+    :param num_frontdoor_variables: Number of frontdoor mediating variables [v -> FD -> y], defaults to 0
+    :type num_frontdoor_variables: int
+    :param treatment_is_binary: Cannot be True if treatment_is_category is True, defaults to True
+    :type treatment_is_binary: bool
+    :param treatment_is_category: Cannot be True if treatment_is_binary is True, defaults to False
+    :type treatment_is_category: bool
+    :param outcome_is_binary: defaults to False
+    :type outcome_is_binary: bool
+    :param stochastic_discretization: If False, quartiles are used when discretised variables are specified. They can be one-hot encoded, defaults to True
+    :type stochastic_discretization: bool
+    :param num_discrete_common_causes: Number of discrete common causes out of the total num_common_causes, defaults to 0
+    :type num_discrete_common_causes: int
+    :param num_discrete_instruments: Number of discrete instrumental variables out of the total num_instruments, defaults to 0
+    :type num_discrete_instruments: int
+    :param num_discrete_effect_modifiers: Number of discrete effect modifiers out of the total num_effect_modifiers, defaults to 0
+    :type num_discrete_effect_modifiers: int
+    :param stddev_treatment_noise: defaults to 1
+    :type stddev_treatment_noise: float
+    :param stddev_outcome_noise: defaults to 0.01
+    :type stddev_outcome_noise: float
+    :param one_hot_encode: defaults to False
+    :type one_hot_encode: bool
+
+    :returns: Dictionary with a pandas DataFrame and a few other metadata variables.
+        "df": pd.DataFrame
+            with num_samples records. Variable names follow a convention in which the first letter indicates the variable's role in the causal graph and is followed by a sequence number.
+
+            v variables - the treatments. They can be binary or continuous. In the continuous case, abs(*beta*) defines their magnitude;
+
+            y - the outcome variable. The generating equation is,
+            y = normal(0, stddev_outcome_noise) + t @ beta [where @ is NumPy matrix multiplication, which allows beta to be a vector]
+
+            W variables - common causes of both the treatment and the outcome; they are iid. If continuous, they are Norm(mu = Unif(-1,1), sigma = 1)
+
+            Z variables - instrumental variables. Each one affects all treatments, i.e. if there is one instrument and two treatments then z0->v0, z0->v1
+
+            X variables - effect modifiers. If continuous, they are Norm(mu = Unif(-1,1), sigma = 1)
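
As a reading aid for the generating equation documented above, here is a minimal standard-library sketch of `y = normal(0, stddev_outcome_noise) + t @ beta` together with the `W`/`v`/`y` naming convention. It is not DoWhy's implementation: `sketch_linear_outcome` is a hypothetical helper, the treatments are assumed to be fair coin flips drawn independently of `W`, and the `W -> v` and `W -> y` links are left out because this part of the docstring gives no coefficients for them. Drawing one mean per `W` variable from Unif(-1, 1) is one reading of the docstring, not a confirmed detail.

```python
import random


def sketch_linear_outcome(beta, num_samples, num_common_causes=0,
                          stddev_outcome_noise=0.01, seed=0):
    """Hypothetical helper, NOT DoWhy's implementation of linear_dataset."""
    rng = random.Random(seed)
    # Accept a scalar or a sequence, mirroring "int or list/ndarray" for beta.
    betas = list(beta) if isinstance(beta, (list, tuple)) else [beta]
    # Assumption: one mean per W variable, drawn from Unif(-1, 1); sigma is 1.
    w_means = [rng.uniform(-1, 1) for _ in range(num_common_causes)]
    rows = []
    for _ in range(num_samples):
        row = {f"W{j}": rng.gauss(mu, 1) for j, mu in enumerate(w_means)}
        # Assumption: binary treatments as fair coin flips, independent of W.
        t = [int(rng.random() < 0.5) for _ in betas]
        row.update({f"v{i}": t_i for i, t_i in enumerate(t)})
        # y = normal(0, stddev_outcome_noise) + t @ beta, as a plain dot product.
        row["y"] = rng.gauss(0, stddev_outcome_noise) + sum(
            t_i * b_i for t_i, b_i in zip(t, betas))
        rows.append(row)
    return rows


rows = sketch_linear_outcome(beta=[10, 2], num_samples=5, num_common_causes=2)
print(sorted(rows[0]))  # column names: W0, W1, v0, v1, y
```

With `stddev_outcome_noise=0.0` the outcome reduces to exactly `t @ beta`, which makes it easy to check that a vector `beta` assigns one coefficient per treatment column.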