Accept variables and categorical variables from the DataFrame index #211
Open
Description
In panel regression, fixed effects are very often implemented as categorical variables. Currently, short of a hacky workaround, patsy cannot read index levels as variables. See the example panel dataset below:
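```python
import statsmodels.api as sm
# Panel dataset from the R package 'stevedata', indexed by (isocode, year)
df_raw = sm.datasets.get_rdataset('pwt_sample', 'stevedata').data.set_index(['isocode', 'year']).drop(['country'], axis=1)
df = df_raw.dropna()  # drop rows with missing values
print(df)
```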
The panel DataFrame looks like this:
```
                     pop        hc        rgdpna         rgdpo         rgdpe     labsh          avh         emp          rnna
isocode year
AUS     1950    8.354106  2.667302  1.274612e+05  1.141350e+05  1.219940e+05  0.680492  2170.923406    3.429873  6.399912e+05
        1951    8.599923  2.674344  1.307031e+05  1.105431e+05  1.139294e+05  0.680492  2150.846928    3.523916  6.901136e+05
        1952    8.782430  2.681403  1.253531e+05  1.088834e+05  1.112199e+05  0.680492  2130.956115    3.591675  7.045624e+05
        1953    8.950892  2.688482  1.389522e+05  1.226885e+05  1.233289e+05  0.680492  2111.249251    3.653409  7.331073e+05
        1954    9.159148  2.695580  1.500607e+05  1.318364e+05  1.314721e+05  0.680492  2091.724634    3.731083  7.714542e+05
...                  ...       ...           ...           ...           ...       ...          ...         ...           ...
USA     2015  320.878310  3.728116  1.877616e+07  1.878487e+07  1.890040e+07  0.595646  1770.023174  150.248474  6.505781e+07
        2016  323.015995  3.733411  1.909750e+07  1.909468e+07  1.928048e+07  0.593773  1766.744125  152.396957  6.597406e+07
        2017  325.084756  3.738714  1.954298e+07  1.954298e+07  1.975004e+07  0.596151  1763.726676  154.672318  6.694270e+07
        2018  327.096265  3.744024  2.012858e+07  2.015604e+07  2.036575e+07  0.594326  1774.703811  156.675903  6.800735e+07
        2019  329.064917  3.749341  2.056359e+07  2.059635e+07  2.085650e+07  0.597091  1765.346390  158.299591  6.905906e+07
```
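Since the request is about fixed effects, it may help to see what a country fixed effect built directly from the index level looks like. The sketch below uses a small hypothetical panel (placeholder values, not the pwt_sample data) and `pd.get_dummies` with `drop_first=True`, which should mirror patsy's default treatment coding of `C(isocode)` in a model with an intercept; patsy would name the columns differently (e.g. `C(isocode)[T.GBR]`).

```python
import pandas as pd

# Hypothetical toy panel with the panel dimensions in a MultiIndex; values are placeholders.
index = pd.MultiIndex.from_product(
    [["AUS", "GBR", "USA"], [2018, 2019]], names=["isocode", "year"]
)
panel = pd.DataFrame({"pop": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Country fixed effects taken straight from the index level: one 0/1 column per
# country, dropping the first (alphabetical) level as the reference category.
fe = pd.get_dummies(
    panel.index.get_level_values("isocode"),
    prefix="isocode",
    drop_first=True,
    dtype=float,
)
fe.index = panel.index  # keep the panel MultiIndex on the dummy columns
print(list(fe.columns))  # ['isocode_GBR', 'isocode_USA']
```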
We often need patsy to run a regression with `from_formula`, which calls `patsy.dmatrices` under the hood:
```python
sm.OLS.from_formula('pop ~ rgdpna + year + C(isocode)', df_raw).fit().summary()
```
This raises an error:
```
PatsyError: Error evaluating factor: NameError: name 'isocode' is not defined
    pop ~ rgdpna + year + C(isocode)
                          ^^^^^^^^^^
```
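The error comes from how names are resolved: patsy looks each name in the formula up in `data` first and then in the calling environment, and for a DataFrame that lookup is `data[name]`, which only sees columns. A minimal sketch on a hypothetical toy panel (placeholder values) shows the failing lookup and the usual `reset_index()` workaround, which turns the index levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical toy panel with (isocode, year) in the index; values are placeholders.
index = pd.MultiIndex.from_product(
    [["AUS", "USA"], [2018, 2019]], names=["isocode", "year"]
)
panel = pd.DataFrame({"pop": [1.0, 2.0, 3.0, 4.0]}, index=index)

# DataFrame.__getitem__ looks up columns only, so an index level name is a KeyError.
try:
    panel["isocode"]
    found = True
except KeyError:
    found = False
print(found)  # False

# Workaround: move the index levels into ordinary columns before using a formula.
flat = panel.reset_index()
print(list(flat.columns))  # ['isocode', 'year', 'pop']
```

Applied to the example above, passing `df_raw.reset_index()` instead of `df_raw` should let patsy find both `isocode` and `year`, at the cost of replacing the panel MultiIndex with a default RangeIndex.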
Very often the panel dimensions live in the index levels, and users would like to use them as fixed effects and in endog. Is there any chance patsy could support using DataFrame index levels as variables? Thanks.
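Until something like this exists, one user-side option is to exploit the fact that patsy's `data` argument accepts a dict-like object. The helper below, `index_aware_data`, is hypothetical (not part of patsy or statsmodels): it exposes both the columns and the named index levels as Series aligned on the original index, and refuses names that are both a column and an index level.

```python
import pandas as pd


def index_aware_data(df: pd.DataFrame) -> dict:
    """Hypothetical helper: expose both columns and named index levels of ``df``.

    Every value is a Series aligned on ``df.index``, so a formula library that
    accepts a dict-like ``data`` argument can look up index levels like columns.
    """
    data = {name: df[name] for name in df.columns}
    for level_name in df.index.names:
        if level_name is None:
            continue  # unnamed levels cannot be referenced in a formula
        if level_name in data:
            raise ValueError(f"{level_name!r} is both a column and an index level")
        values = df.index.get_level_values(level_name)
        data[level_name] = pd.Series(values, index=df.index, name=level_name)
    return data


# Usage on a hypothetical toy panel (placeholder values).
index = pd.MultiIndex.from_product(
    [["AUS", "USA"], [2018, 2019]], names=["isocode", "year"]
)
panel = pd.DataFrame(
    {"pop": [1.0, 2.0, 3.0, 4.0], "rgdpna": [10.0, 20.0, 30.0, 40.0]}, index=index
)
data = index_aware_data(panel)
print(sorted(data))  # ['isocode', 'pop', 'rgdpna', 'year']
print(data["isocode"].tolist())  # ['AUS', 'AUS', 'USA', 'USA']
```

With patsy installed, a call such as `patsy.dmatrices('pop ~ rgdpna + year + C(isocode)', index_aware_data(df_raw), return_type='dataframe')` should then resolve `isocode` and `year`, and since every value is a pandas Series the design matrices should keep the panel MultiIndex. This is a workaround sketch, not an existing patsy feature.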