Closes #21, #22 and #23 (copied below), #27.
Update from 2023
Stop updating the data, really.
- 'Freeze' as it is (except for ESS, perhaps)
- Archive the original frozen datasets/codebooks in `data-raw/`
- Update `srqm_data` to use `data-raw/`
- Slightly improve the `_readme` documents
  - Document freezes
  - Document codebook issues, e.g. QOG 2020: make sure GDP documentation has been corrected #27
  - Ideally, this would be in the Stata Guide…
- Add WEP? Country-level data: World Economics and Politics Dataverse #24
Detailed notes
- QOG: QOG 2023 is out (`qog2023`)
  - freeze: `qog2019`
  - updating would require rewriting code and looking at less clear results… see code at end of section
  - only advantage would be lower codebook size → just downsample the 2019 one, it only loses the intra-doc links
  - note the codebook issue! QOG 2020: make sure GDP documentation has been corrected #27
  - Perhaps simply drop the `eu_*` variables
- GSS: GSS has been updated too (`gss7221`)
  - freeze: `gss7616` (but see below)
  - not fun to keep only one year: keep one old year too (rather than older years)
  - restrict to 1976 and 2016 (rather than possibly breaking down the single dataset into yearly ones)
  - would solve the "max 2,048 vars" issue from Compatibility with different versions of Stata #28
  - raises question as to how to zip it all (currently uses `gss7616*` to match files)
- ESS: `ess2008`, in order to continue using the torture question?
  - freeze: `ess0816`, or `ess2008` and `ess2016` (different codebooks, so it's fine)
  - keep using Round 4 for both the torture example and the health services ones (results are not as clear-cut with Round 8)
  - keep Round 8 to cover e.g. climate change
  - problem: DTA file is too large -- divide, to avoid `_merge` problem
  - document existence of `ess2016` despite not in use anywhere in the course do-files
- WVS: `wvs9904` -- keep old version for sharia law question
  - update to last version, check encoding
  - possibly also include a more recent wave? (raises same question as `ess2016`)
- NHIS: update to `nhis202*` (recent year), `nhis1020`?
  - check if sampling frame and variables have changed first
  - see below on how URL structure for fetching has changed
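The "restrict to 1976 and 2016" option for GSS can be sketched as follows. This is a minimal illustration on toy data, not the real GSS file: the variables `v1` and `v2` and all values are made up, and the 2,048 figure is the Stata/IC variable limit mentioned in the notes. After subsetting, variables that are entirely missing in the kept years are dropped and the variable count is checked.

```stata
* Toy stand-in for a cumulative GSS file (hypothetical variables and values)
clear
set obs 6
generate year = 1976 + 20 * mod(_n, 3)   // cycles through 1996, 2016, 1976
generate v1 = _n                          // asked in every year
generate v2 = _n if year == 1996          // asked in 1996 only

* Keep the two survey years of interest
keep if inlist(year, 1976, 2016)

* Drop variables left entirely missing by the subset
foreach v of varlist _all {
    quietly count if !missing(`v')
    if r(N) == 0 {
        drop `v'
    }
}

* c(k) holds the number of variables currently in memory
display "variables: " c(k)
assert c(k) <= 2048
```

Whether dropping all-missing variables is enough to bring the real file under the limit would need checking against the actual data.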
Note on QOG -- offers only this as a replacement in 2023, which is not ideal:
```stata
// school life expectancy
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
    name(g1, replace)

// linear fit + SSA data points only, underpredicted
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
    lfit wdi_fertility wef_lse, ///
    name(g2, replace)

// all regions
forv i = 1/10 {
    sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
        lfit wdi_fertility wef_lse, ///
        name("region`i'", replace)
}
```

The plan for 2021:
- Redraw table of use in do-files, to check they are all used a fair number of times.
- Students actually need this to see the data in use.
- Update QOG to January 2021 (2017 ± 3 years). This will also fix a codebook issue (QOG 2020: make sure GDP documentation has been corrected #27).
- Update GSS to 2018.
- http://gss.norc.org/get-the-data/stata
- Nice example use: https://kieranhealy.org/blog/archives/2019/03/22/a-quick-and-tidy-look-at-the-2018-gss/
- Have only one year? Also include e.g. 2008?
- Rewrite `week12.do`.
- Update ESS
- https://www.europeansocialsurvey.org/data/round-index.html
- Round 9 (2018) is out.
- Check results on `week6.do` (which uses Round 4 only right now, despite `trrtort` also existing for Round 8).
- Have only Round 4? (interview dates 2008–2010)
  - Call it `ess0810` (note: in previous course versions, `ess0810` contained Rounds 4 (2008) and 5 (2010))
- Update NHIS to 2010 + 2019 (?)
- https://www.cdc.gov/nchs/nhis/2019nhis.htm
- Year 2019 is out, BUT filenames differ:
  - Names in 2019: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2019/
  - Names in 2018 (and before up to 2010): ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2018/
- Update WVS Round 4 to 2020 version
- Check results and encoding issue in variable label (update: still there in Stata 13, not Stata 14+).
Additional things to consider:
Dataset names
I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:
- `ess1214` (not used) and `ess0816`
- `wvs9904` (unavoidable)
- `nhis1017` (unavoidable, unless we use a single year, but that removes any demo of `keep if year`)
- `gss7616` (unavoidable, unless we separate the years)
Merged datasets
Is it still a good idea to do that for e.g. ESS? Probably not, esp. if we need to limit datasets at 2,048 variables for Stata/IC.
- Keep NHIS with multiple years. Use it to demo `keep if year`.
- Keep WVS with multiple years (country-dependent).
- Break down GSS.
- Break down ESS.
Both WVS and ESS are used to demo `keep if inlist(country, …)`, the other subset we want to show.
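For reference, the two kinds of subset can be sketched on toy data. The values below are made up, and treating `country` as a two-letter string is an assumption; the course datasets may code it differently.

```stata
* Toy data (hypothetical values): two years, three countries
clear
input year str2 country
2010 "FR"
2010 "DE"
2019 "FR"
2019 "US"
end

* Subset by year, as with NHIS
preserve
keep if year == 2019
list
restore

* Subset by country, as with WVS or ESS
keep if inlist(country, "FR", "DE")
list
```

Using `preserve` and `restore` keeps the full toy dataset available for the second subset.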
Additional datasets
It would make a lot of sense to have more datasets for the students to use than those used in the do-files.
Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.
- GSS has a single codebook, so bundling many years would duplicate the codebook in the ZIP archives. Not ideal.
- ESS could be broken down to Rounds 4 (2008), 8 (2016) and 9 (2018).
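Breaking a merged ESS file down by round could look like the sketch below. It runs on toy data with made-up values. `essround` and `cntry` are assumed here to be the round and country identifiers, and temporary files stand in for the named `.dta` files a real split would save.

```stata
* Toy stand-in for a merged ESS file (hypothetical values)
clear
input essround str2 cntry
4 "FR"
4 "DE"
8 "FR"
9 "FR"
9 "DE"
end

* One temporary file per round (real use would save named .dta files instead)
foreach r of numlist 4 8 9 {
    tempfile round`r'
    preserve
    keep if essround == `r'
    save `round`r''
    restore
}

* Each round can then be loaded on its own
use `round9', clear
assert essround == 9
display _N
```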