Dataset updates

Closes #21, #22 and #23 (copied below), #27.

## Update from 2023

Stop updating the data, really.

- [ ] 'Freeze' as it is ~~(except for ESS, perhaps)~~
- [ ] Archive the original frozen datasets/codebooks in `data-raw/`
- [ ] Update `srqm_data` to use `data-raw/`
- [ ] Slightly improve the `_readme` documents
  - [ ] Document freezes
  - [ ] Document codebook issues, e.g. #27 
  - [ ] Ideally, this would be in the Stata Guide…
- [ ] Add WEP? #24 

Detailed notes

- __QOG__: ~~`qog2023`~~ -- since QOG 2023 is out
  - freeze: `qog2019`
  - updating to 2023 would require rewriting code and accepting less clear-cut results… see code at end of section
  - the only advantage of updating would be a smaller codebook → just downsample the 2019 one instead, which only loses the intra-doc links
  - note the codebook issue! #27 
  - Perhaps simply drop the `eu_*` variables
- __GSS__: ~~`gss7221`~~ -- since GSS has [updated too](https://gss.norc.org/get-the-data/stata)
  - freeze: `gss7616` (but see below)
  - not fun to keep only one year: keep ~~older years~~ one old year too
  - ~~possibly break down single data into yearly ones?~~ restrict to 1976 and 2016
    - would solve "max 2,048 vars" issue from #28 
    - ~~raises question as to how to zip it all (currently uses `gss7616*` to match files)~~
- __ESS__: ~~`ess2008`~~ -- in order to continue using the torture question?
  - freeze: `ess0816`, or `ess2008` and `ess2016` (different codebooks, so it's fine)
  - keep using Round 4 for both the torture and the health services examples (results are not as clear-cut with Round 8)
  - keep Round 8 to cover e.g. climate change
  - problem: the DTA file is too large -- split it, which also avoids the `_merge` problem
  - document the existence of `ess2016`, even though it is not used anywhere in the course do-files
- __WVS__: `wvs9904` -- keep old version for sharia law question
  - update to the latest version, check encoding
  - possibly also include a more recent wave? (raises same question as `ess2016`)
- __NHIS__: update to ~~`nhis202*` recent year~~ `nhis1020`?
  - check if sampling frame and variables have changed first
  - see below on how URL structure for fetching has changed
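
The GSS restriction discussed above could be sketched as follows. This is a minimal sketch, not the course's actual build script: it assumes the frozen file is named `gss7616.dta`, that it has a numeric `year` variable, and that it is run in a Stata flavour that can open the full file (Stata/IC is limited to 2,048 variables). Dropping variables that are entirely missing in the two retained years is an added assumption about how the variable count would come down, since restricting the rows alone does not remove any variable.

```stata
// Sketch only: the file name and the `year` variable are assumptions
use gss7616, clear
keep if inlist(year, 1976, 2016)

// drop variables with no valid observation in either retained year
foreach v of varlist _all {
	capture assert missing(`v')
	if _rc == 0 {
		drop `v'
	}
}

// check against the Stata/IC limit of 2,048 variables
display "variables left: " c(k)
assert c(k) <= 2048
```

If the final `assert` fails, the two-year file would still be too wide for Stata/IC and would need a further, hand-picked selection of variables.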

Note on QOG: the 2023 release offers only the following as a replacement, which is not ideal:

```stata
// fertility rate against school life expectancy, all countries, with linear fit
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
	name(g1, replace)
// SSA data points only (ht_region == 4), against the all-country linear fit,
// which underpredicts them
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name(g2, replace)
// one graph per region, each against the all-country linear fit
forv i = 1/10 {
	sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name("region`i'", replace)
}
```

## The plan for 2021

- [ ] Redraw the table of dataset use in the do-files, to check that every dataset is used a fair number of times.
  - Students actually need this to see the data in use.
- [ ] Update __QOG__ to January 2021 (2017 ± 3 years). This will also fix a codebook issue (#27).
  - https://www.gu.se/en/quality-government/qog-data/data-downloads/standard-dataset
- [ ] Update __GSS__ to 2018.
  - http://gss.norc.org/get-the-data/stata
  - Nice example use: https://kieranhealy.org/blog/archives/2019/03/22/a-quick-and-tidy-look-at-the-2018-gss/
  - Have only one year? Also include e.g. 2008?
  - [ ] Rewrite `week12.do`.
- [ ] Update __ESS__
  - https://www.europeansocialsurvey.org/data/round-index.html
  - [ ] Round 9 (2018) is out.
  - [ ] Check results on `week6.do` (which uses Round 4 only right now, despite `trrtort` also existing for Round 8).
  - [ ] Have only Round 4? (interview dates 2008–2010)
    - [ ] Call it `ess0810` — note: in previous course versions, `ess0810` contained Rounds 4 (2008) and 5 (2010)
- [ ] Update __NHIS__ to 2010 + [2019](https://www.cdc.gov/nchs/nhis/2019nhis.htm) (?)
  - https://www.cdc.gov/nchs/nhis/2019nhis.htm
  - Year 2019 is out, __BUT__ filenames differ:
    - Names in 2019: `ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2019/`
    - Names in 2018 (and in earlier years, back to 2010): `ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2018/`
- [ ] Update __WVS__ Round 4 to 2020 version
  - [ ] Check results and encoding issue in variable label (update: still there in Stata 13, not Stata 14+).

Additional things to consider:

## Dataset names

I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:

- `ess1214` (not used) and `ess0816`
- `wvs9904` (unavoidable)
- `nhis1017` (unavoidable, unless we use a single year, but that removes any demo of `keep if year`)
- `gss7616` (unavoidable, unless we separate the years)

## Merged datasets

Is merging several years or rounds into one file still a good idea for e.g. ESS? Probably not, especially if we need to limit datasets to 2,048 variables for Stata/IC.

- [ ] Keep NHIS with multiple years. Use it to demo `keep if year`.
- [ ] Keep WVS with multiple years (country-dependent).
- [ ] Break down GSS.
- [ ] Break down ESS.

Both WVS and ESS are used to demo `keep if inlist(country, …)`, the other subset we want to show.
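
For reference, the two subsetting idioms mentioned above look roughly like this. The file names are the ones listed earlier in this issue; the variable names (`year`, `cntry`) and the values are illustrative assumptions, since the course files may name or code them differently.

```stata
// Sketch only: variable names and values below are assumptions
use nhis1017, clear
keep if year == 2017                // subset by survey year

use ess0816, clear
keep if inlist(cntry, "FR", "DE")   // subset by country (string codes assumed)
```

A single-year NHIS file would make the first idiom pointless, which is the reason given above for keeping NHIS as a multi-year dataset.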

## Additional datasets

It would make a lot of sense to have more datasets for the students to use than those used in the do-files.

Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.

- GSS has a single codebook, so bundling many years would duplicate the codebook in the ZIP archives. Not ideal.
- ESS could be broken down to Rounds 4 (2008), 8 (2016) and 9 (2018).
