[FEATURE] Create a blog post to explain the structure of an IPUMS extract #26
Description
Issue Description
Write a blog post that explains the structure of basic IPUMS data extracts. IPUMS extracts come in different forms. The most basic form consists of a DDI (.xml) file and a compressed DAT data file. The DDI file contains metadata about the variables in the extract--such as variable names, data types, and data ranges. The DAT file contains only numbers in a fixed-width format--never text.
The second type of extract is the NHGIS extract, which consists of a shapefile (.shp) holding both the GIS map of the selected geometries (city, state, county, etc.) and data variables, and a CSV file holding just the variable information per geometric unit.
The post should explain the format of the extracts and the information contained in each component. The intent is that in subsequent blog posts, the author can explain the code for extracting information from these files without having to explain the structure of the extract at the same time.
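To make the DDI + DAT relationship described above concrete, here is a minimal Python sketch of how column metadata drives the parsing of one fixed-width record. The variable names, positions, and widths in `LAYOUT` are invented for illustration and do not come from a real extract; in practice they would be read from the DDI file, and this is not how IPUMS.jl itself is implemented.

```python
# Hypothetical column layout of the kind a DDI file describes.
# Names, start offsets, and widths are invented for illustration.
LAYOUT = [
    {"name": "YEAR", "start": 0, "width": 4},
    {"name": "AGE", "start": 4, "width": 3},
    {"name": "STATEFIP", "start": 7, "width": 2},
]


def parse_record(line, layout):
    """Slice one fixed-width line into named integer fields."""
    record = {}
    for col in layout:
        # Each variable occupies a fixed span of characters in the line.
        raw = line[col["start"]: col["start"] + col["width"]]
        record[col["name"]] = int(raw)
    return record


# One made-up record: the line is only digits, with no delimiters.
sample_line = "201904227"
parsed = parse_record(sample_line, LAYOUT)
print(parsed)
```

Without the layout, `sample_line` is just a run of digits; the metadata is what says where one variable ends and the next begins.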
Difficulty: Beginner
Time: 6 - 8 hours
Requirements
- Explain the different types of data downloads: DDI + DAT and NHGIS format
- Explain the contents of the DDI file and what is meant by metadata: such as data types, names, and special characters used in the data.
- Explain that the DAT file is a fixed-width numeric format and needs to be parsed with the metadata. That is the reason why packages such as IPUMS.jl are necessary.
- Explain the meaning of special characters such as the missing data or Not-In-The-Universe characters that are part of the metadata.
- Explain the components of an NHGIS extract including the separate shapefile and CSV file.
- Explain that these data extracts can be downloaded from the IPUMS website or through the IPUMS.jl package.
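
The requirement about special codes can be illustrated with a short Python sketch. The codes and labels in `SPECIAL_CODES` are hypothetical placeholders; the real codes differ per variable and are listed in the extract's metadata.

```python
# Hypothetical special codes, invented for illustration only.
# Real codes vary by variable and are defined in the extract's metadata.
SPECIAL_CODES = {
    "AGE": {"999": "missing"},
    "INCOME": {"999999": "not in universe"},
}


def decode(name, raw):
    """Return (value, flag): value is None when raw is a special code."""
    label = SPECIAL_CODES.get(name, {}).get(raw)
    if label is not None:
        return None, label
    return int(raw), None


print(decode("AGE", "042"))
print(decode("AGE", "999"))
```

The point for readers is that a value such as `999` is not an age; treating it as a number without consulting the metadata would silently distort any summary statistic.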
Expected Outcomes
The anticipated outcome is a blog post, written in Markdown, that contains the elements listed above. This blog post is informational rather than technical, so there is no need to show much code. Using code and the IPUMS.jl package will come in a subsequent blog post.
Additional Notes
Additional information about the structure of IPUMS extracts is available on the IPUMS website. Some good sources of information include:
- https://tech.popdata.org/ipumsr/articles/ipums-read.html
- https://usa.ipums.org/usa/extract_instructions.shtml
- https://cran.r-project.org/web/packages/ipumsr/vignettes/ipums-read.html
Other Resources
- documentation channel - you should post here first.
- helpdesk channel - this would be to get more attention to your issue but maybe not as precise as you need.
- health-and-medicine channel - this is where most of JuliaHealth is located these days.
- Julia Discourse - I would advise posting here if you have an issue that you feel is long or requires a lot of time to explain, as you might lose it within Julia Slack. Consider cross-posting your forum post to the Julia Slack in helpdesk and/or documentation.