
[FEATURE] Create a blog post to explain the structure of an IPUMS extract #26

Open
@00krishna

Description

Issue Description

Write a blog post that explains the structure of basic IPUMS data extracts. IPUMS extracts come in different forms. The most basic form consists of a DDI (.xml) file and a compressed archive containing a DAT file. The DDI file contains metadata about the variables in the extract, such as variable names, data types, and value ranges. The DAT file contains only fixed-width numeric data, never text.
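To make the relationship between the two files concrete for the post author, here is a minimal sketch in Python (chosen only for illustration; it is not IPUMS.jl code). It assumes a simplified, hypothetical metadata fragment in which each variable lists its start and end column; a real DDI file is larger and namespaced, and the variable names and the sample record below are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a DDI file (assumption: real files
# carry XML namespaces and many more fields per variable).
DDI_SNIPPET = """
<codeBook>
  <dataDscr>
    <var name="YEAR"><location StartPos="1" EndPos="4"/></var>
    <var name="AGE"><location StartPos="5" EndPos="7"/></var>
    <var name="SEX"><location StartPos="8" EndPos="8"/></var>
  </dataDscr>
</codeBook>
"""


def read_layout(ddi_text):
    """Return (name, start, end) column positions for every variable."""
    root = ET.fromstring(ddi_text)
    layout = []
    for var in root.iter("var"):
        loc = var.find("location")
        layout.append(
            (var.get("name"), int(loc.get("StartPos")), int(loc.get("EndPos")))
        )
    return layout


def parse_record(line, layout):
    """Slice one fixed-width line into named integer fields.

    Positions in the metadata are 1-based and inclusive, so the Python
    slice starts at start - 1 and stops at end.
    """
    return {name: int(line[start - 1:end]) for name, start, end in layout}


layout = read_layout(DDI_SNIPPET)
record = parse_record("20200341", layout)  # made-up DAT line
print(record)
```

The DAT line alone is just the digits `20200341`; only the column positions from the metadata turn it into named values.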

The second type of extract is the NHGIS extract. It contains a shapefile (.shp) holding both the GIS map of the selected geometries (city, state, county, etc.) and data variables, and a CSV file holding just the variable information for each geographic unit.
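A minimal sketch of how the two NHGIS components relate, again in Python for illustration only. It assumes the shapefile attributes and the CSV rows share an identifier column (named `GISJOIN` here); the geometry strings, column names, and values are hypothetical placeholders, and a real workflow would read the shapefile with a GIS library rather than a hand-written list.

```python
import csv
import io

# Hypothetical CSV content: one row of variable values per geographic unit.
CSV_TEXT = (
    "GISJOIN,NAME,TOTAL_POP\n"
    "G001,County A,100\n"
    "G002,County B,250\n"
)

# Hypothetical stand-in for the shapefile's records (assumption: each
# geometry carries the same identifier used in the CSV).
shapes = [
    {"GISJOIN": "G001", "geometry": "POLYGON A"},
    {"GISJOIN": "G002", "geometry": "POLYGON B"},
]

# Index the CSV rows by identifier, then attach them to the geometries.
table = {row["GISJOIN"]: row for row in csv.DictReader(io.StringIO(CSV_TEXT))}
merged = [
    {**shape, **table[shape["GISJOIN"]]}
    for shape in shapes
    if shape["GISJOIN"] in table
]

for unit in merged:
    print(unit["NAME"], unit["geometry"], unit["TOTAL_POP"])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would still need converting before analysis.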

The post should explain the format of the extracts and the information contained in each component. The intent is that in subsequent blog posts, the author can explain the code for extracting information from these files without having to explain the structure of the extract at the same time.

Difficulty: Beginner

Time: 6 - 8 hours

Requirements

  • Explain the different types of data downloads: DDI + DAT and NHGIS format
  • Explain the contents of the DDI file and what is meant by metadata, such as data types, variable names, and special characters used in the data.
  • Explain that the DAT file is in a fixed-width numeric format and must be parsed using the metadata. This is why packages such as IPUMS.jl are necessary.
  • Explain the meaning of special characters such as the missing data or Not-In-The-Universe characters that are part of the metadata.
  • Explain the components of an NHGIS extract including the separate shapefile and CSV file.
  • Explain that these data extracts can be downloaded from the IPUMS website or through IPUMS.jl.
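As an aid for the requirement on special characters, the sketch below shows one way the post could illustrate the idea, in Python for illustration only. The variable name `INCOME` and the two reserved codes are hypothetical; the real codes for any variable come from the extract's metadata.

```python
# Hypothetical reserved codes for one variable (assumption: the actual
# codes and their meanings are listed in the extract's DDI metadata).
SPECIAL_CODES = {
    "INCOME": {
        999999: "not in universe",
        999998: "missing",
    }
}


def decode(name, value):
    """Return (value, None) for ordinary data, or (None, label) for a reserved code."""
    label = SPECIAL_CODES.get(name, {}).get(value)
    if label is not None:
        return None, label
    return value, None


print(decode("INCOME", 42000))
print(decode("INCOME", 999999))
```

Without this step, a reserved code such as 999999 would be treated as a real value and distort any summary statistic.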

Expected Outcomes

The anticipated outcome is a blog post, written in Markdown, that covers the elements listed above. The post is informational rather than technical, so there is no need to show much code. Using code and the IPUMS.jl package will be covered in a subsequent blog post.

Additional Notes

Additional information about the structure of IPUMS extracts is available on the IPUMS website. Some good sources of information include.

Other Resources

Julia Slack:

  • documentation channel - you should post here first
  • helpdesk channel - post here to get more attention on your issue, though the answers may be less precise than you need.
  • health-and-medicine channel - this is where most of JuliaHealth is located these days.

Julia Discourse - post here if your issue is long or takes a lot of time to explain, since it might get lost in Julia Slack. Consider cross-posting your forum post to the helpdesk and/or documentation channels in Julia Slack.

Metadata

Labels

  • documentation - Improvements or additions to documentation
  • enhancement - New feature or request
  • good first issue - Good for newcomers
  • help wanted - Extra attention is needed
