Cleaning Dirty Location Data

l0qii edited this page Feb 11, 2018 · 7 revisions

The approach:

1. Identify which fields in the spreadsheet contain location info

We identified the following 7 fields as containing location info: Oceans,

2. Extract these fields into individual text files for easier analysis

3. Generate lists of unique values for each field

4. Compare unique lists to dictionaries created from available resources

5. Attempt simple data correction

6. Document statistics on how dirty the data is and how much of it can be easily fixed

7. Next steps
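
As a rough illustration of steps 2 through 6, here is a minimal Python sketch. It is not the tooling used for this project. The sample rows, the reference dictionary, the helper names (`unique_values`, `correct`, `summarize`) and the 0.8 similarity cutoff are all assumptions made for the example; only the `Oceans` field name comes from the list above. "Simple data correction" is interpreted here as case and whitespace normalization followed by fuzzy matching with the standard library's `difflib`, which is one possible choice among several.

```python
import csv
import difflib
import io
from collections import Counter

# Hypothetical sample data; the real spreadsheet is not reproduced here.
SAMPLE_CSV = "Oceans\nAtlantic Ocean\natlantic ocean\nPacific Ocaen\nIndian Ocean\nxyz\n"

# Hypothetical reference dictionary built from some trusted resource.
DICTIONARY = {"Atlantic Ocean", "Pacific Ocean", "Indian Ocean"}


def unique_values(rows, field):
    """Count each distinct non-empty value in one column (steps 2-3)."""
    values = ((row.get(field) or "").strip() for row in rows)
    return Counter(value for value in values if value)


def normalize(value):
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(value.split()).casefold()


def correct(value, dictionary, cutoff=0.8):
    """Return the matching dictionary entry, or None if no confident match (steps 4-5)."""
    lookup = {normalize(entry): entry for entry in dictionary}
    key = normalize(value)
    if key in lookup:
        return lookup[key]
    # Fall back to fuzzy matching; the cutoff is an arbitrary starting point.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


def summarize(counts, dictionary):
    """Tally clean, fixable, and unresolved unique values (step 6)."""
    stats = {"clean": 0, "fixable": 0, "unresolved": 0}
    for value in counts:
        if value in dictionary:
            stats["clean"] += 1
        elif correct(value, dictionary) is not None:
            stats["fixable"] += 1
        else:
            stats["unresolved"] += 1
    return stats


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    counts = unique_values(rows, "Oceans")
    for value in counts:
        print(repr(value), "->", correct(value, DICTIONARY))
    print(summarize(counts, DICTIONARY))
```

Values that land in the "unresolved" bucket are the ones that would need manual review or a richer dictionary. Fuzzy matches should be spot-checked before being written back, since a low cutoff can map distinct place names onto each other.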
