The goal is to create an accurate, atomic unique identifier for every physical structure in the world. Unique ID aims to be the single source of truth on property identity.
Phase 1
During Phase 1 of the project we will identify relevant address data sources and work to integrate them into a single source of truth data store, which represents validated, real-world addresses across the United States. First we will build the data infrastructure to handle addresses across the U.S. After a proof-of-concept on a representative sample of addresses is successful, we will build out the system for the entire U.S. (and eventually, the world).
Phase 2
In Phase 2, we will use the single source of truth database created during Phase 1 to pair address data with satellite and other non-traditional sources.
OpenAddresses is a good starting data source. It has 478m addresses globally, crowd-sourced and freely available for download.
- Load data files into "raw" data sources db
- Write ETL script to load into "raw" data sources DB, scrub/clean data as needed, and load into single source of truth
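
The two steps above can be sketched as a small script. This is a minimal illustration, not the project's actual pipeline: it assumes an OpenAddresses-style CSV header (`LON,LAT,NUMBER,STREET,...`), uses an in-memory SQLite database as a stand-in for both the "raw" store and the single source of truth, and the table names, the `clean` helper, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample in an assumed OpenAddresses-style CSV layout.
SAMPLE_CSV = """LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
-73.9857,40.7484,350,5th  Ave,,New York,,NY,10118,,a1
-73.9857,40.7484,350,5TH AVE,,new york,,ny,10118,,a2
-122.4194,37.7749,,Market St,,San Francisco,,CA,94103,,a3
"""


def load_raw(conn, csv_file, source):
    """Step 1: copy rows into the raw table unchanged, tagged with their source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_addresses ("
        "source TEXT, lon TEXT, lat TEXT, number TEXT, street TEXT, "
        "city TEXT, region TEXT, postcode TEXT)"
    )
    rows = [
        (source, r["LON"], r["LAT"], r["NUMBER"], r["STREET"],
         r["CITY"], r["REGION"], r["POSTCODE"])
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO raw_addresses VALUES (?,?,?,?,?,?,?,?)", rows)
    return len(rows)


def clean(value):
    """Collapse repeated whitespace and upper-case so near-duplicates compare equal."""
    return " ".join(value.split()).upper()


def load_truth(conn):
    """Step 2: scrub raw rows and load them into the deduplicated truth table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS addresses ("
        "number TEXT, street TEXT, city TEXT, region TEXT, postcode TEXT, "
        "lon REAL, lat REAL, "
        "UNIQUE(number, street, city, region, postcode))"
    )
    raw = conn.execute(
        "SELECT lon, lat, number, street, city, region, postcode FROM raw_addresses"
    ).fetchall()
    for lon, lat, number, street, city, region, postcode in raw:
        if not number.strip() or not street.strip():
            continue  # skip rows that cannot identify a single structure
        conn.execute(
            "INSERT OR IGNORE INTO addresses VALUES (?,?,?,?,?,?,?)",
            (clean(number), clean(street), clean(city), clean(region),
             clean(postcode), float(lon), float(lat)),
        )
    return conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("raw rows:", load_raw(conn, io.StringIO(SAMPLE_CSV), "openaddresses"))
    print("truth rows:", load_truth(conn))
```

Keeping the raw table untouched means the scrubbing rules can be changed and re-run later without downloading the source files again.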
Primary usage will be to store addresses in the single source of truth db. Geocoding services seem cheaper than Google's, so they might be a good commercial option to fill in the gaps.
- Planet OSM
- Planet OSM - Downloading
- BBBike
- geofabrik.de
- Mapzen - commercial service
- OpenStreetMap direct Database and API access
Commercial Options
- Download data from openstreetmap.org and review for features, completeness, and accuracy.
- Write ETL script to load into "raw" data sources DB, scrub/clean data as needed, and load into single source of truth
- Store Places API data as a data source
- Geocoding & Reverse Geocoding API to fill in gaps in other datasets
- Model out pricing/cost for use case
- If cost effective compared to free/open data sources, negotiate Enterprise License so we can cache more than 30 days of data
- Store listing address data for single source of truth db.
- Store review and other unstructured data as potential indicator of misclassification of buildings etc.
- Review documentation/Yelp TOS to understand how much data we can legally cache
- Model out pricing/cost for use case
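
The "model out pricing/cost" items above could start from a simple tiered-pricing calculator like the one below. It is only a sketch: the `Tier` structure, the `monthly_cost` function, and every price and volume shown are placeholder assumptions, not real vendor rates, which would need to come from each provider's current price list.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    up_to: float           # cumulative monthly request ceiling for this tier
    price_per_1000: float  # placeholder price, NOT a real vendor rate


def monthly_cost(requests, tiers, free_requests=0):
    """Estimate monthly spend for a request volume under tiered pricing."""
    billable = max(0, requests - free_requests)
    cost = 0.0
    lower = 0  # lower bound of the tier currently being billed
    for tier in tiers:
        if billable <= lower:
            break
        in_tier = min(billable, tier.up_to) - lower
        cost += in_tier / 1000 * tier.price_per_1000
        lower = tier.up_to
    return cost


if __name__ == "__main__":
    # Hypothetical price list: first 100k requests at 5.00 per 1000, then 4.00.
    tiers = [Tier(100_000, 5.0), Tier(float("inf"), 4.0)]
    for volume in (50_000, 150_000, 1_000_000):
        print(volume, monthly_cost(volume, tiers, free_requests=0))
```

Running the same volumes through one price list per provider gives a like-for-like comparison against the free and open data sources.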
A listing database of launched satellites
Find inexpensive and suitable satellite data sources for use in Phase 2.
Research which satellite data will provide the most value to Phase 2 of the project
- Address parsing and normalization through libpostal
- geographic addresses format templates
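
As a rough illustration of the libpostal step, the sketch below turns parser output into a flat record. It assumes the Python bindings (`postal.parser.parse_address` from pypostal), which return a list of `(value, label)` pairs and require the libpostal C library to be installed. The `to_record` and `parse` helpers are hypothetical names introduced here, and the parser is injectable so the record logic can be tried without libpostal.

```python
def to_record(parsed):
    """Collapse libpostal-style (value, label) pairs into a dict.

    If a label appears more than once, its values are joined with a space.
    """
    record = {}
    for value, label in parsed:
        record[label] = f"{record[label]} {value}" if label in record else value
    return record


def parse(address, parser=None):
    """Parse a free-text address into labelled components.

    `parser` defaults to pypostal's parse_address; pass any callable that
    returns (value, label) pairs to run without libpostal installed.
    """
    if parser is None:
        # Imported lazily: needs libpostal and the pypostal package.
        from postal.parser import parse_address
        parser = parse_address
    return to_record(parser(address))


if __name__ == "__main__":
    # Stand-in parser with hand-written output, for demonstration only.
    def fake_parser(_address):
        return [
            ("781", "house_number"),
            ("franklin ave", "road"),
            ("brooklyn", "city"),
            ("ny", "state"),
            ("11216", "postcode"),
        ]

    print(parse("781 Franklin Ave Brooklyn NY 11216", parser=fake_parser))
```

Records in this shape could feed the cleaning step before addresses are written to the single source of truth db.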