GitHub · Pilot evidence site · OSM community discussion · MIT licensed
Local tool that reads Toronto address points from the sibling
`toronto-addresses-import` project's SQLite DB,
conflates them against live OSM data, routes questionable items to a human
reviewer via a web UI, and uploads approved batches to the OpenStreetMap
dev sandbox (master.apis.dev.openstreetmap.org). Every automatic and manual
action is written to an append-only audit log.
Live status of the import proposal against the OSM Import Guidelines workflow:
| Stage | State |
|---|---|
| Draft proposal | Complete (last revised 2026-04-29) |
| OSM Community Forum discussion | Open — thread |
| Wiki page (Toronto/Import/AddressPoints) | Not yet published |
| imports@openstreetmap.org announcement | Not yet posted |
| Two-week feedback window | Not yet started |
| Phase 1 pilot upload (production) | Blocked on the above |
All uploads from this tool to date have used the OSM dev sandbox (master.apis.dev.openstreetmap.org). No production edits have been made and none will be made until the proposal has cleared the customary feedback window. The OSM account used for production uploads will be skfd imports (dedicated, not the maintainer's personal account).
`Candidate` and `AddressMatch` are synonyms — both refer to one row from
the input CSV paired with its OSM lookup result, the unit flowing through the
pipeline. Code, DB schema, and templates use `candidate`; discussion and new
docs may use either term. Each one carries three orthogonal axes:

- `verdict` — what conflation decided (`MATCH`, `MATCH_FAR`, `MISSING`, `SKIPPED`)
- `status` — what the operator decided (`OPEN`, `APPROVED`, `REJECTED`, `DEFERRED`); `AUTO_APPROVED` is a synthetic status the review queue derives for clean `MISSING` rows that bypass manual review
- `stage` — where it sits in the pipeline (`INGESTED`, `CONFLATED`, `CHECKED`, `REVIEW_PENDING`, `APPROVED`, `REJECTED`, `BATCHED`, `UPLOADED`, `FAILED`, `SKIPPED`)
A Run is one execution of the pipeline (produces many candidates); a
Batch is a bundle of `APPROVED` candidates packaged for upload.
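For orientation, a minimal sketch of how the three axes might be modelled with plain Python enums. The real schema lives in the code and DB; names here are illustrative, and `AUTO_APPROVED` is derived at query time rather than stored:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):   # what conflation decided
    MATCH = "MATCH"
    MATCH_FAR = "MATCH_FAR"
    MISSING = "MISSING"
    SKIPPED = "SKIPPED"

class Status(Enum):    # what the operator decided
    OPEN = "OPEN"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    DEFERRED = "DEFERRED"
    # AUTO_APPROVED is intentionally absent: the review queue derives it
    # for clean MISSING rows instead of persisting it.

class Stage(Enum):     # where the candidate sits in the pipeline
    INGESTED = "INGESTED"
    CONFLATED = "CONFLATED"
    CHECKED = "CHECKED"
    REVIEW_PENDING = "REVIEW_PENDING"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    BATCHED = "BATCHED"
    UPLOADED = "UPLOADED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"

@dataclass
class Candidate:
    run_id: int        # a Run produces many candidates
    verdict: Verdict
    status: Status
    stage: Stage
```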
- Python 3.11+ (uses `tomllib`).
- From the project root:

```
python -m venv .venv
.venv\Scripts\activate   # PowerShell / cmd
pip install -e .
```
- Register an OAuth2 application on the OSM dev server:
  - Log into https://master.apis.dev.openstreetmap.org/.
  - My Settings → OAuth 2 applications → Register new application.
  - Name: anything (e.g. `t2-address-import-dev`).
  - Redirect URI: `http://localhost:5000/oauth/callback`
  - Permissions: tick read user preferences, modify the map, comment on changesets.
  - Save; copy the resulting Client ID and Client Secret.
- Create `.env` (copy `.env.example`) and fill in:

```
OSM_CLIENT_ID=...
OSM_CLIENT_SECRET=...
FLASK_SECRET_KEY=<any random string>
FERNET_KEY=<generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())">
```

- Adjust `config.toml` if your sibling DB lives somewhere else or you want a different default bbox.
```
python run.py
```

Then visit http://localhost:5000/.
The tool defaults to the OSM dev sandbox
(master.apis.dev.openstreetmap.org). The header shows a DEV / PROD
badge so you always know which server uploads will go to.
Switch by setting `OSM_API_BASE`:
- DEV (default): `OSM_API_BASE=https://master.apis.dev.openstreetmap.org`
- PROD (real OSM): `OSM_API_BASE=https://api.openstreetmap.org`
Each server has its own OAuth2 application registry, so a prod run also needs
a prod-side `OSM_CLIENT_ID` / `OSM_CLIENT_SECRET` — register a second app on
https://www.openstreetmap.org/oauth2/applications with the same redirect URI.
To launch against a non-default target, set the env var inline (this wins
over `.env`, which uses `setdefault`):
```powershell
# PowerShell — prod
$env:OSM_API_BASE="https://api.openstreetmap.org"
$env:OSM_CLIENT_ID="<prod-client-id>"
$env:OSM_CLIENT_SECRET="<prod-client-secret>"
python run.py
```

```bash
# bash — prod
OSM_API_BASE=https://api.openstreetmap.org \
OSM_CLIENT_ID=<prod-client-id> \
OSM_CLIENT_SECRET=<prod-client-secret> \
python run.py
```

The Geofabrik extract (Stage 2 read source) is the same in both modes — there is no dev-server slice from Geofabrik, and the dev sandbox has no realistic Toronto data anyway. Only the upload target changes.
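The precedence rule (inline env beats `.env`) is exactly what a `setdefault`-based loader produces. A minimal sketch of the pattern, as hypothetical code rather than the tool's actual loader:

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Apply .env values without clobbering the existing environment."""
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # setdefault: a variable already set inline in the shell wins.
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env is fine; rely on the real environment
```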
Stage 2 reads addresses from a locally-cached Toronto extract instead of querying Overpass every time. First-time setup:
```
python -m t2.osm_refresh
```

This downloads the latest Ontario PBF from Geofabrik (~600 MB) into
`data/osm/ontario-latest.osm.pbf`, filters it to `addr:housenumber`-tagged
features clipped to the City-of-Toronto bbox in `config.toml`, and writes
`data/osm/toronto-addresses.json` + a `meta.json` sidecar. Stage 2 then just
bbox-clips that JSON per run — no network, sub-second.
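The filter step can be pictured with pyosmium. A minimal sketch that keeps only `addr:housenumber` nodes inside a hard-coded bbox; the real refresh also keeps addressed ways/relations, and its bbox comes from `config.toml`:

```python
import osmium

# Illustrative bbox, (min_lat, min_lon, max_lat, max_lon); not the config value.
TORONTO_BBOX = (43.58, -79.64, 43.86, -79.11)

class AddrNodeFilter(osmium.SimpleHandler):
    def __init__(self, bbox):
        super().__init__()
        self.min_lat, self.min_lon, self.max_lat, self.max_lon = bbox
        self.features = []

    def node(self, n):
        # Keep only nodes that carry a housenumber and fall inside the bbox.
        if "addr:housenumber" not in n.tags:
            return
        lat, lon = n.location.lat, n.location.lon
        if self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon:
            self.features.append({
                "id": n.id, "lat": lat, "lon": lon,
                "tags": {t.k: t.v for t in n.tags},
            })

handler = AddrNodeFilter(TORONTO_BBOX)
handler.apply_file("data/osm/ontario-latest.osm.pbf")
print(len(handler.features), "address nodes kept")
```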
Re-run whenever you want a fresher snapshot. The tool HEAD-checks Geofabrik
and skips the download if `Last-Modified` hasn't changed; pass `--force` to
re-download regardless. `--dry-run` does only the HEAD check.
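The freshness check is a plain conditional download. A sketch of the idea, assuming the Ontario extract URL and a `meta.json` that stores the header seen at the last fetch:

```python
import json
import pathlib
import requests

PBF_URL = "https://download.geofabrik.de/north-america/canada/ontario-latest.osm.pbf"
META = pathlib.Path("data/osm/meta.json")

def needs_download() -> bool:
    """HEAD Geofabrik and compare Last-Modified with the stored value."""
    resp = requests.head(PBF_URL, allow_redirects=True, timeout=30)
    resp.raise_for_status()
    remote = resp.headers.get("Last-Modified")
    if not META.exists():
        return True
    stored = json.loads(META.read_text()).get("last_modified")
    # --force would skip this comparison; --dry-run would stop here.
    return remote != stored
```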
You can also trigger a refresh from the web UI at http://localhost:5000/osm.
The page shows the extract's freshness, element counts, sha256s, and tails
`data/osm/refresh.log` so you can watch progress. The button spawns the same
CLI as a detached subprocess, so Flask stays responsive while the download
runs.
To fall back to live Overpass queries (e.g. bbox experiments outside
Toronto), set `[osm] source = "overpass"` in `config.toml`.
Toronto is too big to pick by typing lat/lon, so the tool precomputes a tile layer you can click on. Generate it once with:
```
python -m t2.tiles_build
```

This downloads the City of Toronto's 158-neighbourhood polygon layer from
Open Data, counts active source addresses inside each polygon, and
quadtree-splits any neighbourhood with more than 500 addresses. The result
(typically ~2,500 tiles) lands in `data/tiles.json` + a `data/tiles/meta.json`
sidecar. Regenerate when a new source snapshot lands.
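The quadtree rule is the interesting part: any cell holding more than 500 addresses is quartered until every leaf is small enough. A simplified sketch on plain bboxes; the real build starts from neighbourhood polygons, so this is illustrative only:

```python
def quadtree_split(bbox, points, limit=500, min_span=1e-4):
    """bbox = (min_lat, min_lon, max_lat, max_lon); points = [(lat, lon), ...].
    Returns (leaf_bbox, count) pairs with at most `limit` points each."""
    inside = [p for p in points
              if bbox[0] <= p[0] <= bbox[2] and bbox[1] <= p[1] <= bbox[3]]
    # Stop when small enough, or when the cell is too tiny to keep splitting
    # (e.g. many points stacked at one coordinate).
    if len(inside) <= limit or (bbox[2] - bbox[0]) < min_span:
        return [(bbox, len(inside))]
    mid_lat = (bbox[0] + bbox[2]) / 2
    mid_lon = (bbox[1] + bbox[3]) / 2
    quads = [
        (bbox[0], bbox[1], mid_lat, mid_lon),  # SW quarter
        (bbox[0], mid_lon, mid_lat, bbox[3]),  # SE quarter
        (mid_lat, bbox[1], bbox[2], mid_lon),  # NW quarter
        (mid_lat, mid_lon, bbox[2], bbox[3]),  # NE quarter
    ]
    leaves = []
    for quad in quads:
        # Simplification: points exactly on a mid line land in two quads.
        leaves.extend(quadtree_split(quad, inside, limit, min_span))
    return leaves
```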
The dashboard's Pick on map button opens `/map` — click any tile to land
on its detail page, which lists prior runs on that tile and has a "Start new
run" form pre-filled with the tile's bbox. The manual bbox form on the
dashboard remains as an escape hatch for arbitrary rectangles.
- Create a run from the dashboard. Either Pick on map and click a tile, or type a small downtown rectangle like `(43.645, -79.42, 43.665, -79.39)` into the bbox form.
- On the run page, click the four pipeline buttons in order: Ingest → Fetch OSM → Conflate → Run checks.
- Open the Review queue — items flagged by any enabled check land here. Approve, reject, or defer each. `MISSING` candidates with no flags are auto-approved; `MATCH` candidates are auto-skipped.
- Back on the run page, Compose batch (mode `josm_xml` or `osm_api`, size up to 500 for a first run).
- On the batch page:
  - Export .osm (JOSM) writes `data/batch_<id>.osm`. Open it in JOSM, then upload via JOSM's own auth.
  - Upload via OSM API opens a changeset on the dev server, uploads the osmChange diff, and closes the changeset (see the sketch after this list). Visit `/oauth/start` first if you haven't authorized yet.
- The Audit log at `/runs/<id>/audit` shows every event.
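The Upload via OSM API step follows the standard OSM API 0.6 changeset sequence: create, upload, close. A minimal sketch with `requests`, assuming a session that already sends the OAuth2 bearer token and an osmChange document with a placeholder for the changeset id (both assumptions, not the tool's actual code):

```python
import requests

API = "https://master.apis.dev.openstreetmap.org/api/0.6"

def upload_batch(session: requests.Session, osc_template: str, comment: str) -> int:
    """Open a changeset, upload an osmChange diff, close the changeset.

    `session` must already carry the OAuth2 Bearer token; `osc_template`
    is assumed to contain a CHANGESET_ID placeholder (illustrative).
    """
    body = ('<osm><changeset>'
            f'<tag k="comment" v="{comment}"/>'
            '<tag k="created_by" v="t2-address-import"/>'
            '</changeset></osm>')
    r = session.put(f"{API}/changeset/create", data=body,
                    headers={"Content-Type": "text/xml"})
    r.raise_for_status()
    cs_id = int(r.text)  # create returns the new changeset id as plain text

    osc = osc_template.replace("CHANGESET_ID", str(cs_id))
    r = session.post(f"{API}/changeset/{cs_id}/upload", data=osc,
                     headers={"Content-Type": "text/xml"})
    r.raise_for_status()  # response echoes placeholder-to-real id mappings

    session.put(f"{API}/changeset/{cs_id}/close").raise_for_status()
    return cs_id
```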
Every candidate has a `stage` column. Killing the process mid-run and
restarting is safe — each stage skips work already done:
- Re-running Ingest only adds new rows (`INSERT OR IGNORE`).
- Re-running Fetch reuses the cached `data/osm_current_run<id>.json`.
- Re-running Conflate resumes from any candidate still at `INGESTED`.
- Re-running Checks skips any `(candidate, check_id, check_version)` that already has a result row. Bump a check's `version` in code to force a rerun.
- Uploads look up prior changesets by their `import:client_token` tag before opening a new one.
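The Ingest bullet is the simplest instance of the pattern: make the write idempotent and a re-run becomes a no-op. A sketch with hypothetical column names:

```python
import sqlite3

def ingest(conn: sqlite3.Connection, run_id: int, rows) -> None:
    # A UNIQUE(run_id, source_address_id) constraint plus INSERT OR IGNORE
    # means rows already ingested are silently skipped on re-run.
    conn.executemany(
        "INSERT OR IGNORE INTO candidate (run_id, source_address_id, stage) "
        "VALUES (?, ?, 'INGESTED')",
        [(run_id, r["address_id"]) for r in rows],
    )
    conn.commit()
```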
Match targets are pure address nodes (`addr:housenumber` + no POI tags) and
polygons (ways/relations with an address — typically buildings, including
amenity-tagged footprints like a hospital).
POI nodes (nodes carrying `amenity`, `shop`, `office`, `tourism`, `leisure`,
`craft`, `healthcare`, `building`, plus `disused:*` / `was:*` variants — see
`POI_TAG_KEYS` in `t2/conflate.py`) are ignored for matching: their address
is a courtesy annotation, not the canonical address feature. When a POI sits at
a `MISSING` candidate's address, the review UI acknowledges it with a pill, and
any `addr:postcode` on the POI is copied into the proposed upload tags.
Even after that filter, a matched "pure address" node can quietly carry
non-address tags (`name`, `ref`, `entrance`). The `potential_amenity` check
flags those with `severity=info` so we can refine the POI filter over time.
Metadata keys like `source`, `opendata:type`, `check_date`, `note` are on an
ignore list inside the check and don't trigger it.
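Both filters reduce to set logic over a node's tag keys. A sketch of how they might look; the constants mirror the description above, but the real definitions live in `t2/conflate.py` and the check code:

```python
POI_TAG_KEYS = {"amenity", "shop", "office", "tourism", "leisure",
                "craft", "healthcare", "building"}
METADATA_KEYS = {"source", "opendata:type", "check_date", "note"}
LIFECYCLE_PREFIXES = ("disused:", "was:")

def is_poi_node(tags: dict) -> bool:
    """True if any POI key (or a disused:*/was:* variant) is present."""
    for key in tags:
        base = key
        for prefix in LIFECYCLE_PREFIXES:
            if key.startswith(prefix):
                base = key[len(prefix):]
        if base in POI_TAG_KEYS:
            return True
    return False

def potential_amenity_extras(tags: dict) -> set:
    """Non-address, non-metadata keys on a 'pure address' node.

    A non-empty result is what the potential_amenity check would flag
    with severity=info."""
    return {k for k in tags
            if not k.startswith("addr:") and k not in METADATA_KEYS}
```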
The current pipeline is one-directional: Toronto source → OSM lookup → upload additions. Two cleanup flows in the opposite direction are explicitly out of scope and left for a later phase. Documented here so reviewers don't assume they were overlooked.
If OSM has an address that Toronto's active snapshot doesn't, we do not flag, propose, or remove it.
Reasoning — the absence direction is asymmetric. Toronto's open data is authoritative when it asserts an address exists; silence is a weaker signal. The feed has refresh lag, known-missing neighbourhoods, and retired-address states that aren't cleanly separable from "never existed." Deleting OSM data based on absence alone would destroy real addresses on worse evidence than we accept for additions.
A future phase would need, at minimum: a reverse-sweep stage enumerating OSM
addresses in the run bbox; a separate review queue (not `Candidate` — the
verdicts don't fit); a street-level cross-check to suppress the common case
where Toronto's feed is missing a whole street; prioritization by OSM metadata
(`start_date`, last-edit age, `source`); and human-only approval — no
automation, since OSM deletions are high blast radius and hard to reverse.
OSM `addr:interpolation` ways synthesize housenumbers along a street segment
between two endpoint nodes. When Toronto's per-address points cover the same
segment with real data, the interpolation way is technically redundant. We
still don't touch them.
Reasoning — an interpolation way isn't an address, it's a geometry-anchored range declaration. Our matching model (housenumber + street + point) doesn't describe what's being replaced. Replacement needs cross-validation: every integer in the interpolation range must have a real Toronto point before removal, otherwise the delete leaves mapped gaps. It's also a bulk structural edit to OSM, not an address-import operation — different review bar, different changeset hygiene, different rollback story than what this tool was built for.
A future phase would need: enumeration of `addr:interpolation` ways in the
bbox; a coverage check that every integer in the range has a colocated Toronto
point; a proposed delete-way-plus-preserve-endpoints changeset for human
review; and care around tags (`addr:street`, `addr:postcode`) that the
interpolation way carries on behalf of its endpoints.
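To make the coverage requirement concrete, here is the shape of the check a future phase would need, handling only the common even/odd/all modes. Purely illustrative, since this flow is out of scope:

```python
def interpolation_fully_covered(way_tags: dict,
                                endpoint_numbers: tuple,
                                toronto_numbers: set) -> bool:
    """True only if every housenumber the way implies exists as a real
    Toronto point; anything less and deleting the way would leave gaps."""
    lo, hi = sorted(int(n) for n in endpoint_numbers)
    mode = way_tags.get("addr:interpolation", "all")
    if mode not in ("even", "odd", "all"):
        return False  # alphabetic/irregular modes: never auto-clear
    step = 2 if mode in ("even", "odd") else 1
    return all(n in toronto_numbers for n in range(lo, hi + 1, step))
```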
The shipping scope — "get Toronto's missing civic addresses into OSM without creating duplicates" — has standalone value. Folding cleanup into the same pipeline expands blast radius and review burden without proportional benefit, and the two reverse flows have different enough semantics (different data sources, different review criteria, different failure modes) that they deserve their own pipelines when we get to them.
- Create `t2/checks/<name>.py` exporting a class that matches the `Check` protocol in `t2/checks/base.py`.
- Register it in `t2/checks/__init__.py`.
- Restart the app. The new check appears in the run's toggle list.
This tool moves data between three open datasets. Downstream uploads inherit OSM's licence, but the upstream sources each have their own terms:
- Toronto Open Data — "Address Points (Municipal) – Toronto One Address Repository", published under the Open Government Licence – Toronto. Consumed indirectly via the sibling `toronto-addresses-import` project.
- OpenStreetMap — © OpenStreetMap contributors, ODbL 1.0. All uploads target the OSM dev sandbox (`master.apis.dev.openstreetmap.org`); any future production import must separately comply with the OSMF import guidelines and contributor terms.
- Geofabrik — Ontario `.osm.pbf` extracts, redistributed under ODbL from OSM.
MIT — see LICENSE.