Hi osdatahub,
Flagging an early-stage Python package in case it's useful context for you, and to ask for a steer on whether it's worth developing further in the direction I'm going.
ukgeo (v0.5, alpha) is a UK free-text geocoder for messy location strings — STATS19-style road references, motorway junctions, colloquial place names. It's built on OS Open Names and OS Open Roads (OGL-attributed), with postcodes.io and OSM filling gaps. Pip-installable, MIT licensed.
It came out of road-safety risk modelling work where the OS Names API wasn't the right shape for two specific reasons:
- Bulk batches. Your own product page notes the API isn't intended for bulk searches, and the 600/min live rate limit confirms it. We had hundreds of thousands of dirty STATS19 strings to resolve in one pass.
- Restricted-network environments. The analytical environment had no outbound API access, and we didn't want location strings leaving the network. ukgeo loads from a local parquet at startup and runs entirely offline. (An optional OS Names API fallback exists for long-tail infrastructure cases; it is off by default.)
Beyond those two, the things that have been useful in practice:
- Fuzzy multi-token matching on dirty strings (junction names, colloquial roundabouts, county-context disambiguation).
- Transparent output — every result returns confidence, level_resolved, match_type, candidates_considered, notes. Helpful for analyst triage of low-confidence rows.
- The pipeline is agnostic to feature type — the same scorer handles roads, junctions, places, stations. Extending coverage to the rest of the OS Open Names theme set (hospitals, schools, airports, ferries, etc.) is a parquet-build change rather than a code change.
What it isn't, and the current gaps:
- No reverse geocoding yet. It's planned, and it's the biggest functional gap vs. OS Names.
- No address-level resolution. OS Places is the right tool for that.
- Data freshness depends on rebuilding parquets locally.
- Welsh / Gaelic / multilingual coverage is uncertain in the current build.
- Test data is regional (Yorkshire / NW / Midlands); national-scale accuracy is partly assumed rather than measured.
It works for us on two levels (the ORR pipeline and ad-hoc geocoding), and I think the offline / bulk angle could be useful for civil-service and research users who run into the same constraints. But it's early, and before investing further I'd value a steer on:
- Whether something like this overlaps with anything you're already planning or have seen demand for, and any improvements you'd suggest.
- Whether the "offline + bulk + dirty strings" framing matches a real gap you see from your end, or whether I'm pattern-matching off a narrow use-case.
No specific ask beyond that — happy if it's just noted.
Cheers,
Thomas