ukgeo: steer wanted on a community geocoder built on OS Open Names + Open Roads

Hi osdatahub,

Flagging an early-stage Python package in case it's useful context for you, and to ask for a steer on whether it's worth developing further in the direction I'm going.

ukgeo (v0.5, alpha) is a UK free-text geocoder for messy location strings — STATS19-style road references, motorway junctions, colloquial place names. It's built on OS Open Names and OS Open Roads (OGL-attributed), with postcodes.io and OSM filling gaps. Pip-installable, MIT licensed.

- PyPI: https://pypi.org/project/ukgeo/
- Repo: https://github.com/ThomasHSimm/ukgeo
- Basic usage: https://openroadrisk.org/tools/ukgeo.html

It came out of road-safety risk modelling work where the OS Names API wasn't the right shape for two specific reasons:

1. Bulk batches. Your own product page notes the API isn't intended for bulk searches, and the 600/min live rate limit confirms it. We had hundreds of thousands of dirty STATS19 strings to resolve in one pass.
2. Restricted-network environments. The analytical environment had no outbound API access, and we didn't want location strings leaving the network. ukgeo loads from a local parquet at startup and runs entirely offline. (Optional OS Names API fallback exists for long-tail infrastructure cases, off by default.)
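To put the bulk constraint in rough numbers, here is a back-of-envelope sketch. The 300,000 batch size is an illustrative assumption standing in for "hundreds of thousands", and the calculation assumes one string per request with no retries or latency, so it is a lower bound rather than a measurement.

```python
import math

RATE_LIMIT_PER_MIN = 600  # live rate limit quoted above


def minutes_at_rate_limit(n_strings: int, per_min: int = RATE_LIMIT_PER_MIN) -> int:
    """Minimum whole minutes needed to send n_strings requests, one string each."""
    return math.ceil(n_strings / per_min)


batch = 300_000  # illustrative assumption, not our actual batch size
minutes = minutes_at_rate_limit(batch)
print(f"{batch:,} strings -> at least {minutes} min (~{minutes / 60:.1f} h)")
# prints: 300,000 strings -> at least 500 min (~8.3 h)
```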

Beyond those two, the things that have been useful in practice:

- Fuzzy multi-token matching on dirty strings (junction names, colloquial roundabouts, county-context disambiguation).
- Transparent output — every result returns confidence, level_resolved, match_type, candidates_considered, notes. Helpful for analyst triage of low-confidence rows.
- The pipeline is agnostic to feature type — the same scorer handles roads, junctions, places, stations. Extending coverage to the rest of the OS Open Names theme set (hospitals, schools, airports, ferries, etc.) is a parquet-build change rather than a code change.
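As a rough illustration of the first two bullets, here is a minimal sketch of token-level fuzzy matching that returns a transparent result record. This is not ukgeo's actual code or API: the `geocode` and `token_score` functions, the tiny in-memory gazetteer, the 0.75 threshold and the use of the standard-library `difflib` scorer are all assumptions made for the example. Only the result field names come from the list above, and their exact meaning in ukgeo may differ.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical in-memory gazetteer of (name, feature level) pairs.
# A real build would load these from a local parquet instead.
GAZETTEER = [
    ("M62 Junction 26", "junction"),
    ("Chain Bar Roundabout", "junction"),
    ("Bradford", "place"),
    ("Leeds Station", "station"),
]


@dataclass
class Result:
    query: str
    match: Optional[str]
    confidence: float
    level_resolved: Optional[str]
    match_type: str
    candidates_considered: int
    notes: List[str] = field(default_factory=list)


def token_score(query: str, name: str) -> float:
    """Mean, over query tokens, of the best similarity to any token in name."""
    q_tokens = query.lower().split()
    n_tokens = name.lower().split()
    if not q_tokens or not n_tokens:
        return 0.0
    best = [max(SequenceMatcher(None, q, n).ratio() for n in n_tokens)
            for q in q_tokens]
    return sum(best) / len(best)


def geocode(query: str, threshold: float = 0.75) -> Result:
    """Score every gazetteer entry with the same scorer, whatever its feature type."""
    scored = sorted(((token_score(query, name), name, level)
                     for name, level in GAZETTEER), reverse=True)
    score, name, level = scored[0]
    notes = []
    # Flag near-ties so an analyst can triage ambiguous rows.
    if len(scored) > 1 and score - scored[1][0] < 0.05:
        notes.append(f"close runner-up: {scored[1][1]}")
    if score < threshold:
        notes.append("best candidate below threshold")
        return Result(query, None, score, None, "none", len(scored), notes)
    match_type = "exact" if query.lower() == name.lower() else "fuzzy"
    return Result(query, name, score, level, match_type, len(scored), notes)


result = geocode("chian bar roundabout")  # note the transposed letters
print(result.match, result.match_type, result.level_resolved)
```

Because the scorer only sees names and tokens, adding another feature type in this sketch means adding rows to the gazetteer, which mirrors the "parquet-build change rather than a code change" point.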

What it isn't / current gaps, honestly:

- No reverse geocoding yet. It's planned, and it's the biggest functional gap vs. OS Names.
- No address-level resolution. OS Places is the right tool for that.
- Data freshness depends on rebuilding parquets locally.
- Welsh / Gaelic / multilingual coverage is uncertain in the current build.
- Test data is regional (Yorkshire / NW / Midlands); national-scale accuracy is partly assumed rather than measured.

It works for us on two levels (the ORR pipeline and ad-hoc geocoding), and I think the offline / bulk angle could be useful for civil-service and research users who run into the same constraints. But it's early, and before investing further I'd value a steer on:

- Whether something like this overlaps with anything you're already planning or have seen demand for, and any improvements you'd suggest.
- Whether the "offline + bulk + dirty strings" framing matches a real gap you see from your end, or whether I'm pattern-matching off a narrow use-case.

No specific ask beyond that; happy if it's just noted.

Cheers,
Thomas
