UK address utility based on machine learning and optimised search to parse, standardise, and compare addresses.
The address NER tagger is trained with CRFsuite on 2 million UK housing addresses.
pip install ukaddresskit
Tagger
from ukaddresskit.parser import tag
print(tag("10 Downing Street SW1A 2AA"))
Output
{'BuildingNumber': '10', 'Locality': 'DOWNING', 'TownName': 'STREET', 'Postcode': 'SW1A 2AA'}
Postcode Helpers
from ukaddresskit.postcode import *
normalize_postcode("sw1a2aa") # "SW1A 2AA"
get_town("SW1A 2AA") # "LONDON"
get_county("SW1A 2AA") # "Greater London" (if in mapping)
get_locality("SW1A 2AA")
get_streets("SW1A 2AA") # list of street names
get_property_mix("SW1A 2AA") # Dict[str, float]
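The normalisation step shown by `normalize_postcode` can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the names `normalize_postcode_sketch` and `outcode`, and the simplified regular expression, are assumptions made for this example. The `outcode` helper is included because lookups such as `outcode_to_county.csv` are presumably keyed on the outward part of the postcode.

```python
import re

# Hypothetical stand-in for illustration only; not ukaddresskit's own code.
# Simplified shape: outward code (2-4 characters) followed by an inward code
# that is always one digit plus two letters.
_POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})$")


def normalize_postcode_sketch(raw):
    """Return 'OUTWARD INWARD' in upper case, or None if the shape does not match."""
    compact = re.sub(r"\s+", "", raw).upper()  # drop all whitespace, upper-case
    match = _POSTCODE_RE.match(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def outcode(postcode):
    """Return the outward part (e.g. 'BL8'), or None for unrecognised input."""
    normalized = normalize_postcode_sketch(postcode)
    return normalized.split(" ")[0] if normalized else None


print(normalize_postcode_sketch("sw1a2aa"))  # SW1A 2AA
print(outcode("bl8 1jg"))                    # BL8
```

The pattern only checks the general shape of a postcode; it does not confirm that the postcode actually exists.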
---
from ukaddresskit.locality import *
get_town_by_locality("Ab Kettleby") # "MELTON MOWBRAY"
get_town_by_locality("Abberton", ambiguity="all") # ["COLCHESTER", "PERSHORE"]
list_towns_for_locality("Abberton") # ["COLCHESTER", "PERSHORE"]
- Add outcode_to_county.csv into lookups
- Fix bugs in library not loading on Colab
- Create postcode fill utility
  - get_town(postcode)
  - get_county(postcode)
  - get_locality(postcode)
  - get_streets(postcode) → array of street names
  - get_property_mix(postcode)
  - add test cases
- Create address populate utility (add missing address components - town, county, etc)
- Create address linkage utility / comparing
- Define test cases, organise code
- Improve machine learning models
- Create .parquet/SQLite storage and indexes for optimal searches
- Create online docs
- Improve Address Parser
AddressParser (Pre & Post processing -- needs testing)
import pandas as pd
from ukaddresskit.pipeline import AddressParser

ap = AddressParser()
df = pd.DataFrame({"ADDRESS": [
    "Flat 2, 10 Queen Street, Bury BL8 1JG",
]})
out = ap.parse(df)

# Columns to display for each parsed address
fields = [
    "SubBuildingName", "BuildingName", "BuildingNumber",
    "StreetName", "Locality", "TownName", "Postcode", "County",
    "PAOstartNumber", "PAOendNumber", "PAOstartSuffix", "PAOendSuffix",
    "SAOStartNumber", "SAOEndNumber", "SAOStartSuffix", "SAOEndSuffix",
]

for i, row in out.iterrows():
    print(f"\nAddress #{i}")
    for col in fields:
        val = row.get(col)
        # Skip missing and empty components
        if pd.notna(val) and str(val) != "":
            print(f" {col:16} {val}")
Output
Address #0
 SubBuildingName  FLAT 2
 BuildingNumber   10
 StreetName       QUEEN STREET
 TownName         BURY
 Postcode         BL81JG
 PAOstartNumber   10.0
 SAOStartNumber   2
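In the output above, `PAOstartNumber` is printed as `10.0`. A likely cause (an assumption, not confirmed by the library) is that pandas stores a numeric column containing missing values as floats. The sketch below shows one way to tidy a parsed row before display or export. `tidy_components` is a hypothetical helper, not part of ukaddresskit, and the value types in the sample row are assumed.

```python
import math


def tidy_components(row):
    """Drop empty values and render whole-number floats (e.g. 10.0) as ints.

    Hypothetical helper for illustration; not part of ukaddresskit.
    """
    tidy = {}
    for key, value in row.items():
        if value is None or value == "":
            continue  # skip empty components
        if isinstance(value, float):
            if math.isnan(value):
                continue  # skip missing numeric components
            if value.is_integer():
                value = int(value)  # 10.0 -> 10
        tidy[key] = value
    return tidy


# Values copied from the output above; NaN stands in for a field with no value.
row = {
    "SubBuildingName": "FLAT 2",
    "BuildingNumber": "10",
    "StreetName": "QUEEN STREET",
    "TownName": "BURY",
    "Postcode": "BL81JG",
    "PAOstartNumber": 10.0,
    "PAOendNumber": float("nan"),
    "SAOStartNumber": 2,
}
print(tidy_components(row))
```

With a real result, each row of `out` could be converted with `row.to_dict()` before being passed to the helper.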