UK address utility based on machine learning and optimised search to parse, standardise, and compare addresses.

reachusama/ukaddresskit

ukaddresskit

The address NER tagger is trained with crfsuite on 2 million UK housing addresses.
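A CRF tagger labels each token of an address using features of that token and its neighbours. The sketch below shows what per-token features for UK addresses could look like, in the list-of-dicts shape that CRFsuite wrappers such as python-crfsuite broadly accept. The feature names, the postcode-part regex, and the functions `token_features` and `address_to_features` are hypothetical; ukaddresskit's actual feature set is not documented here and may differ.

```python
import re
from typing import Dict, List, Union

# Hypothetical feature set; ukaddresskit's real CRF features may differ.
Feature = Union[str, bool]

# Matches a UK outward code ("SW1A", "BL8") or inward code ("2AA", "1JG").
POSTCODE_PART_RE = re.compile(r"^(?:[A-Z]{1,2}\d[A-Z\d]?|\d[A-Z]{2})$")


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Describe token i using its own shape plus its immediate neighbours."""
    word = tokens[i].strip(",").upper()
    feats: Dict[str, Feature] = {
        "word": word,
        "is_digit": word.isdigit(),
        "has_digit": any(ch.isdigit() for ch in word),
        "postcode_part": bool(POSTCODE_PART_RE.match(word)),
        "ends_with_comma": tokens[i].endswith(","),
    }
    # Context features: neighbouring words help the CRF disambiguate,
    # e.g. a bare number followed by a street-like word.
    if i == 0:
        feats["BOS"] = True  # beginning of sequence
    else:
        feats["prev_word"] = tokens[i - 1].strip(",").upper()
    if i == len(tokens) - 1:
        feats["EOS"] = True  # end of sequence
    else:
        feats["next_word"] = tokens[i + 1].strip(",").upper()
    return feats


def address_to_features(address: str) -> List[Dict[str, Feature]]:
    """Whitespace-tokenise an address and build one feature dict per token."""
    tokens = address.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


for feats in address_to_features("10 Downing Street SW1A 2AA"):
    print(feats["word"], feats["postcode_part"])
```

In a trained model, a postcode-shape feature like `postcode_part` tends to be a strong signal, while street and town names depend much more on context features, which is where tagging errors are most likely.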

Install (alpha stage)

pip install ukaddresskit

Quick Start

Tagger

from ukaddresskit.parser import tag

print(tag("10 Downing Street SW1A 2AA"))

Output

{'BuildingNumber': '10', 'Locality': 'DOWNING', 'TownName': 'STREET', 'Postcode': 'SW1A 2AA'}
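The tagger only labels what is present in the input string, so a missing town or county stays missing. The Todo list mentions an address populate utility; one way to sketch that idea is to fill empty fields from postcode lookups. `populate` and its `lookups` argument are hypothetical names, not part of ukaddresskit. In practice the lookups could be `get_town` and `get_county` from the postcode helpers, assuming they take a postcode string and return a string or `None`; here they are stubbed with dicts using the values shown in this README.

```python
from typing import Callable, Dict, Optional

# A lookup takes a postcode and returns a value, or None if unknown.
Lookup = Callable[[str], Optional[str]]


def populate(tagged: Dict[str, str], lookups: Dict[str, Lookup]) -> Dict[str, str]:
    """Return a copy of `tagged` with empty fields filled from its postcode.

    Fields the tagger already produced are never overwritten.
    """
    filled = dict(tagged)
    postcode = filled.get("Postcode")
    if not postcode:
        return filled  # nothing to look up from
    for field, lookup in lookups.items():
        if not filled.get(field):
            value = lookup(postcode)
            if value:
                filled[field] = value
    return filled


# Stub lookups standing in for the postcode helpers (values from this README).
tagged = {"BuildingNumber": "10", "StreetName": "DOWNING STREET", "Postcode": "SW1A 2AA"}
print(populate(tagged, {
    "TownName": {"SW1A 2AA": "LONDON"}.get,
    "County": {"SW1A 2AA": "Greater London"}.get,
}))
```

Because non-empty fields are kept, a mis-tagged value (such as the `TownName` of `STREET` in the output above) is not corrected by this step; that would need a comparison against the lookup result instead.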

Postcode Helpers

from ukaddresskit.postcode import (
    normalize_postcode, get_town, get_county,
    get_locality, get_streets, get_property_mix,
)

normalize_postcode("sw1a2aa")   # "SW1A 2AA"
get_town("SW1A 2AA")            # "LONDON"
get_county("SW1A 2AA")          # "Greater London" (if in mapping)
get_locality("SW1A 2AA")        # locality for the postcode
get_streets("SW1A 2AA")         # list of street names
get_property_mix("SW1A 2AA")    # Dict[str, float]
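`normalize_postcode` turns `sw1a2aa` into `SW1A 2AA`. Since a UK postcode's inward code is always a digit followed by two letters, normalisation can be sketched as: strip whitespace, upper-case, check the overall shape, then insert one space before the last three characters. `normalise_postcode_sketch` and `outcode` are hypothetical helpers, not the library's implementation. The regex is only a loose shape check (it does not confirm the postcode exists and rejects special cases such as `GIR 0AA`), and returning `None` for bad input is an assumption.

```python
import re
from typing import Optional

# Loose shape check on a compacted postcode:
# outward code (letter(s), digit, optional letter/digit) + inward code (digit + 2 letters).
_POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\d[A-Z]{2}$")


def normalise_postcode_sketch(raw: str) -> Optional[str]:
    """Return 'OUTWARD INWARD' in upper case, or None if the shape is wrong."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not _POSTCODE_RE.match(compact):
        return None
    # The inward code is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}"


def outcode(postcode: str) -> Optional[str]:
    """Return the outward code (e.g. 'BL8'), useful for outcode-level lookups."""
    norm = normalise_postcode_sketch(postcode)
    return norm.split(" ")[0] if norm else None


print(normalise_postcode_sketch("sw1a2aa"))
print(outcode("BL81JG"))
```

Applied to the pipeline output further down, this sketch would also turn `BL81JG` back into `BL8 1JG`.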

Locality Helpers

from ukaddresskit.locality import get_town_by_locality, list_towns_for_locality

get_town_by_locality("Ab Kettleby")                 # "MELTON MOWBRAY"
get_town_by_locality("Abberton", ambiguity="all")   # ["COLCHESTER", "PERSHORE"]
list_towns_for_locality("Abberton")                 # ["COLCHESTER", "PERSHORE"]
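A locality name can belong to more than one post town (Abberton maps to both Colchester and Pershore), so a locality lookup needs an explicit ambiguity policy. A minimal sketch, assuming an in-memory dict built from the examples above: only `ambiguity="all"` appears in this README; the name `town_by_locality`, the table, and the default `"unique"` mode (return a town only when exactly one matches) are hypothetical.

```python
from typing import Dict, List, Optional, Union

# Hypothetical lookup table built from the README examples.
LOCALITY_TO_TOWNS: Dict[str, List[str]] = {
    "AB KETTLEBY": ["MELTON MOWBRAY"],
    "ABBERTON": ["COLCHESTER", "PERSHORE"],
}


def town_by_locality(
    locality: str, ambiguity: str = "unique"
) -> Union[Optional[str], List[str]]:
    """Resolve a locality to its post town(s).

    ambiguity="all"    -> every candidate town (possibly an empty list)
    ambiguity="unique" -> the town if exactly one matches, else None (hypothetical)
    """
    towns = LOCALITY_TO_TOWNS.get(locality.strip().upper(), [])
    if ambiguity == "all":
        return list(towns)
    if ambiguity == "unique":
        return towns[0] if len(towns) == 1 else None
    raise ValueError(f"unknown ambiguity mode: {ambiguity!r}")


print(town_by_locality("Ab Kettleby"))
print(town_by_locality("Abberton", ambiguity="all"))
```

Returning `None` instead of guessing keeps ambiguous localities visible to the caller, who can then use another field, such as the postcode, to choose.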

Todo

  • Add outcode_to_county.csv to lookups
  • Fix bugs that stop the library loading on Colab
  • Create postcode fill utility
    • get_town(postcode)
    • get_county(postcode)
    • get_locality(postcode)
    • get_streets(postcode) → array of street names
    • get_property_mix(postcode)
    • add test cases
  • Create address populate utility (fill in missing address components such as town and county)
  • Create address linkage / comparison utility
  • Define test cases, organise code
  • Improve machine learning models
  • Create .parquet / SQLite storage with indexes for fast searches
  • Create online docs
  • Improve Address Parser
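The storage item above is about fast exact lookups. A minimal sketch of the SQLite side using Python's built-in `sqlite3`, assuming a simple `(postcode, town)` table keyed on normalised postcodes; the table, index, and function names are hypothetical, the two rows reuse examples from this README, and the Parquet side is not shown.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def build_postcode_db(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (postcode, town) rows into an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE postcodes (postcode TEXT NOT NULL, town TEXT)")
    con.executemany("INSERT INTO postcodes (postcode, town) VALUES (?, ?)", rows)
    # Index the lookup column so equality searches use a B-tree search
    # instead of scanning every row.
    con.execute("CREATE INDEX idx_postcodes_postcode ON postcodes (postcode)")
    con.commit()
    return con


def lookup_town(con: sqlite3.Connection, postcode: str) -> Optional[str]:
    """Return the town for an exact (already normalised) postcode, or None."""
    row = con.execute(
        "SELECT town FROM postcodes WHERE postcode = ?", (postcode,)
    ).fetchone()
    return row[0] if row else None


con = build_postcode_db([("SW1A 2AA", "LONDON"), ("BL8 1JG", "BURY")])
print(lookup_town(con, "BL8 1JG"))

# EXPLAIN QUERY PLAN reports whether SQLite uses the index for this query.
for plan_row in con.execute(
    "EXPLAIN QUERY PLAN SELECT town FROM postcodes WHERE postcode = ?", ("BL8 1JG",)
):
    print(plan_row[-1])
```

Exact-match lookups like this only work if postcodes are normalised the same way on insert and on query.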

AddressParser (pre- and post-processing; needs testing)

import pandas as pd
from ukaddresskit.pipeline import AddressParser

# Parse a DataFrame whose ADDRESS column holds raw address strings
ap = AddressParser()
df = pd.DataFrame({"ADDRESS": [
    "Flat 2, 10 Queen Street, Bury BL8 1JG",
]})
out = ap.parse(df)

# Parsed components to display, including PAO/SAO number and suffix fields
fields = [
    "SubBuildingName", "BuildingName", "BuildingNumber",
    "StreetName", "Locality", "TownName", "Postcode", "County",
    "PAOstartNumber", "PAOendNumber", "PAOstartSuffix", "PAOendSuffix",
    "SAOStartNumber", "SAOEndNumber", "SAOStartSuffix", "SAOEndSuffix",
]

# Print only the components that were populated for each address
for i, row in out.iterrows():
    print(f"\nAddress #{i}")
    for col in fields:
        val = row.get(col)
        if pd.notna(val) and str(val) != "":
            print(f"  {col:16} {val}")

Output

Address #0
  SubBuildingName  FLAT 2
  BuildingNumber   10
  StreetName       QUEEN STREET
  TownName         BURY
  Postcode         BL81JG
  PAOstartNumber   10.0
  SAOStartNumber   2
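The PAO (primary addressable object, usually the building) and SAO (secondary addressable object, such as a flat) fields split a number or number range into start/end numbers and suffixes. Below is a hypothetical sketch of that split for inputs like `10`, `10A`, `Flat 2` or `10-12B`. The regexes, the `FLAT`/`UNIT`/`APARTMENT` prefix list, and the returned key names are assumptions, not the pipeline's code. Returning plain `int`s also avoids a display like the `10.0` above, which likely comes from a pandas float column that mixes numbers with missing values.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical unit labels stripped before parsing, e.g. "FLAT 2" -> "2".
_LABEL_RE = re.compile(r"^(?:FLAT|UNIT|APARTMENT)\s+")
# "10", "10A", "10-12", "10A-12B" (spaces around the hyphen allowed).
_RANGE_RE = re.compile(r"^(\d+)([A-Z]?)(?:\s*-\s*(\d+)([A-Z]?))?$")

Part = Union[int, str, None]


def split_number_range(text: str) -> Optional[Dict[str, Part]]:
    """Split an addressable-object number or range into its components.

    Returns None when the text is not a number or number range.
    """
    cleaned = _LABEL_RE.sub("", text.strip().upper())
    m = _RANGE_RE.match(cleaned)
    if not m:
        return None
    start, start_sfx, end, end_sfx = m.groups()
    return {
        "StartNumber": int(start),
        "StartSuffix": start_sfx or None,   # "" -> None when no suffix
        "EndNumber": int(end) if end else None,
        "EndSuffix": end_sfx or None,
    }


print(split_number_range("Flat 2"))
print(split_number_range("10a-12B"))
```

Mapping the result onto the pipeline's columns (for example `StartNumber` to `PAOstartNumber` or `SAOStartNumber`) depends on whether the text came from the building or the sub-building part of the address.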
