investigate ambiguous parsing of the -burg suffix in NL/DE #152

Open
@missinglink

Description

Today we are merging pelias/api#1565 which brings a bunch of pelias/parser changes into pelias/api.

As part of this process we did some wider acceptance test checks and diff'd them against the current baseline.

One change we identified was this query, which at partial completion ("grolmanstrasse 51, charlottenburg") parses the Berlin borough Charlottenburg as a street.

 grolmanstrasse 51, charlottenburg, berlin
-FFFFFFFFFFFFFFFF0000000000000000000000000
+FFFFFFFFFFFFFFFF0000000000000000FFFF0FFF0

This was likely introduced in the recent NL work #126.

I would like to see if we can find a better way of handling the ambiguities between German and Dutch for the -burg suffix.

note: the correct solution is also being generated, but both score the same. The scoring is based on matched token length, so a robust fix would need to work equally well when len(street) < len(borough), len(street) > len(borough), and len(street) == len(borough).

================================================================
SOLUTIONS (2ms)
----------------------------------------------------------------
(0.53) ➜ [ { housenumber: '51' }, { street: 'Charlottenburg' } ]

(0.53) ➜ [ { street: 'Grolmanstrasse' }, { housenumber: '51' } ]
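
To make the tie concrete, here is a minimal sketch of a length-based coverage score. It is an illustration only and not the pelias/parser implementation: `coverageScore`, `compare`, and the token lists are hypothetical names and inputs, and the formula (matched characters divided by total token characters) is an assumption that happens to reproduce the 0.53 shown for both solutions above. The two extra street names are there only to exercise the shorter and longer cases from the note.

```javascript
// Hypothetical score: characters covered by the solution / characters in all tokens.
// Assumption for illustration; the real solver may weigh things differently.
function coverageScore(tokens, matchedValues) {
  const total = tokens.reduce((sum, t) => sum + t.length, 0);
  const matched = matchedValues.reduce((sum, v) => sum + v.length, 0);
  return matched / total;
}

// Score both competing readings of "<street> <number>, <borough>".
function compare(street, number, borough) {
  const tokens = [street, number, borough];
  return {
    // [ { street }, { housenumber } ]: the intended reading
    streetFirst: coverageScore(tokens, [street, number]),
    // [ { housenumber }, { street: borough } ]: borough misread as a street
    boroughAsStreet: coverageScore(tokens, [number, borough]),
  };
}

// The three length relationships a robust fix has to survive.
const cases = ['grolmanstrasse', 'kantstrasse', 'wilmersdorferstrasse'];
for (const street of cases) {
  const { streetFirst, boroughAsStreet } = compare(street, '51', 'charlottenburg');
  console.log(street, streetFirst.toFixed(2), boroughAsStreet.toFixed(2));
}
```

Under this assumed formula the winner is decided purely by which name is longer, so when the borough is longer than the street the misreading scores higher. Any tie-break that only adjusts lengths would therefore fix one case and leave the others open.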

Labels: bug