
idea: subfield for address_parts.number with alpha tokens #502

Open
@missinglink

Description

The peliasHousenumber analyzer strips non-numeric characters, so alpha tokens (such as the "a" in "12a") never reach the index.

As discussed in pelias/pelias#810, this is somewhat unintuitive but actually works very well.

From schema/settings.js (lines 124 to 128 at commit 41bd2d1):

```js
"peliasHousenumber": {
  "type": "custom",
  "tokenizer": "standard",
  "char_filter": ["numeric"]
},
```

The problem is that the alpha characters of the original housenumber are lost from the indexed field, so we can't do fine-grained sorting on them later.
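To make the loss concrete, here is a rough, non-authoritative simulation of the two kinds of analysis in plain JavaScript. It assumes the numeric char_filter replaces every non-digit character with a space before tokenization, and it uses a hypothetical alpha-preserving analyzer as a stand-in for something like peliasUnit; neither function is the real Elasticsearch analysis chain, and both function names are made up for illustration.

```javascript
// Rough approximation of peliasHousenumber (assumption): the "numeric"
// char_filter replaces each non-digit with a space, then we split on whitespace.
function approxHousenumberTokens(input) {
  return input
    .replace(/[^0-9]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Hypothetical alpha-preserving analyzer (a stand-in for something like
// peliasUnit): lowercases, treats non-alphanumerics as separators, keeps letters.
function approxRawTokens(input) {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

for (const housenumber of ["12a", "12-14B"]) {
  console.log(
    housenumber,
    approxHousenumberTokens(housenumber), // "12a" -> ["12"]
    approxRawTokens(housenumber)          // "12a" -> ["12a"]
  );
}
```

With the numeric-only analysis, "12a" and "12b" both reduce to the single token "12", which is why nothing downstream can tell them apart.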

As a workaround, we're using the phrase.default field to access those tokens.

The disadvantage of phrase.default is that it will contain tokens from both the street and the housenumber, potentially producing undesirable matches. For non-address queries it will also contain additional tokens.

In this issue I'd like to float the idea of adding a 'subfield' to address_parts.number, called something like address_parts.number.raw, which uses a different analyzer such as peliasUnit (which doesn't strip alpha characters).
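A minimal sketch of what this could look like as an Elasticsearch multi-field mapping, written as a JavaScript object in the style of the schema repo. It assumes an Elasticsearch version with the "text" field type (5.x or later) and that address_parts.number is currently mapped with peliasHousenumber; the variable names are hypothetical and the real mapping in pelias/schema may be structured differently.

```javascript
// Hypothetical mapping sketch; the real one lives in pelias/schema.
const addressNumberMapping = {
  type: "text",
  // Existing behaviour (assumed): numeric-only analysis on the main field.
  analyzer: "peliasHousenumber",
  // Elasticsearch multi-field: the same source value is indexed a second
  // time as "address_parts.number.raw" using a different analyzer.
  fields: {
    raw: {
      type: "text",
      analyzer: "peliasUnit"
    }
  }
};

const addressPartsMapping = {
  type: "object",
  properties: {
    number: addressNumberMapping
  }
};

console.log(JSON.stringify(addressPartsMapping, null, 2));
```

Because the main field keeps its analyzer, existing queries against address_parts.number behave as before; only queries that explicitly name the .raw subfield see the alpha tokens.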

This would remain backwards compatible, while adding a new field, address_parts.number.raw, that contains both alpha and numeric tokens.

The benefit is that we could then target this 'raw' field directly in our queries for unit number sorting and similar use cases.
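As one possible (hypothetical) use, a query could keep matching on the numeric-only field and add a should clause on the raw subfield, so that documents whose alpha tokens also match score higher and therefore rank first. The function name and the boost value of 2 are illustrative assumptions, not existing pelias/api code; this uses relevance scoring rather than an explicit sort.

```javascript
// Hypothetical query builder, not part of pelias/api.
function buildHousenumberQuery(housenumber) {
  return {
    query: {
      bool: {
        // Required: numeric-only match, as today.
        must: [
          { match: { "address_parts.number": housenumber } }
        ],
        // Optional: an exact alpha match on the raw subfield adds score,
        // so "12a" ranks above "12b" for the input "12a".
        should: [
          {
            match: {
              "address_parts.number.raw": { query: housenumber, boost: 2 }
            }
          }
        ]
      }
    }
  };
}

const body = buildHousenumberQuery("12a");
console.log(JSON.stringify(body));
```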

The only minor disadvantage is that the new field would increase the on-disk index size, although I expect the increase to be insubstantial (under ~1%).

Also, if we're not going to use it, there's no sense in adding it.

cc/ @orangejulius @ianthetechie @Joxit
