The `peliasHousenumber` analyzer strips non-numeric tokens.
As discussed in pelias/pelias#810, this is somewhat unintuitive but works very well in practice.
See lines 124 to 128 in commit 41bd2d1.
The problem is that the original housenumber (including any alpha characters) is not retained in the indexed tokens, so we can't later do fine-grained sorting on it.
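To illustrate the information loss, here is a minimal Python sketch. It does not call the real analyzers: `housenumber_tokens` and `unit_tokens` are hypothetical, simplified stand-ins that assume `peliasHousenumber` behaves roughly like "replace every non-digit with a space, then split" and `peliasUnit` like "lowercase and keep alphanumeric runs".

```python
import re
from typing import List


def housenumber_tokens(text: str) -> List[str]:
    # Simplified model of peliasHousenumber (assumption): replace every
    # non-digit character with a space, then split on whitespace.
    return re.sub(r"[^0-9]", " ", text).split()


def unit_tokens(text: str) -> List[str]:
    # Simplified model of peliasUnit (assumption): lowercase the input and
    # keep runs of letters and digits as tokens.
    return re.findall(r"[0-9a-z]+", text.lower())


if __name__ == "__main__":
    for value in ["12a", "12b", "12"]:
        print(value, housenumber_tokens(value), unit_tokens(value))
```

Under these assumptions, `12a` and `12b` both index to the single token `12` in the housenumber field, which is why the alpha part can't be used for ranking later.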
As a workaround, we're using the `phrase.default` field to access those tokens.
The disadvantage of `phrase.default` is that it contains tokens from both the street and the housenumber, which can produce undesirable matches. For non-address queries, it also contains additional tokens.
In this issue I would like to float the idea of adding a subfield of `address_parts.number`, called something like `address_parts.number.raw`, which uses a different analyzer, such as `peliasUnit` (which doesn't strip alpha characters).
This would remain backwards compatible, since the existing field is unchanged, while adding `address_parts.number.raw`, which contains both alpha and numeric tokens.
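Elasticsearch supports this pattern through multi-fields (the `fields` mapping parameter), which index the same source value a second time with a different analyzer. Below is a rough sketch of the mapping change, written as a Python dict. It assumes an Elasticsearch version that uses the `text` field type, and the surrounding structure is simplified rather than copied from the Pelias schema.

```python
import json

# Hypothetical, simplified excerpt of the document mapping; only
# address_parts.number is shown. Assumes the "text" field type.
address_parts_mapping = {
    "properties": {
        "address_parts": {
            "type": "object",
            "properties": {
                "number": {
                    # Existing behaviour: numeric-only tokens, so current
                    # queries against address_parts.number are unaffected.
                    "type": "text",
                    "analyzer": "peliasHousenumber",
                    # Proposed multi-field: the same source value is indexed
                    # again as address_parts.number.raw, keeping alpha characters.
                    "fields": {
                        "raw": {
                            "type": "text",
                            "analyzer": "peliasUnit",
                        }
                    },
                }
            },
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(address_parts_mapping, indent=2))
```

Because the subfield is derived from the same source value, no importer changes should be needed to populate it; only a reindex with the updated mapping.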
The benefit is that we could then target this `raw` field directly in our queries for unit-number sorting and similar use cases.
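One possible way to use the new subfield is sketched below as an Elasticsearch query body built in Python. It keeps the existing match on `address_parts.number` and adds a `should` clause on `address_parts.number.raw`, so documents whose raw value also matches the alpha part score higher. `build_housenumber_query` is a hypothetical helper, not part of the Pelias API.

```python
import json
from typing import Any, Dict


def build_housenumber_query(number: str) -> Dict[str, Any]:
    """Hypothetical helper: match on the numeric field, boost on the raw subfield."""
    return {
        "query": {
            "bool": {
                # Existing match: "12a" and "12b" both reduce to "12" here.
                "must": [{"match": {"address_parts.number": number}}],
                # Proposed: an optional clause that only raises the score of
                # documents whose raw value also matches (e.g. the "a" in "12a").
                "should": [{"match": {"address_parts.number.raw": number}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_housenumber_query("12a"), indent=2))
```

Since the bool query has a `must` clause, the `should` clause affects only scoring, not which documents match. Assuming `peliasUnit` keeps `12a` as a single token, `12b` would still be returned but ranked below `12a`.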
The only minor disadvantage is that the new field would increase the on-disk index size, although I expect the increase to be insignificant (under roughly 1%).
Also, if we're not going to use it, there's no sense in adding it.