idea: subfield for `address_parts.number` with alpha tokens #502

The `peliasHousenumber` analyzer strips non-numeric tokens.

As discussed in https://github.com/pelias/pelias/issues/810, this is somewhat unintuitive but works very well in practice.

https://github.com/pelias/schema/blob/41bd2d1daa0e924a7f73850202640ee7c3a1ad45/settings.js#L124-L128

The issue with this is that the original housenumber (including any alpha characters) is lost from the indexed field, meaning we can't later do fine-grained sorting on it.
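To make the loss concrete, here is a rough simulation of what the analyzer does to a few inputs. It assumes the `numeric` char filter replaces every non-digit character with a space, and it approximates the `standard` tokenizer with a whitespace split. Neither function is Pelias or Elasticsearch code.

```javascript
// Rough simulation of the `peliasHousenumber` analyzer.
// ASSUMPTION: the `numeric` char_filter replaces every non-digit character
// with a space; the real definition lives in pelias/schema settings.js.
function numericCharFilter(text) {
  return text.replace(/[^0-9]/g, ' ');
}

// Simplified stand-in for the `standard` tokenizer: split on whitespace and
// drop empty strings (the real tokenizer uses Unicode word-boundary rules).
function standardTokenize(text) {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

function peliasHousenumber(text) {
  return standardTokenize(numericCharFilter(text));
}

for (const input of ['12', '12a', '12B', '12 b']) {
  console.log(JSON.stringify(input), '->', JSON.stringify(peliasHousenumber(input)));
}
```

Under these assumptions, `12`, `12a`, `12B` and `12 b` all produce `["12"]`, so nothing in this field can tell them apart.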

As a workaround, we're using [the `phrase.default` field](https://github.com/pelias/api/pull/1683) to access those tokens.

The disadvantage of `phrase.default` is that it contains tokens from *both* the street and the housenumber, which can produce undesirable matches. For non-address queries, it also contains additional tokens.
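A toy illustration of the kind of undesirable match meant here. `phraseTokens` and `numberTokens` are hypothetical helpers that only approximate the real fields by lowercasing and splitting on whitespace; the real `phrase.default` is built by Pelias' own analyzers.

```javascript
// HYPOTHETICAL approximation of a combined phrase field: number and street
// tokens end up in the same bag.
function phraseTokens(doc) {
  return `${doc.number} ${doc.street}`.toLowerCase().split(/\s+/).filter(Boolean);
}

// HYPOTHETICAL approximation of a number-only field that keeps alpha tokens.
function numberTokens(doc) {
  return doc.number.toLowerCase().split(/\s+/).filter(Boolean);
}

const docs = [
  { id: 'x', number: '5', street: 'B Street' },
  { id: 'y', number: '5 B', street: 'Main Street' },
];

// The user is looking for number 5, unit B.
const wanted = ['5', 'b'];
const matchesAll = (tokens) => wanted.every((t) => tokens.includes(t));

console.log('phrase:', docs.filter((d) => matchesAll(phraseTokens(d))).map((d) => d.id));
console.log('number:', docs.filter((d) => matchesAll(numberTokens(d))).map((d) => d.id));
```

With the combined tokens, record `x` also matches because its street is `B Street`; the number-only tokens match just `y`.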

In this issue I'd like to float the idea of adding a 'subfield' (an Elasticsearch multi-field) of `address_parts.number`, called something like `address_parts.number.raw`, which uses a different analyzer such as `peliasUnit` (which doesn't strip alpha characters).

This would remain backwards compatible, since `address_parts.number` itself keeps its current analyzer, while adding a new field, `address_parts.number.raw`, which contains *both* alpha and numeric tokens.
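A minimal sketch of what the mapping could look like, written as a plain JS object. It assumes the parent field is a `text` field analyzed with `peliasHousenumber` and that a `peliasUnit` analyzer is already defined in the index settings; the exact types and any extra options should be checked against the real schema.

```javascript
// Sketch of the proposed multi-field. ASSUMPTIONS: parent is `text` with the
// `peliasHousenumber` analyzer; `peliasUnit` exists in the index settings.
const addressPartsNumber = {
  type: 'text',
  analyzer: 'peliasHousenumber',
  // Elasticsearch multi-field: the same source value is indexed a second
  // time, under `address_parts.number.raw`, with a different analyzer.
  fields: {
    raw: {
      type: 'text',
      analyzer: 'peliasUnit',
    },
  },
};

const mapping = {
  properties: {
    address_parts: {
      type: 'object',
      properties: {
        number: addressPartsNumber,
      },
    },
  },
};

console.log(JSON.stringify(mapping, null, 2));
```

Because the parent field's analyzer is untouched, existing queries against `address_parts.number` behave as before; only queries that name `address_parts.number.raw` see the alpha tokens.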

The benefit is that we could then target this 'raw' field directly in our queries for unit number sorting and similar use cases.
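One hypothetical way the new field could be used (not code from `pelias/api`): keep the existing numeric match as a requirement and add a boosted `should` clause on the raw subfield, so a record whose full housenumber matches the input scores higher than its numeric siblings. `housenumberQuery` and the `rawBoost` default are made up for illustration.

```javascript
// HYPOTHETICAL query builder. The `must` clause on the numeric-only field
// matches every "12x" record for input "12a"; the boosted `should` clause on
// the proposed raw subfield lifts records whose full housenumber matches.
function housenumberQuery(input, rawBoost = 2) {
  return {
    bool: {
      must: [
        { match: { 'address_parts.number': { query: input } } },
      ],
      should: [
        { match: { 'address_parts.number.raw': { query: input, boost: rawBoost } } },
      ],
    },
  };
}

console.log(JSON.stringify(housenumberQuery('12a'), null, 2));
```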

The only (minor) disadvantage is that the new field would increase the on-disk index size, although I expect the increase to be small (roughly 1% or less).

Also, if we're not going to use it, there's no point in adding it.

cc/ @orangejulius @ianthetechie @Joxit 

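For reference, the `peliasHousenumber` analyzer definition from the linked lines of `settings.js`:

```javascript
// excerpt: one entry inside the analyzer definitions in settings.js
"peliasHousenumber": {
  "type": "custom",
  "tokenizer": "standard",
  "char_filter": ["numeric"]
},
```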
