Bug 1958992 - suggest: Improve geonames l10n and weather-suggestions matching. #6745
@0c0w3 - Hey Drew, just finished up some other stuff, will look into this next week. Sorry for the delay.
Looks good, thanks! The l10n handling is very cool.
lgtm 🙂
a9358ac to 23d1ebd
Thanks! I'll wait to merge this until I can get a desktop patch together. I think I'll also need another PR where |
23d1ebd to b6d072d
The latest commit reverts the change from |
This is a substantial reworking of geonames and weather suggestions in suggest, including some breaking API changes. I didn't bother deprecating anything because AFAIK desktop is the only consumer that uses these, and we can just fix it when we vendor.

Summary of major changes:

- In RS, don't store geonames' alternate names inline with the core geonames data. Instead, use separate record types. (As a reminder, "alternates" just means variants of a geoname's main name, e.g., "NYC" and "NY" are alternates for New York City.) So now there are two record types: core geonames data and alternates. The core records contain the main geonames data (IDs, canonical name, country, admin divisions, etc.), and they can be ingested by all clients regardless of their locale or country. The alternates records are scoped by language and are intended to be ingested only by clients with matching locales.
- Improve geonames fetching and weather-suggestion matching so all admin levels and countries are supported, e.g., "waterloo on", "waterloo canada", "waterloo on canada", etc.
- Relax the weather parsing a little to allow multiple weather keywords ("rain weather").
- Keep track of all available admin codes per geoname. There are four of them. This is necessary because a lot of countries outside North America have multiple admin levels, and determining whether a given geoname is related to another one requires comparing their admin codes.
- Instead of manually computing name variants and inserting them separately into the DB, use a custom SQLite collating sequence. ("Variants" here means removing punctuation, lowercasing, removing diacritics, etc.)
- Store each geoname's `ascii_name` as an alternate. That's useful for chars like "ö", which is represented as "oe" in the ASCII name (at least in the geonames data I've seen).

Minor changes:

- Store latitude and longitude as strings instead of floats. I made this change to derive `Eq` for `Geoname`, but it makes sense anyway and is how I should have done it originally.
- Add `Geoname::geoname_type` so consumers can easily understand whether it's a city, admin region, or country.
- Remove the `geoname_type` param from `fetch_geonames`. Consumers can filter out matching geonames that they don't want instead.
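To illustrate the collating-sequence idea from the major changes, here is a minimal sketch of a comparator that treats name variants as equal instead of storing each variant separately. It is not the code in this PR: `strip_diacritic`, `normalize`, and `geoname_collate` are hypothetical names, the diacritics table is deliberately tiny, and dropping punctuation outright (rather than replacing it with a space) is an assumption.

```rust
use std::cmp::Ordering;

/// Map a handful of accented Latin letters to their base letter.
/// Hypothetical and incomplete; a real implementation would likely
/// rely on Unicode normalization instead of a hand-written table.
fn strip_diacritic(c: char) -> char {
    match c {
        'á' | 'à' | 'â' | 'ä' | 'ã' | 'å' => 'a',
        'é' | 'è' | 'ê' | 'ë' => 'e',
        'í' | 'ì' | 'î' | 'ï' => 'i',
        'ó' | 'ò' | 'ô' | 'ö' | 'õ' => 'o',
        'ú' | 'ù' | 'û' | 'ü' => 'u',
        'ç' => 'c',
        'ñ' => 'n',
        other => other,
    }
}

/// Lowercase, strip diacritics, and drop punctuation (assumption:
/// punctuation is removed, whitespace is kept as-is).
fn normalize(name: &str) -> String {
    name.chars()
        .flat_map(|c| c.to_lowercase())
        .map(strip_diacritic)
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect()
}

/// Comparator with the shape a custom collating sequence needs:
/// two strings in, an `Ordering` out.
fn geoname_collate(a: &str, b: &str) -> Ordering {
    normalize(a).cmp(&normalize(b))
}
```

With a comparator like this registered as a collation, a query for "st louis" can match a stored "St. Louis" without a separate row per variant.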
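The admin-code comparison and the `Eq` derivation mentioned above can be sketched as follows. This is an assumption-laden illustration, not the PR's actual types: the field names, the `GeonameType` variants, and the `is_within` and `geoname` helpers are hypothetical, and the sample admin codes used with it are illustrative values only. Keeping latitude and longitude as `String` is what lets `Eq` be derived, since `f64` does not implement `Eq`.

```rust
/// Hypothetical shape for the type exposed via `Geoname::geoname_type`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum GeonameType {
    Country,
    AdminDivision { level: u8 },
    City,
}

/// Hypothetical geoname with up to four admin codes.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Geoname {
    name: String,
    geoname_type: GeonameType,
    country_code: String,
    admin_codes: [Option<String>; 4],
    // Strings rather than floats, so `Eq` can be derived.
    latitude: String,
    longitude: String,
}

/// Returns true if `child` lies inside `parent`, judged only by
/// country and admin codes (assumed rule, for illustration).
fn is_within(child: &Geoname, parent: &Geoname) -> bool {
    if child.country_code != parent.country_code {
        return false;
    }
    match &parent.geoname_type {
        GeonameType::Country => true,
        GeonameType::AdminDivision { level } => {
            // A level-N division must agree with the child on codes 1..=N.
            let n = usize::from(*level).min(4);
            n > 0
                && (0..n).all(|i| {
                    parent.admin_codes[i].is_some()
                        && parent.admin_codes[i] == child.admin_codes[i]
                })
        }
        GeonameType::City => false,
    }
}

/// Small constructor for examples; coordinates are placeholders.
fn geoname(name: &str, geoname_type: GeonameType, country: &str, admin1: Option<&str>) -> Geoname {
    Geoname {
        name: name.to_string(),
        geoname_type,
        country_code: country.to_string(),
        admin_codes: [admin1.map(String::from), None, None, None],
        latitude: "0.0".to_string(),
        longitude: "0.0".to_string(),
    }
}
```

Under these assumptions, a city named Waterloo whose country and first admin code match Ontario's is considered within Ontario, while a Waterloo in another country is not, which is the kind of check a query like "waterloo on" needs.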
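The relaxed weather parsing, where a query may contain more than one weather keyword, might look roughly like the sketch below. The keyword list and the `split_weather_query` function are hypothetical, and the real matcher in suggest is likely more involved (for example, it may care about keyword position or prefixes); this only shows the idea of separating keywords from the place portion of the query.

```rust
/// Hypothetical keyword list; the real keywords come from the
/// suggestion data, not from a hard-coded constant.
const WEATHER_KEYWORDS: &[&str] = &["weather", "forecast", "rain"];

/// Split a query into its weather keywords and the remaining place
/// text. Returns `None` when no weather keyword is present.
fn split_weather_query(query: &str) -> Option<(Vec<String>, String)> {
    let mut keywords = Vec::new();
    let mut place = Vec::new();
    for token in query.split_whitespace() {
        let lower = token.to_lowercase();
        if WEATHER_KEYWORDS.contains(&lower.as_str()) {
            // Any number of keywords is accepted ("rain weather").
            keywords.push(lower);
        } else {
            place.push(lower);
        }
    }
    if keywords.is_empty() {
        None
    } else {
        Some((keywords, place.join(" ")))
    }
}
```

The place remainder (for example "waterloo on canada") would then be handed to the geonames lookup, which resolves the city, admin division, and country parts.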