Commit a9358ac

Bug 1958992 - suggest: Improve geonames l10n and weather-suggestions matching.
This is a substantial reworking of geonames and weather suggestions in suggest.

Summary of major changes:

* In remote settings (RS), don't store geonames' alternate names inline with the core geonames data. Instead, use separate record types. (As a reminder, "alternates" just means variants of a geoname's main name; for example, "NYC" and "NY" are alternates for New York City.) So now there are two record types: core geonames data and alternates. The core records contain the main geonames data (IDs, canonical name, country, admin divisions, etc.) and can be ingested by all clients regardless of their locale or country. The alternates records are scoped by language and are intended to be ingested only by clients with matching locales.
* Improve geonames fetching and weather-suggestion matching so all admin levels and countries are supported, e.g., "waterloo on", "waterloo canada", "waterloo on canada", etc.
* Relax the weather parsing a little to allow multiple weather keywords ("rain weather").
* Keep track of all available admin codes per geoname. There are four of them. This is necessary because a lot of countries outside North America have multiple admin levels, and determining whether a given geoname is related to another one requires comparing their admin codes.
* Instead of manually computing name variants and inserting them separately into the DB, use a custom SQLite collating sequence. ("Variants" here means removing punctuation, lowercasing, removing diacritics, etc.)
* Store each geoname's `ascii_name` as an alternate. That's useful for chars like "ö", which is represented as "oe" in the ASCII name (at least in the geonames data I've seen).

Minor changes:

* Store latitude and longitude as strings instead of floats. I made this change to derive `Eq` for `Geoname`, but it makes sense anyway and is how I should have done it originally.
* Add `Geoname::geoname_type` so consumers can easily understand whether it's a city, admin region, or country.
* Remove the `geoname_type` param from `fetch_geonames`. Consumers can filter out matching geonames that they don't want instead.
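
To illustrate the collating-sequence idea, here is a minimal std-only sketch of a comparison function that treats name variants as equal. It is not the code from this commit: `normalized_chars` and `geonames_collate` are hypothetical names, keeping whitespace while dropping punctuation is an assumption, and diacritic removal is left out because it would need something like the `unicode-normalization` crate. A function with this shape (two `&str` in, an `Ordering` out) is the kind of callback that rusqlite's `collation` feature lets you register with SQLite.

```rust
use std::cmp::Ordering;

/// Yields the characters of `s` that take part in the comparison.
/// Assumption for this sketch: punctuation is dropped, whitespace is kept,
/// and everything else is lowercased. Diacritics are NOT stripped here.
fn normalized_chars(s: &str) -> impl Iterator<Item = char> + '_ {
    s.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
}

/// Hypothetical collation function: compares two names after normalization,
/// without allocating intermediate strings.
fn geonames_collate(a: &str, b: &str) -> Ordering {
    normalized_chars(a).cmp(normalized_chars(b))
}
```

With a comparison like this, "St. Louis" and "st louis" compare as equal, so a single stored name can match several typed variants without inserting each variant into the DB.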
1 parent 6375ad5 commit a9358ac

12 files changed

+2369
-1037
lines changed

Cargo.lock

Lines changed: 21 additions & 12 deletions

components/suggest/Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -17,7 +17,7 @@ log = "0.4"
 once_cell = "1.5"
 parking_lot = ">=0.11,<=0.12"
 remote_settings = { path = "../remote_settings" }
-rusqlite = { version = "0.33.0", features = ["functions", "bundled", "load_extension"] }
+rusqlite = { version = "0.33.0", features = ["functions", "bundled", "load_extension", "collation"] }
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"
 error-support = { path = "../support/error" }
@@ -26,6 +26,9 @@ viaduct = { path = "../viaduct" }
 viaduct-reqwest = { path = "../support/viaduct-reqwest", optional=true }
 tempfile = { version = "3.2.0", optional = true }
 thiserror = "1"
+# This is an old version of `unicase` but it's the one mozilla-central uses.
+unicase = "2.6"
+unicode-normalization = "0.1"
 uniffi = { version = "0.29.0" }
 url = { version = "2.1", features = ["serde"] }

@@ -34,6 +37,7 @@ criterion = "0.5"
 env_logger = { version = "0.10", default-features = false }
 expect-test = "1.4"
 hex = "0.4"
+itertools = "0.14"
 rc_crypto = { path = "../support/rc_crypto" }

 [build-dependencies]
