Skip to content

Bug 1958992 - suggest: Improve geonames l10n and weather-suggestions matching. #6745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

0c0w3
Copy link
Contributor

@0c0w3 0c0w3 commented May 2, 2025

This is a substantial reworking of geonames and weather suggestions in suggest, including some breaking API changes. I didn't bother deprecating anything because AFAIK desktop is the only consumer that uses these, and we can just fix it when we vendor.

Summary of major changes:

In RS, don't store geonames' alternate names inline with the core geonames data. Instead, use separate record types. (As a reminder, "alternates" just means variants of a geoname's main name, like "NYC" and "NY" are alternates for New York City.) So now there are two record types: core geonames data and alternates. The core records contain the main geonames data: IDs, canonical name, country, admin divisions, etc., and they can be ingested by all clients regardless of their locale or country. The alternates records are scoped by language and are intended to be ingested only by clients with matching locales.

Improve geonames fetching and weather-suggestion matching so all admin levels and countries are supported. e.g., "waterloo on", "waterloo canada", "waterloo on canada", etc.

Relax the weather parsing a little to allow multiple weather keywords ("rain weather").

Keep track of all available admin codes per geoname. There are four of them. This is necessary because a lot of countries outside North America have multiple admin levels, and determining whether a given geoname is related to another one requires comparing their admin codes.

Instead of manually computing name variants and inserting them separately into the DB, use a custom Sqlite collating sequence. ("Variants" here means removing punctuation, lowercasing, removing diacritics, etc.)

Store each geoname's ascii_name as an alternate. That's useful for chars like "ö", which is represented as "oe" in the ASCII name (at least the geonames data I've seen).

Minor changes:

Store latitude and longitude and strings instead of floats. I made this change to derive Eq for Geoname, but it makes sense anyway and is how I should have done it originally.

Add Geoname::geoname_type so consumers can easily understand whether it's a city, admin region, or country.

Remove the geoname_type param from fetch_geonames. Consumers can filter out matching geonames that they don't want instead.

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

@0c0w3 0c0w3 requested review from ncloudioj and tiftran May 2, 2025 07:21
@ncloudioj
Copy link
Member

@0c0w3 - Hey Drew, just finished up some other stuff, will look into this next week. Sorry for the delay.

Copy link
Member

@ncloudioj ncloudioj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! The l10n handling is very cool.

Copy link
Contributor

@tiftran tiftran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🙂

0c0w3 added 2 commits May 17, 2025 15:29
…matching.

This is a substantial reworking of geonames and weather suggestions in suggest.

Summary of major changes:

In RS, don't store geonames' alternate names inline with the core geonames data.
Instead, use separate record types. (As a reminder, "alternates" just means
variants of a geoname's main name, like "NYC" and "NY" are alternates for New
York City.) So now there are two record types: core geonames data and
alternates. The core records contain the main geonames data: IDs, canonical
name, country, admin divisions, etc., and they can be ingested by all clients
regardless of their locale or country. The alternates records are scoped by
language and are intended to be ingested only by clients with matching locales.

Improve geonames fetching and weather-suggestion matching so all admin levels
and countries are supported. e.g., "waterloo on", "waterloo canada", "waterloo
on canada", etc.

Relax the weather parsing a little to allow multiple weather keywords ("rain
weather").

Keep track of all available admin codes per geoname. There are four of them.
This is necessary because a lot of countries outside North America have multiple
admin levels, and determining whether a given geoname is related to another one
requires comparing their admin codes.

Instead of manually computing name variants and inserting them separately into
the DB, use a custom Sqlite collating sequence. ("Variants" here means removing
punctuation, lowercasing, removing diacritics, etc.)

Store each geoname's `ascii_name` as an alternate. That's useful for chars like
"ö", which is represented as "oe" in the ASCII name (at least the geonames data
I've seen).

Minor changes:

Store latitude and longitude and strings instead of floats. I made this change
to derive `Eq` for `Geoname`, but it makes sense anyway and is how I should have
done it originally.

Add `Geoname::geoname_type` so consumers can easily understand whether it's a
city, admin region, or country.

Remove the `geoname_type` param from `fetch_geonames`. Consumers can filter out
matching geonames that they don't want instead.
@0c0w3 0c0w3 force-pushed the adw/geonames-l10n-bug-1958992 branch from a9358ac to 23d1ebd Compare May 18, 2025 02:40
@0c0w3
Copy link
Contributor Author

0c0w3 commented May 18, 2025

Thanks! I'll wait to merge this until I can get a desktop patch together. I think I'll also need another PR where fetch_geonames returns localized names to consumers, for Yelp suggestions.

@0c0w3 0c0w3 force-pushed the adw/geonames-l10n-bug-1958992 branch from 23d1ebd to b6d072d Compare May 18, 2025 02:50
@0c0w3
Copy link
Contributor Author

0c0w3 commented May 20, 2025

The latest commit reverts the change from Geoname::country to country_code, same for the admin codes. I'm working on another PR that adds a function for consumers to fetch localized/alternate names, and having "code" in these property names helps disambiguate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants