Dialect prototyping #925

Open: wants to merge 24 commits into base: master

elijah-potter (Collaborator) commented Mar 17, 2025

Issues

#345

Description

This is a large patch, so there's a lot to cover:

  • Created a Dialect type to represent the four major English dialects (American, British, Canadian and Australian).
  • Refactored WordMetadata, Token, and TokenKind to no longer be Copy.
    This cost about 8% in performance.
  • Reworked the MutableDictionary (and by extension the FstDictionary) to use word hashes as keys in a single map.
  • Added a derived_from element to the WordMetadata. This lets linters determine the base word of an affixed word (for example, the base word for bananas is banana).
  • Renamed the hunspell module to rune. This module will slowly evolve to contain a more bespoke dictionary file format. It has already diverged from the hunspell format enough to warrant a name change.
  • Assertion error messages have been improved.
  • Made it possible to construct a MutableDictionary at runtime using Rune files.
  • Imported words from MIT-licensed dictionaries for en-GB, en-CA and en-AU.
  • Added dialect selectors to the Obsidian and WordPress plugins.
  • Added a search bar for Harper rules in the Obsidian plugin.
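
To make the dictionary changes above more concrete, here is a minimal sketch of how a dialect enum, a hash-keyed word map, and a derived_from link could fit together. It is not the code in this PR: `WordId`, `SketchDictionary`, the `dialects` field, and every method name are hypothetical, and the real `Dialect`, `WordMetadata` and `MutableDictionary` types may be shaped quite differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// The four dialects mentioned in the description. Variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Dialect {
    American,
    British,
    Canadian,
    Australian,
}

/// Hypothetical word identifier: a hash of the word's text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct WordId(u64);

impl WordId {
    pub fn from_word(word: &str) -> Self {
        let mut hasher = DefaultHasher::new();
        word.hash(&mut hasher);
        WordId(hasher.finish())
    }
}

/// Simplified stand-in for WordMetadata (assumed fields only).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct WordMetadata {
    /// Dialects in which this spelling is valid.
    pub dialects: Vec<Dialect>,
    /// Identifier of the base word, if this word was produced by affixing.
    pub derived_from: Option<WordId>,
}

/// A single map from word hash to the word and its metadata.
/// Hash collisions are ignored in this sketch.
#[derive(Debug, Default)]
pub struct SketchDictionary {
    words: HashMap<WordId, (String, WordMetadata)>,
}

impl SketchDictionary {
    /// Insert a word and return the hash-based key it was stored under.
    pub fn insert(&mut self, word: &str, metadata: WordMetadata) -> WordId {
        let id = WordId::from_word(word);
        self.words.insert(id, (word.to_string(), metadata));
        id
    }

    /// Look up metadata by hashing the word again.
    pub fn metadata(&self, word: &str) -> Option<&WordMetadata> {
        self.words.get(&WordId::from_word(word)).map(|(_, m)| m)
    }

    /// Follow derived_from one step to recover the base word.
    pub fn base_word(&self, word: &str) -> Option<&str> {
        let base_id = self.metadata(word)?.derived_from?;
        self.words.get(&base_id).map(|(w, _)| w.as_str())
    }
}
```

In this sketch, inserting "banana" and then "bananas" with `derived_from` set to the key returned for "banana" lets `base_word("bananas")` return "banana", which mirrors the example in the list above.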

Next Steps Before Merge

Before I can feel comfortable merging this PR, a couple of things have to happen:

  • More non-American versions of words must be added to the dictionary and tagged as such. See the existing diff for the dictionary for an idea of how this works.
  • Rune needs to be extended to be able to describe the relationships between words of various dialects (color -> colour).
  • Additional unit and integration tests must be added for the newly-supported dialects.
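
As a rough illustration of the second item, the sketch below models relationships between dialect spellings (color -> colour) as an in-memory table. It says nothing about how Rune would encode them on disk; `VariantTable`, `add_group` and `convert` are hypothetical names, and the `Dialect` enum is redeclared with assumed variant names so the block stands alone.

```rust
use std::collections::HashMap;

/// Assumed variant names for the four dialects in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Dialect {
    American,
    British,
    Canadian,
    Australian,
}

/// Hypothetical table of spelling groups: each group holds one word's
/// spelling in every dialect that has one.
#[derive(Debug, Default)]
pub struct VariantTable {
    /// Maps any known spelling to the index of its group.
    index: HashMap<String, usize>,
    /// Each group maps a dialect to that dialect's spelling.
    groups: Vec<HashMap<Dialect, String>>,
}

impl VariantTable {
    /// Register one group of equivalent spellings.
    pub fn add_group(&mut self, spellings: &[(Dialect, &str)]) {
        let group_id = self.groups.len();
        let mut group = HashMap::new();
        for &(dialect, spelling) in spellings {
            group.insert(dialect, spelling.to_string());
            self.index.insert(spelling.to_string(), group_id);
        }
        self.groups.push(group);
    }

    /// Find the target dialect's spelling of `word`, if the word is known.
    pub fn convert(&self, word: &str, target: Dialect) -> Option<&str> {
        let group = &self.groups[*self.index.get(word)?];
        group.get(&target).map(String::as_str)
    }
}
```

With a group containing "color" for American and "colour" for the other three dialects, `convert("color", Dialect::British)` yields "colour" and `convert("colour", Dialect::American)` yields "color". A word spelled identically in two unrelated groups would need more care than this sketch takes.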

Edit: the built-in spell check seems to convert between dialects quite well on its own. There isn't any immediate need for manual tagging.

Successfully merging this pull request may close these issues.

  • Add search to the Obsidian plugin's settings
  • Non-American Dialects of English