
fix: variants support for "tricky" cases#2390

Merged
BoboTiG merged 2 commits into master from fix-variants on Mar 22, 2025

@BoboTiG BoboTiG commented Mar 22, 2025

  • Add regression tests to ensure the current behavior is preserved and to prevent future regressions.
  • [Kobo] Variants from different prefix groups were ignored (as in FR, where "suis" is a variant of "être"). Since the device ignores variants from a different prefix group anyway, I deleted that code to simplify the logic. It seems harmless.
  • Variants are now passed to templates as a list of str, which allows merging the code for the Kobo & DictFile formats.
  • Variant special cases are now also available to DictFile and its sub-formats.
  • The code handling variants is now greatly simplified.
  • Removed the no-longer-used Word.empty().

Related to #2379.

- Add regression tests to ensure the current behavior is preserved and to prevent future regressions.
- [Kobo] Variants from different prefix groups were ignored (as in FR, where "suis" is a variant of "être").
  Since the device ignores variants from a different prefix group anyway, I deleted that code to simplify the logic.
  It seems harmless.
- Fix variants pointing to an empty word that itself points to another variant (only 1 level supported).
- Variants are now passed to templates as a list of `str`, which allows merging the code for the Kobo & DictFile formats.
- Variant special cases are now also available to DictFile and its sub-formats.
- The code handling variants is now greatly simplified.
- Removed the no-longer-used `Word.empty()`.
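
The "only 1 level supported" rule and the "list of `str`" output can be illustrated with a minimal sketch. This is not the project's actual code: the `Word` dataclass, the `collect_variants()` helper, and the placeholder words in the usage note are all hypothetical, and it assumes that a word's `variants` field lists the root forms it derives from.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Word:
    # Hypothetical, simplified stand-in for the project's word record.
    definitions: list[str] = field(default_factory=list)
    # Assumption: the root forms this word is a variant of.
    variants: list[str] = field(default_factory=list)


def collect_variants(words: dict[str, Word]) -> dict[str, list[str]]:
    """Map each defined root word to the sorted list of its variants (plain strings)."""
    result: dict[str, set[str]] = defaultdict(set)
    for word, details in words.items():
        for root in details.variants:
            root_details = words.get(root)
            if root_details is None:
                continue  # the root is not part of the dictionary
            if root_details.definitions:
                result[root].add(word)
                continue
            # The root is "empty" (no definitions): follow its own roots,
            # but only one level deep.
            for final_root in root_details.variants:
                final_details = words.get(final_root)
                if final_details is not None and final_details.definitions:
                    result[final_root].add(word)
    return {root: sorted(forms) for root, forms in result.items()}
```

With placeholder data such as `{"c": Word(["a definition"]), "b": Word([], ["c"]), "a": Word([], ["b"])}`, both `"a"` and `"b"` end up attached to `"c"`, because `"b"` is empty and is followed one level to `"c"`. A longer chain of empty words would not be followed any further.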
Comment thread wikidict/convert.py
for variant in details.variants:
if root_details := self.words.get(variant):
variants_words[variant] = root_details
if word.endswith("s"): # crude detection of plural

@BoboTiG BoboTiG Mar 22, 2025

Since our variant "detectors" are much better now, this hack is no longer useful. Given the complexity of the variants code, I prefer to delete that piece.

BoboTiG commented Mar 22, 2025

> [Kobo] Variants from different prefix groups were ignored (like in FR with "suis" which is a variant of "être"). As the device will ignore variants from different prefix group, I deleted the code to simplify the logic. It seems harmless.

Now that I think more about this, it might be better not to pollute dicts with useless data (they are already quite big).
Moreover, we talk about "variants", but it is more about "misses" (as described in https://github.com/hunspell/hunspell/blob/ecc6dbb52025bdf3a766429988e64190d912765f/man/hunspell.1#L93-L139), like typos. And well, a typo where the first letters are different is no longer a typo 😅
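
To make the "prefix group" idea concrete, here is a minimal sketch of the kind of filtering discussed above. It is only an illustration: `prefix_group()` and `same_group_variants()` are hypothetical helpers, and using the first two lowercased characters as the group key is a simplifying assumption, not the device's actual rule.

```python
def prefix_group(word: str, size: int = 2) -> str:
    # Assumption: the group key is the first `size` characters, lowercased.
    # The real grouping rule may be more involved.
    return word.lower()[:size]


def same_group_variants(root: str, variants: list[str]) -> list[str]:
    """Keep only the variants that share the root word's prefix group."""
    group = prefix_group(root)
    return [variant for variant in variants if prefix_group(variant) == group]
```

Under that assumption, `same_group_variants("être", ["suis", "êtes"])` keeps only `"êtes"`, and `"pu"` is dropped for both `"pouvoir"` and `"paître"`, since the three words start with different prefixes.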

- simplify `handle_word()`
- unrelated to this PR: fix off-by-one check for unicode ranges in .mobi

BoboTiG commented Mar 22, 2025

In the end, I mostly added more comments to your code @lasconic, thanks for the initial work :)

@BoboTiG BoboTiG merged commit 719bc23 into master Mar 22, 2025
@BoboTiG BoboTiG deleted the fix-variants branch March 22, 2025 19:08
BoboTiG added a commit that referenced this pull request Mar 23, 2025
+ Expand tests.

Example in FR with "pu", "pouvoir", and "paître" (3 different prefixes).

Follow up of #2390.
Related to #2379.