
feat: allow blank lines and comments in dictionary.dict #756

Merged: 8 commits, merged Mar 13, 2025


@hippietrail (Collaborator) commented Feb 23, 2025

Description

This version would support comments after any whitespace following a dictionary entry with its affix annotation on the same line.

If we do need to support words with spaces then I'll redesign this to require a comment delimiter.

`#` is the comment delimiter. It is not currently used as an affix annotation flag, and perhaps shouldn't be; the current logic will break if it becomes one.
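Below is a minimal sketch of the comment-stripping rule described above, written in Rust (Harper's language). The function name `strip_comment` and the `apple/SM` entry are hypothetical and not taken from this PR's code. The last assertion illustrates the caveat: if `#` ever became an affix flag, the flag would be silently cut off.

```rust
/// Hypothetical helper (not Harper's actual code): drop everything from the
/// first `#` onward, then trim surrounding whitespace.
fn strip_comment(line: &str) -> &str {
    let content = match line.find('#') {
        // `#` is ASCII, so slicing at its byte index is always valid UTF-8.
        Some(idx) => &line[..idx],
        None => line,
    };
    content.trim()
}

fn main() {
    // Line-end comment after an entry with an (illustrative) affix annotation.
    assert_eq!(strip_comment("apple/SM   # fruit"), "apple/SM");
    // A full-line comment and a blank line both reduce to an empty string.
    assert_eq!(strip_comment("# Domain-specific coding words"), "");
    assert_eq!(strip_comment("   "), "");
    // The caveat: if `#` were an affix flag, this rule would truncate it.
    assert_eq!(strip_comment("word/A#B"), "word/A");
}
```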

How Has This Been Tested?

Fails on `lints_lots_of_latin_correctly` because the dictionary contains `et al.` (around line 49,839), the only dictionary entry that contains a space.

Is that actually intentional and supported?
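To illustrate the failure, here is a hypothetical Rust comparison of the two comment rules, assuming the entry appears on its own line as `et al.`. Neither function is Harper's real parser; both names are invented for the example.

```rust
/// Whitespace-delimited comments (the first design in this PR): the entry is
/// whatever precedes the first whitespace, and the rest is treated as comment.
fn entry_whitespace_delimited(line: &str) -> &str {
    line.split_whitespace().next().unwrap_or("")
}

/// `#`-delimited comments: the entry is everything before `#`, trimmed.
fn entry_hash_delimited(line: &str) -> &str {
    line.split('#').next().unwrap_or("").trim()
}

fn main() {
    // A multi-word entry is truncated by the whitespace rule...
    assert_eq!(entry_whitespace_delimited("et al."), "et");
    // ...but survives when comments need an explicit delimiter.
    assert_eq!(entry_hash_delimited("et al.   # Latin abbreviation"), "et al.");
}
```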

I've added new tests to cover:

  • blank lines
  • full-line comments
  • line-end comments after entries with and without affix annotations
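The four cases above could be exercised roughly as follows. `Entry`, `parse_line`, and the example words and flags are illustrative assumptions, not the test code in this PR; the sketch assumes the `word/FLAGS` layout and a `#` comment delimiter.

```rust
/// A dictionary entry: the word plus its (possibly empty) affix flags.
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    word: &'a str,
    flags: &'a str,
}

/// Hypothetical parser: returns `None` for blank lines and full-line
/// comments; otherwise removes any `#` comment and splits `word/FLAGS`.
fn parse_line(line: &str) -> Option<Entry<'_>> {
    let content = line.split('#').next().unwrap_or("").trim();
    if content.is_empty() {
        return None;
    }
    // Entries without an affix annotation get empty flags.
    let (word, flags) = content.split_once('/').unwrap_or((content, ""));
    Some(Entry { word, flags })
}

fn main() {
    // Blank line.
    assert_eq!(parse_line(""), None);
    // Full-line comment.
    assert_eq!(parse_line("# Normal words"), None);
    // Line-end comment after an entry with affix annotations.
    assert_eq!(
        parse_line("apple/SM   # fruit"),
        Some(Entry { word: "apple", flags: "SM" })
    );
    // Line-end comment after an entry without affix annotations.
    assert_eq!(
        parse_line("Harper # project name"),
        Some(Entry { word: "Harper", flags: "" })
    );
}
```

Because the delimiter is explicit, the amount of whitespace before `#` carries no meaning, so comments can be aligned freely.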

Checklist

  • I have performed a self-review of my own code
  • I have added tests to cover my changes

@ficcdaf (Contributor) commented Feb 24, 2025

Personally, I think a comment delimiter would be a good idea regardless. It results in much less ambiguity. For example, if someone opens the dictionary and sees `#`, they'll immediately know it's a comment. It also leaves room for words with spaces in them; these are rare, but in a case like `et al.` it makes sense.

@hippietrail (Collaborator, Author)

> Personally, I think a comment delimiter would be a good idea regardless. It results in much less ambiguity. For example, if someone opens the dictionary and sees `#`, they'll immediately know it's a comment. It also leaves room for words with spaces in them; these are rare, but in a case like `et al.` it makes sense.

Yes, I'm seeking clarification on the issue of terms with spaces. In tests I found I can add them arbitrarily, but I can't get them suggested unless I type the term with the spaces removed (and optionally some other characters removed); typing it with the space in the wrong place, etc., doesn't work.

There are tons of terms to add if this is going to be a thing.

The only problem with the delimiter is that it can also be ambiguous, since it can look like an annotation flag. `#` isn't currently used, but we're starting to run out of flags and a lot is still missing.

I've also started a syntax highlighter, which would also make the comments stand out. The idea is that there would be a run of whitespace, not just a single space, before each comment, with the comments at least locally aligned.

Which is not to say I'm closed to the idea of a delimiter (-:

@elijah-potter (Collaborator)

Use # as a comment delimiter. I don't think whitespace after the / should hold meaning in the context you've laid out here.

To clarify the "Latin" issue: we previously had problems (solved in #473) with terms like `et al.`, which I saw more as "words" than as higher-level phrases. For other "words" like this to be recognized properly, they have to be added to the dictionary, and an exception has to be inserted into the Document parse stage.
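A rough sketch of what such a parse-stage exception might look like: a whitespace tokenizer that first checks whether a known multi-word dictionary term starts at the current position. This is a hypothetical illustration, not Harper's Document parser; `tokenize` and its arguments are invented for the example.

```rust
/// Hypothetical sketch: split text on whitespace, but keep known multi-word
/// terms such as "et al." together as a single token.
fn tokenize<'a>(text: &'a str, multiword: &[&str]) -> Vec<&'a str> {
    let mut tokens = Vec::new();
    let mut rest = text.trim_start();
    while !rest.is_empty() {
        // Exception: a known multi-word term at this position wins.
        if let Some(term) = multiword.iter().find(|t| rest.starts_with(**t)) {
            tokens.push(&rest[..term.len()]);
            rest = rest[term.len()..].trim_start();
            continue;
        }
        // Default: take everything up to the next whitespace.
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        tokens.push(&rest[..end]);
        rest = rest[end..].trim_start();
    }
    tokens
}

fn main() {
    let tokens = tokenize("Smith et al. found it", &["et al."]);
    assert_eq!(tokens, vec!["Smith", "et al.", "found", "it"]);
}
```

The prefix match does not check for a word boundary after the term, which a real implementation would need to handle.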

Personally, I see this as out-of-scope for this PR, but we can still discuss it here.

@hippietrail (Collaborator, Author) commented Mar 4, 2025

> Use # as a comment delimiter. I don't think whitespace after the / should hold meaning in the context you've laid out here.

I'll implement `#` today. As a bonus, we'll be able to have full-line comments to divide the file into sections, mostly "normal words" vs. "very domain-specific coding words".

> To clarify the "Latin" issue: we previously had problems (solved in #473) with terms like `et al.`, which I saw more as "words" than as higher-level phrases. For other "words" like this to be recognized properly, they have to be added to the dictionary, and an exception has to be inserted into the Document parse stage.

In lexicography, to avoid ambiguity with the word "word", the terms "lexeme" and "listeme" are used to cover both normal words and terms like these.

> Personally, I see this as out-of-scope for this PR, but we can still discuss it here.

Me too. But I'll now start gathering Latin and other terms like this to put in an issue somewhere.

I've added the first term to a new list in issue #823.

@elijah-potter (Collaborator)

I made once_cell a dev-dependency since we only use it in tests. Otherwise, this looks good to me!

@elijah-potter elijah-potter enabled auto-merge March 13, 2025 19:19
@elijah-potter elijah-potter added this pull request to the merge queue Mar 13, 2025
@elijah-potter elijah-potter removed this pull request from the merge queue due to a manual request Mar 13, 2025
@elijah-potter elijah-potter added this pull request to the merge queue Mar 13, 2025
Merged via the queue into Automattic:master with commit 6bc4f41 Mar 13, 2025
22 checks passed
@hippietrail hippietrail deleted the commented-dict branch March 14, 2025 03:57
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 24, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | minor | `v0.24.0` -> `v0.26.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>Automattic/harper (Automattic/harper/harper-ls)</summary>

### [`v0.26.0`](https://github.com/Automattic/harper/releases/tag/v0.26.0)

[Compare Source](https://github.com/Automattic/harper/compare/v0.25.1...v0.26.0)

#### What's Changed

-   docs: fix user dictionary by [@&#8203;kit494way](https://github.com/kit494way) in Automattic/harper#893
-   feat: mask out comments beginning with spellchecker:ignore by [@&#8203;grantlemons](https://github.com/grantlemons) in Automattic/harper#861
-   feat(harper.js): export both binary and inlinedBinary for different runtimes by [@&#8203;Asuka109](https://github.com/Asuka109) in Automattic/harper#607
-   feat: linter for "as far back as" to replace "as early back as" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#889
-   feat: flag "explanation mark/point" instead of "exclamation" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#895
-   feat: correct "in anyway" to "in any way" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#894
-   build(deps): bump [@&#8203;babel/helpers](https://github.com/babel/helpers) from 7.26.9 to 7.26.10 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#899
-   fix: two spelling mistakes based on homophones by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#886
-   feat: allow blank lines and comments in `dictionary.dict` by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#756
-   docs: fix typo [#&#8203;906](Automattic/harper#906) by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#912
-   hotfix(core): properly store spans in `PatternLinter` cache by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#926
-   Dictionary curation 2025 03 12 by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#902
-   Dialect prototyping by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#925
-   feat: insert newline automatically in `just addnoun` by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#931
-   docs: fix 3 grammar mistakes by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#933
-   feat: linter for "each and everyone" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#923
-   feat: expand the "get rid off" lint to cover "get ride of" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#900
-   fix(vscode-plugin): ignore non-existent ".git" files, support untitled/unsaved files on VS Code by [@&#8203;kiding](https://github.com/kiding) in Automattic/harper#927
-   feat(core): improve assertion to allow overlapping suggestions by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#935
-   build(deps): bump [@&#8203;wordpress/editor](https://github.com/wordpress/editor) from 14.19.0 to 14.20.0 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#915
-   build(deps): bump indexmap from 2.7.1 to 2.8.0 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#921
-   build(deps): bump tokio from 1.43.0 to 1.44.1 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#919
-   build(deps-dev): bump [@&#8203;types/node](https://github.com/types/node) from 22.13.9 to 22.13.10 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#913
-   build(deps): bump foldhash from 0.1.4 to 0.1.5 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#917
-   feat: correct "along time" to "a long time" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#910
-   Add -able affix to open (openable) by [@&#8203;claydugo](https://github.com/claydugo) in Automattic/harper#930
-   docs: mention hidden library dependencies by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#943
-   feat(core): create new test assertion for `nth` suggestion results by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#942
-   build: migrate to pnpm workspace & biome by [@&#8203;Asuka109](https://github.com/Asuka109) in Automattic/harper#924
-   build(deps): bump serde from 1.0.218 to 1.0.219 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#920
-   build(deps): bump clap from 4.5.31 to 4.5.32 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#946
-   Web improvements by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#944
-   feat: ignore shebang lines by [@&#8203;holmanb](https://github.com/holmanb) in Automattic/harper#947
-   feat(web): add mask-image to header by [@&#8203;Asuka109](https://github.com/Asuka109) in Automattic/harper#951
-   fix(core): reduce ambiguity for `AvoidContraction` by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#941
-   chore: add comments describing major sections by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#952

#### New Contributors

-   [@&#8203;kit494way](https://github.com/kit494way) made their first contribution in Automattic/harper#893
-   [@&#8203;holmanb](https://github.com/holmanb) made their first contribution in Automattic/harper#947

**Full Changelog**: Automattic/harper@v0.25.1...v0.26.0

### [`v0.25.1`](https://github.com/Automattic/harper/releases/tag/v0.25.1)

[Compare Source](https://github.com/Automattic/harper/compare/v0.25.0...v0.25.1)

#### What's Changed

-   docs(ls): give example config that disables `sentence_capitalization` by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#879
-   fix(core): indexing problem in Regexish work by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#883
-   Just getforms improvements by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#862
-   Dictionary curation 2025 03 11 by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#884
-   fix(core): insert paragraph breaks after code blocks by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#882

**Full Changelog**: Automattic/harper@v0.25.0...v0.25.1

### [`v0.25.0`](https://github.com/Automattic/harper/releases/tag/v0.25.0)

[Compare Source](https://github.com/Automattic/harper/compare/v0.24.0...v0.25.0)

#### What's Changed

-   docs: update integrations section by [@&#8203;mcecode](https://github.com/mcecode) in Automattic/harper#755
-   Typst Corrections by [@&#8203;grantlemons](https://github.com/grantlemons) in Automattic/harper#442
-   refactor: add comments to `just addnoun` and tweak logic by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#605
-   feat: implements [#&#8203;841](Automattic/harper#841) by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#842
-   Add WordPress Plugin Documentation and Demo by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#838
-   feat: add `just newest-dict-changes` by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#701
-   Spellcheck improvements by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#844
-   fix: add missing "gotten rid off" to other "rid off" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#840
-   Rules page improvements by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#843
-   build(deps): bump axios from 1.8.1 to 1.8.2 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#845
-   Regexish by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#669
-   fix: fall back to `grep` when `rg` is not available by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#848
-   feat: flag "monumentous" and offer "momentous" and "monumental" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#864
-   build(deps-dev): bump svelte-check from 4.1.4 to 4.1.5 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#874
-   build(deps): bump typst-syntax from 0.13.0 to 0.13.1 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#867
-   build(deps-dev): bump typescript from 5.7.3 to 5.8.2 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#871
-   build(deps-dev): bump autoprefixer from 10.4.20 to 10.4.21 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#873
-   Dictionary curation 2025 03 08 by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#860
-   feat: add many variants of "change of tact"->"tack" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#852
-   feat: implement [#&#8203;525](Automattic/harper#525) (worse/worst confusion) by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#856
-   build(deps): bump cached from 0.54.0 to 0.55.1 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#868
-   build(deps): bump anyhow from 1.0.96 to 1.0.97 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#865
-   Build against an older GLIBC version by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#877
-   Cache busting by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#876
-   build(deps): bump thiserror from 2.0.11 to 2.0.12 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#866
-   build(deps): bump serde_json from 1.0.139 to 1.0.140 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#869
-   feat: add a lint to correct "in of itself" to "in and of itself" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#863
-   feat: implement "ticking time clock" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#851
-   feat: implements [#&#8203;746](Automattic/harper#746) by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#855
-   feat(dict): added words to dictionary by [@&#8203;ficcdaf](https://github.com/ficcdaf) in Automattic/harper#847
-   fix: Ignore hex codes inside rgb function calls by [@&#8203;grantlemons](https://github.com/grantlemons) in Automattic/harper#857
-   feat: Added Linux musl compilations by [@&#8203;kiding](https://github.com/kiding) in Automattic/harper#878

#### New Contributors

-   [@&#8203;kiding](https://github.com/kiding) made their first contribution in Automattic/harper#878

**Full Changelog**: Automattic/harper@v0.24.0...v0.25.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
3 participants