chore: dictionary curation #3666

Open
carlosroe wants to merge 3 commits into Automattic:master from carlosroe:dictionary-curation-2026-06-15

Conversation

@carlosroe

Issues

N/A

Description

This is my first time trying to commit words to Harper. Any help and criticism is highly appreciated. I am neither an expert in linguistics nor a native English speaker, just someone who would like to help by adding words I frequently use in my research!

+ACC
+CBASP
+Glu
+MRS
+rsFC
+glutamate
+pregenual
+racemate
+racemic

How Has This Been Tested?

cargo test

AI Disclosure

  • I am a human and didn't use any AI.
  • I used LLM features of my editor, but not an agent.
  • I used an AI agent interactively.
  • I am an agent or I got an agent to do the work autonomously.
  • I used an LLM to find matching annotations for my words

Checklist

  • I have performed a self-review of my own code
  • I have added tests to cover my changes
  • I have considered splitting this into smaller pull requests.

@hippietrail hippietrail left a comment

Collaborator

Some notes: the dictionary's sort order is case-sensitive, and obscure or specialized words have a dedicated section.

Comment thread harper-core/dictionary.dict Outdated
GI/JNV
GIF/NSg # file format
GIGO/
Glu/N # symbol for glutamate (anion of glutamic acid)

Collaborator

This section is for all-uppercase words. It sounds silly, but when I tried to re-sort the dictionary in a case-insensitive order, it revealed a latent bug: the order of words in the dictionary makes a difference, and the re-sort made passing tests fail. A PR covering all aspects of the case problems in the dictionary is being worked on.
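To illustrate the difference between the two orderings, here is a minimal Python sketch using the headwords from the diff context in this thread. It assumes the dictionary's case-sensitive order is plain code-point order (every uppercase letter sorts before every lowercase one); the thread does not spell out the exact collation, so treat that as an assumption. The function names are made up for the example.

```python
# Headwords taken from the diff context quoted in this thread.
headwords = ["Mozart", "Mozilla", "Mr", "MRS"]


def case_sensitive(words):
    """Sort by code point (assumed ordering): 'R' (0x52) sorts before 'o' (0x6F)."""
    return sorted(words)


def case_insensitive(words):
    """Sort ignoring case, which is where the new entry was originally placed."""
    return sorted(words, key=str.casefold)


print(case_sensitive(headwords))    # ['MRS', 'Mozart', 'Mozilla', 'Mr']
print(case_insensitive(headwords))  # ['Mozart', 'Mozilla', 'Mr', 'MRS']
```

Under that assumption, an all-uppercase entry such as MRS lands ahead of every mixed-case M word, not right after Mr.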

Comment thread harper-core/dictionary.dict Outdated
Mozart/NOg
Mozilla/g # company
Mr/NSg
MRS/N # magnetic resonance spectroscopy

Collaborator

Same here. The easiest way to get them in the right spot is to turn on case-sensitive search in your editor. Around line 6267 is where you want to look.

Comment thread harper-core/dictionary.dict Outdated
PET/NOg # early computer
PFC/N
PG/JNO
pgACC/N # pregenual anterior cingulate cortex

Collaborator

Some of these look very specialized. The top (main) part of the dictionary is for regular words. Specialized words are in a separate section so that they'll be easy to find if and when we decide to split them out into focussed Weir Packs. The specialized section starts around line 53,834, and line 54,573 or so should be where a word starting with lowercase pg should go.
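As a rough aid for finding the right spot within a section, the following Python sketch computes where a new entry would go in a list that is already sorted case-sensitively. It assumes code-point order and the `word/FLAGS # comment` line shape seen in the diff context. The `section` list is a hypothetical slice (the `pH` and `pharma` lines are invented for illustration and are not taken from the real file), and both helper names are made up.

```python
import bisect


def headword(entry):
    """Return the word itself from a line such as 'MRS/N # comment'."""
    without_comment = entry.split("#", 1)[0]
    return without_comment.split("/", 1)[0].strip()


def insertion_index(section, new_entry):
    """Index at which new_entry keeps the section sorted by code point (assumed order)."""
    keys = [headword(line) for line in section]
    return bisect.bisect_left(keys, headword(new_entry))


# Hypothetical slice of a section; 'pH' and 'pharma' are invented examples.
section = ["PET/NOg # early computer", "PFC/N", "PG/JNO", "pH/N", "pharma/N"]

index = insertion_index(section, "pgACC/N # pregenual anterior cingulate cortex")
print(index)  # 4: after 'pH' (since 'H' < 'g') and before 'pharma' (since 'g' < 'h')
```

A case-sensitive search in the editor, as suggested earlier in the thread, achieves the same thing by hand.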

carlosroe added a commit to carlosroe/harper that referenced this pull request Jun 16, 2026
@carlosroe carlosroe force-pushed the dictionary-curation-2026-06-15 branch from 7a175c2 to eb7b8c1 Compare June 16, 2026 08:50

@hippietrail hippietrail left a comment

Collaborator

Looks good. Just one more to move. Thanks.

Comment thread harper-core/dictionary.dict Outdated
@carlosroe carlosroe force-pushed the dictionary-curation-2026-06-15 branch from eb7b8c1 to d11a85a Compare June 20, 2026 16:10
@hippietrail

Collaborator

You've got failing "snapshot" tests. Your new words probably change the top 3 spelling suggestions for other unknown words in the old texts we test against. The way to fix this is to run cargo test again. Usually once is enough, but occasionally it takes two runs. Then commit the changed snapshots with the PR.
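To see why adding a word can change recorded suggestions, here is a small Python sketch. It is not Harper's suggestion algorithm: it uses `difflib.get_close_matches` from the Python standard library as a stand-in ranking, and the word list and the typo `racemc` are hypothetical. The point is only that a newly added word can enter the top 3 for an unrelated unknown word and push another candidate out.

```python
import difflib


def top_suggestions(misspelling, dictionary, n=3):
    """Return the n closest dictionary words (stand-in ranking, not Harper's)."""
    # cutoff=0.0 keeps every candidate, so only the ranking decides the result.
    return difflib.get_close_matches(misspelling, dictionary, n=n, cutoff=0.0)


base_words = ["racer", "racism", "ceramic"]  # hypothetical existing entries
unknown = "racemc"                           # hypothetical typo in a test text

before = top_suggestions(unknown, base_words)
after = top_suggestions(unknown, base_words + ["racemic"])

print(before)  # the three old words, none of them 'racemic'
print(after)   # 'racemic' now ranks first and one old word drops out
```

A snapshot that recorded the `before` list no longer matches once the new word is in the dictionary, which is why the snapshots need to be regenerated and committed.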
