Contributor

@Enyium Enyium commented Jan 1, 2026

At first I also added the following, but removed it again for this PR:

    # Favor more common words.
    "ad": "add",
    "plane": "plain",
    "too": "to",
    "two": "to",
    # Favor American spellings, because they're standard in APIs, and the models often (mostly?) produce them.
    "dialogue": "dialog",

But this also changes the words when using "say...", which is unwanted. With "say...", you normally say more than one word, so the surrounding words can generally be assumed to give enough context for the correct spelling. "word...", however, is contextless, so it would be useful for it to produce the more common spellings by default. If the user needs another spelling, "phones..." must be used. Are y'all generally open to implementing a distinction between contextless and contextful word replacements? If so, I would open an issue for this.
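To make the proposal concrete, here is a minimal sketch of how such a distinction could look. All names (`GENERAL_REPLACEMENTS`, `CONTEXTLESS_REPLACEMENTS`, `replace_words`) are hypothetical and not taken from the repository, and the split of entries between the two tables is only an assumption for illustration.

```python
# Hypothetical names throughout; this is not the repository's actual API.

# Applied to every recognized word, with or without context.
GENERAL_REPLACEMENTS = {
    "dialogue": "dialog",
}

# Applied only when a single word is dictated without context
# (the "word..." case), where the more common spelling is the better default.
CONTEXTLESS_REPLACEMENTS = {
    "ad": "add",
    "plane": "plain",
    "too": "to",
    "two": "to",
}


def replace_words(words, contextless):
    """Return a new list of words with the replacement tables applied.

    Pass contextless=True for single-word dictation ("word...") and
    contextless=False for phrase dictation ("say...").
    """
    result = []
    for word in words:
        if contextless:
            word = CONTEXTLESS_REPLACEMENTS.get(word, word)
        result.append(GENERAL_REPLACEMENTS.get(word, word))
    return result
```

With this split, `replace_words(["too"], contextless=True)` yields `["to"]`, while `replace_words(["me", "too"], contextless=False)` leaves the phrase unchanged.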


Regarding American spellings: In my experience, the speech recognition models are already likely to produce American spellings. The list should contain corrections of other spellings to the American ones to avoid inconsistency. To cater to users who need other spelling systems, an additional replacement layer after the current one would be needed. The lists for this would probably need to be very large (all verbs ending in "ize", etc.) and may be out of scope for this repository.
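A rough sketch of what such an additional layer could look like, assuming replacements are plain word-to-word dicts applied in order. The names `BASE_LAYER`, `BRITISH_LAYER` and `apply_layers` are hypothetical, and the "recognize" entry is only an illustrative example of the "ize" verbs mentioned above.

```python
# Hypothetical data and names; this is not the repository's actual API.

# Current layer: normalize to American spellings for consistency.
BASE_LAYER = {
    "dialogue": "dialog",
}

# Optional later layer for users who want another spelling system.
# A real list would need to be far larger than this.
BRITISH_LAYER = {
    "dialog": "dialogue",
    "recognize": "recognise",
}


def apply_layers(words, layers):
    """Apply each replacement layer, in order, to every word."""
    for layer in layers:
        words = [layer.get(word, word) for word in words]
    return words
```

Because the second layer runs on the already normalized output, it only has to map from one spelling system to another, rather than from every variant the models might produce.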
