Spell checks source code:
- Requires special word-splitting logic to handle situations like hex (`0xDEADBEEF`), `c\nescapes`, `snake_case`, `CamelCase`, `SCREAMING_CASE`, and maybe `arrow-case`.
- Each programming language has its own quirks, like abbreviations, lack of word separator (`copysign`), etc.
- Backwards compatibility might require keeping misspelled words.
- Case for proper nouns is irrelevant.
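As a rough illustration of the first requirement, the sketch below blanks out hex literals and C-style escapes before any word splitting happens. The pattern and the `checkable_text` helper are assumptions made for this example, not how typos itself tokenizes, and the escape list is a deliberately small subset.

```python
import re

# Assumed subset: hex literals such as 0xDEADBEEF and simple C escapes such as \n or \t.
IGNORE = re.compile(r"\b0[xX][0-9a-fA-F]+|\\[abfnrtv0\\'\"]")


def checkable_text(source):
    """Replace hex literals and C escapes with spaces so they are never spell checked."""
    return IGNORE.sub(" ", source)


if __name__ == "__main__":
    # A raw string keeps the backslash and the "n" as two separate characters.
    print(checkable_text(r"0xDEADBEEF c\nescapes").split())
```

Without this step, `c\nescapes` would yield the bogus word `nescapes`, and `0xDEADBEEF` would be treated as text to check.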
Checking for errors in a CI:
- No false-positives.
- On spelling errors, sets the exit code to fail the CI.
- Machine-independent, repo-specific configuration
  - As compared to layered config with the user's system or the command-line
Quick feedback and resolution for developer:
- Fix errors for the user.
- Integration into other programs, like editors:
  - `fork`: easy to call into and provides a stable API, including output format
  - linking: either in the language of choice or bindings can be made to language of choice.
Corrections: Known misspellings that map to their corresponding dictionary word
- Ignores unknown typos
- Ignores typos that follow c-escapes if they aren't handled correctly
- Good for unassisted automated correcting
- Fast, can quickly run across large code bases
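A minimal sketch of the corrections approach, assuming a tiny hand-written table: each known misspelling maps directly to its fix, and everything else passes through untouched. The `CORRECTIONS` entries and the `autocorrect` helper are hypothetical, and the case handling is intentionally crude.

```python
import re

# Hypothetical corrections table: known misspelling -> dictionary word.
CORRECTIONS = {"recieve": "receive", "seperate": "separate"}


def autocorrect(text):
    """Rewrite known misspellings and leave every other word alone."""

    def replace(match):
        word = match.group(0)
        fixed = CORRECTIONS.get(word.lower())
        if fixed is None:
            return word  # unknown words, including unknown typos, pass through
        # Crude case handling: only a leading capital is preserved.
        return fixed.capitalize() if word[0].isupper() else fixed

    return re.sub(r"[A-Za-z]+", replace, text)


if __name__ == "__main__":
    print(autocorrect("Recieve the seperate blocks"))
    print(autocorrect(r"\nrecieve"))
```

The first call returns `Receive the separate blocks`. The second returns its input unchanged: because the escape is not handled, the word seen is `nrecieve`, which is not in the table, so the typo is missed. Each lookup is a single hash-table probe, which is what keeps this approach fast on large code bases.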
Dictionary: A confidence rating is given for how close a word is to one in a dictionary
- Sensitive to false positives due to hex numbers and c-escapes
- Used in word processors and other traditional spell checking applications
- Good when there is a UI to let the user know and override any decisions
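For contrast, the dictionary approach can be sketched with Python's `difflib`, which rates how similar two strings are. The word list and the `rate` helper are assumptions for illustration; real spell checkers use far larger dictionaries and different distance measures.

```python
import difflib

# Hypothetical, very small dictionary.
DICTIONARY = ["receive", "separate", "dead", "beef"]


def rate(word):
    """Return None if the word is in the dictionary, else (closest word, confidence)."""
    word = word.lower()
    if word in DICTIONARY:
        return None
    scored = [
        (difflib.SequenceMatcher(None, word, candidate).ratio(), candidate)
        for candidate in DICTIONARY
    ]
    confidence, suggestion = max(scored)
    return suggestion, confidence


if __name__ == "__main__":
    print(rate("recieve"))   # a real typo, flagged with a close suggestion
    print(rate("deadbeef"))  # hex digits, flagged even though nothing is wrong
```

Anything outside the dictionary is reported, so the digits of a hex literal such as `0xDEADBEEF` come back as a finding. That is acceptable when a person reviews each suggestion, and a problem when the tool must run unattended.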
With a focus on spell checking source code, most text will be in the form of
identifiers that are made up of words conjoined via snake_case, CamelCase,
etc. A typo at the word level might not be a typo as part of
an identifier, so identifiers get checked and, if not in a dictionary, will
then be split into words to be checked.
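The two-stage check described above might look like the following sketch. The tables, the `check_identifier` helper, and the ASCII-only word pattern are assumptions for illustration rather than the actual typos data structures.

```python
import re

# ASCII-only word pattern: acronyms first, then capitalized or lowercase runs.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")

# Hypothetical tables. None means "accept this identifier as-is".
KNOWN_IDENTIFIERS = {"O_WRONLY": None}
KNOWN_MISSPELLED_WORDS = {"wronly": "wrongly", "recieve": "receive"}


def check_identifier(identifier):
    """Return (found, suggestion) pairs for one identifier."""
    # Stage 1: the whole identifier, matched case-sensitively.
    if identifier in KNOWN_IDENTIFIERS:
        correction = KNOWN_IDENTIFIERS[identifier]
        return [] if correction is None else [(identifier, correction)]
    # Stage 2: fall back to the individual words, matched case-insensitively.
    findings = []
    for word in WORD.findall(identifier):
        correction = KNOWN_MISSPELLED_WORDS.get(word.lower())
        if correction is not None:
            findings.append((word, correction))
    return findings


if __name__ == "__main__":
    print(check_identifier("O_WRONLY"))        # accepted as an identifier
    print(check_identifier("recieve_buffer"))  # caught at the word level
```

Checking the identifier first is what lets `O_WRONLY` through even though `wronly` on its own would be reported.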
Identifiers are defined using Unicode's `XID_Continue`, which includes `[a-zA-Z0-9_]`.
Identifiers are case-sensitive.
Words are split from identifiers on case changes as well as breaks in
[a-zA-Z] with a special case to handle acronyms.
Words are case-insensitive.
Examples:
| Identifier | Words |
|---|---|
| `snake_case` | `snake`, `case` |
| `CamelCase` | `Camel`, `Case` |
| `First10HTMLTokens` | `First`, `HTML`, `Tokens` |
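The splitting rules can be approximated with two regular expressions. This is a sketch under stated assumptions: Python's `\w` stands in for `XID_Continue` (the two sets are close but not identical), the word pattern is ASCII-only, and both helper names are made up for the example.

```python
import re

# Approximation: \w is close to, but not the same as, Unicode's XID_Continue.
IDENTIFIER = re.compile(r"\w+")

# A run of capitals not followed by a lowercase letter is an acronym (HTML);
# otherwise a word is an optional capital followed by lowercase letters.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifiers(text):
    """Find identifier-like runs in a piece of source text."""
    return IDENTIFIER.findall(text)


def split_words(identifier):
    """Split an identifier on case changes and on breaks in [a-zA-Z]."""
    return WORD.findall(identifier)


if __name__ == "__main__":
    for name in ["snake_case", "CamelCase", "First10HTMLTokens"]:
        print(name, split_words(name))
```

The negative lookahead is the acronym special case: in `HTMLTokens` the capital run backs off by one letter so that `T` stays with `Tokens`.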
To see this in action,
- run `typos --identifiers` or `typos --words`.
- run `typos --highlight-identifiers` or `typos --highlight-words`.