fix(core): preserve camelCase proper nouns in title case #3438
johndecker3 wants to merge 2 commits
Conversation
I tried to rerun the checks because the failure looked like a timeout rather than a code issue, but I can't due to admin limitations. Please let me know if there is anything I need to do to address the failed checks.
There is But note that it's fuzzy at the moment because the dictionary is case-folded. It can tell you that at least one spelling that went into the case-folded entry was mixed case or started with a lowercase letter. This is independent from the I fixed one place this happened about six months ago but don't remember where. Knowing the names of the structs should help you find it.
For the underlying case-folding problem itself, there is #2630, which will address it and related issues around letter case, acronyms & initialisms, etc.
Thanks, that's a much cleaner path than my string-pattern heuristic. I'll switch the check to metadata.is_lower_camel() (equivalent to metadata.orth_info.contains(OrthFlags::LOWER_CAMEL)).
Confirming the equivalence: iCloud / iPad / iPhone / iPod / iMac / iTunes / iOS / macOS / eBay → LOWER_CAMEL set, first-letter rule skipped (fix applies).
One question on the "place I fixed about six months ago": was that the OrthographicConsistency rule added in PR #2107? That one uses OrthFlags::LOWER_CAMEL at orthographic_consistency.rs:89 for canonical-spelling suggestions. If yes, I'll model my fix on the same pattern; if you were thinking of somewhere else, a pointer would be helpful.
I read through #2630. That refactor will sharpen these queries (per-spelling vs case-folded) and adds more convenience around OrthFlags, but my fix shouldn't conflict with it: switching to is_lower_camel() puts me on the same API surface you're expanding there. Happy to revisit once #2630 lands if you want, but I'd suggest this PR can stay independent.
I believe Elijah made the It sounds like you're equipped with enough for the job either way (-:
Thanks! Used metadata.is_lower_camel(). Worked great as expected.
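To illustrate why a flag query on a case-folded dictionary entry is "fuzzy", as discussed in this thread, here is a minimal sketch. It is not the library's implementation: the `u8` constants, `classify`, `folded_flags`, and the free function `is_lower_camel` are hypothetical stand-ins, and only the flag name LOWER_CAMEL comes from the conversation.

```rust
// Hypothetical bit flags. Only the name LOWER_CAMEL appears in the thread;
// the other two exist purely for illustration.
const LOWERCASE: u8 = 1 << 0;
const TITLECASE: u8 = 1 << 1;
const LOWER_CAMEL: u8 = 1 << 2;

/// Classify one spelling by its letter case (illustrative rules only).
fn classify(spelling: &str) -> u8 {
    let mut letters = spelling.chars().filter(|c| c.is_alphabetic());
    let first = match letters.next() {
        Some(c) => c,
        None => return 0,
    };
    let rest_has_upper = letters.any(|c| c.is_uppercase());
    match (first.is_lowercase(), rest_has_upper) {
        (true, true) => LOWER_CAMEL, // e.g. "iCloud"
        (true, false) => LOWERCASE,  // e.g. "apple"
        (false, false) => TITLECASE, // e.g. "Apple"
        (false, true) => 0,          // e.g. "NASA"; not modelled in this sketch
    }
}

/// A case-folded entry ORs together the flags of every spelling folded into it,
/// so a set flag only says that at least one spelling had that property.
fn folded_flags(spellings: &[&str]) -> u8 {
    spellings.iter().fold(0, |acc, s| acc | classify(s))
}

/// Hypothetical counterpart of the flag query mentioned in the thread.
fn is_lower_camel(flags: u8) -> bool {
    (flags & LOWER_CAMEL) != 0
}
```

Under these assumptions, an entry folded from "Apple" and "apple" carries both the lowercase and title-case bits but not the camelCase one, while an entry that includes "iCloud" reports LOWER_CAMEL no matter which other spellings were merged into it.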
Issues
Related: #831 (closed). The symptom was reported then, and #834 routed ProperNounCapitalizationLinter away from make_title_case to dodge it. The underlying bug in make_title_case remains and still affects the UseTitleCase heading linter; this PR fixes it at the source.
Description
The title-case linter (UseTitleCase) renders camelCase brand names like iCloud, iOS, macOS, and eBay as ICloud, IOS, MacOS, and EBay in headings, overwriting the intentional lowercase first letter with an uppercase one. Reproduction:
Root cause
try_make_title_case (harper-core/src/title_case.rs) runs two passes per word: pass 1, the proper-noun pass, writes the canonical casing; pass 2, the first-letter rule, uppercases the word's first letter.
The two passes don't coordinate. For a canonical like iCloud, pass 1 writes i, C, l, o, u, d and pass 2 immediately overwrites position 0 with I, producing ICloud. Words whose canonical form already starts with uppercase (e.g., JavaScript, MacBook, VideoPress, NASA) escape the bug because pass 2's first-letter uppercase is a no-op on them.
Relationship to PR #834
PR #834 ("refactor(core): proper noun linters use canonical casing and JSON file") closed #831 by changing ProperNounCapitalizationLinter to read canonical capitalization directly from proper_noun_rules.json instead of calling make_title_case. That routed one caller around the buggy function but left make_title_case itself broken, and UseTitleCase still calls it. This PR fixes the function so all callers benefit, present and future.
Fix
A targeted heuristic in pass 1 detects when the canonical form is intentionally camelCase — the canonical's first alphabetic character is lowercase AND at least one other character is uppercase. When that condition holds, pass 2's first-letter rule is skipped (pass 1 has already written exactly what's wanted).
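The heuristic described above can be sketched roughly as follows. This is a simplified stand-in, not the PR's actual diff: `is_intentional_camel_case` and `title_case_word` are hypothetical names, and the canonical spelling is passed in directly rather than looked up in the dictionary.

```rust
/// Hypothetical helper: the canonical spelling counts as intentionally
/// camelCase when its first alphabetic character is lowercase and at least
/// one later character is uppercase.
fn is_intentional_camel_case(canonical: &str) -> bool {
    let mut letters = canonical.chars().filter(|c| c.is_alphabetic());
    match letters.next() {
        Some(first) if first.is_lowercase() => letters.any(|c| c.is_uppercase()),
        _ => false,
    }
}

/// Hypothetical two-pass title-casing of a single word.
/// `canonical` stands in for the dictionary's proper-noun lookup result.
fn title_case_word(word: &str, canonical: Option<&str>) -> String {
    // Pass 1: write the canonical casing when one is available.
    let mut out: Vec<char> = canonical.unwrap_or(word).chars().collect();

    // Pass 2: the first-letter rule, skipped when pass 1 already wrote an
    // intentionally camelCase spelling.
    let skip_first_letter = canonical.map_or(false, is_intentional_camel_case);
    if !skip_first_letter {
        if let Some(first) = out.first_mut() {
            *first = first.to_ascii_uppercase();
        }
    }
    out.into_iter().collect()
}
```

In this sketch, removing the `skip_first_letter` check reproduces the reported symptom: pass 2 uppercases position 0 of iCloud and yields ICloud. A lowercase canonical such as apple has no later uppercase letter, so it still goes through the first-letter rule and becomes Apple.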
Why the heuristic isn't just "skip if proper noun"
The dictionary has entries like
Apple/ONg(the company) andapple/~NwgS(the fruit). Looking upapplereturns canonical"apple"(lowercase) due to the dual-entry ambiguity. If the proper-noun pass unconditionally won,applewould title-case toappleinstead ofApple. The heuristic requires both a lowercase first AND an uppercase elsewhere, which keepsapple→Appleworking while fixingicloud→iCloud.Demo
How Has This Been Tested?
Three new tests added:
- title_case::tests::preserves_icloud_camel_case_mid_sentence: direct unit test for the title-case function with iCloud mid-sentence ("she backs up photos to icloud" → "She Backs up Photos to iCloud").
- title_case::tests::preserves_icloud_camel_case_as_first_word: direct unit test for the "even the first word should keep its lowercase letter" behavior ("icloud syncs your files" → "iCloud Syncs Your Files").
- linting::use_title_case::tests::preserves_camel_case_proper_nouns_in_heading: full-linter regression test for the original symptom ("### apple launched icloud" → "### Apple Launched iCloud").
cargo test -p harper-core --lib title_case and cargo test -p harper-core --lib use_title_case are both green (49/49 and 7/7 respectively). The existing fixes_video_press test confirms that proper nouns with uppercase-first canonical forms (VideoPress) continue to work. The existing tests for special articles, conjunctions, and short prepositions also still pass.
I used iCloud rather than iPhone in tests because iPhone, iPad, iPod, iMac, and iTunes don't yet have the proper-noun flag on master; that's coming in a separate PR (#[fill in PR 1 number]). Once that PR merges, this fix automatically benefits those words too with no further test changes needed.
AI Disclosure
If Your PR Implements or Enhances a Linter
Checklist