fix(hooks): allow English typographic punctuation in pr-language-guard #585
Merged
Em-dash, en-dash, curly quotes, and ellipsis are unambiguously English typography but the prior non-ASCII check rejected them along with CJK and other non-Latin scripts. This forced contributors and LLMs to rewrite natural English prose to ASCII hyphens and straight quotes, adding silent friction to PR/issue authoring (e.g., kcenon/claude-docker#254).

Strip an allowlist of English typographic codepoints (U+2014, U+2013, U+201C/D, U+2018/9, U+2026, U+00A0) before the ASCII grep, and replace the generic 'non-ASCII' error with one that names the offending script category (Korean, CJK, Cyrillic, Greek, or 'non-Latin script' when no specific category matches). CJK and other non-Latin scripts continue to be rejected.

Both the canonical validator (hooks/lib/validate-language.sh) and the inline fallback in global/hooks/pr-language-guard.sh are updated in lockstep to preserve byte-equivalent behavior.

Closes #583
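The strip-then-check flow described in this commit message can be sketched as follows. This is a minimal Python illustration, not the shell code in hooks/lib/validate-language.sh. The allowlist codepoints come from the commit message; the function names, the Unicode block ranges chosen for each script category, and the error wording are assumptions made for the example.

```python
# Allowlist from the commit message: em-dash, en-dash, curly double and
# single quotes, ellipsis, and NBSP.
TYPOGRAPHIC_ALLOWLIST = frozenset(
    "\u2014\u2013\u201c\u201d\u2018\u2019\u2026\u00a0"
)

# Assumed block ranges per category (hypothetical; the real hook may
# classify differently). The first matching category wins.
SCRIPT_RANGES = [
    ("Korean", [(0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7AF)]),
    ("CJK", [(0x3040, 0x30FF), (0x3400, 0x4DBF), (0x4E00, 0x9FFF)]),
    ("Cyrillic", [(0x0400, 0x04FF)]),
    ("Greek", [(0x0370, 0x03FF)]),
]


def classify(ch):
    """Name the script category of a rejected character."""
    cp = ord(ch)
    for name, ranges in SCRIPT_RANGES:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    # Fallback named in the commit message when no category matches.
    return "non-Latin script"


def validate_english(text):
    """Return None if text passes, else an error naming the category."""
    for ch in text:
        # Skip allowlisted typography first, then apply the ASCII check.
        if ch in TYPOGRAPHIC_ALLOWLIST or ord(ch) < 128:
            continue
        # Hypothetical error wording.
        return f"{classify(ch)} characters are not allowed under the english policy"
    return None
```

With this sketch, `validate_english("Wait \u2014 it\u2019s fine\u2026")` returns `None`, while input containing Hangul returns a message that starts with "Korean" even when an em-dash is also present.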
Add 10 regression cases under the english policy:

- Em-dash, en-dash, curly double/single quotes, ellipsis allowed in both --body and --title positions.
- Korean (Hangul), Japanese (Hiragana, Kanji), and Cyrillic remain blocked even when typographic punctuation is also present in the same input.

These pin down the issue #583 contract so a future regression that re-broadens the rejection set or removes the allowlist will be caught immediately by the test suite.

Relates to #583
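A table-driven check in the spirit of these regression cases might look like the sketch below. It is illustrative only and does not reproduce the ten cases added in the commit. The `is_accepted` helper is a hypothetical stand-in for the real validator, and the sample strings are invented for the example.

```python
# Allowlisted typographic codepoints named in issue #583.
ALLOWED = "\u2014\u2013\u201c\u201d\u2018\u2019\u2026\u00a0"


def is_accepted(text):
    """Hypothetical stand-in for the validator: ASCII plus the allowlist."""
    return all(ord(ch) < 128 or ch in ALLOWED for ch in text)


# (case name, input text, expected acceptance)
CASES = [
    ("em-dash", "Fix parser \u2014 handle EOF", True),
    ("en-dash", "Pages 10\u201312", True),
    ("curly double quotes", "\u201cquoted\u201d", True),
    ("curly single quotes", "it\u2019s", True),
    ("ellipsis", "wait\u2026", True),
    ("Hangul with em-dash", "\ud55c\uae00 \u2014 text", False),
    ("Hiragana with ellipsis", "\u3072\u3089\u304c\u306a\u2026", False),
    ("Kanji with curly quotes", "\u201c\u6f22\u5b57\u201d", False),
    ("Cyrillic with en-dash", "\u0442\u0435\u0441\u0442 \u2013 ok", False),
]


def run_cases():
    """Return the names of cases whose outcome differs from the expectation."""
    return [name for name, text, expected in CASES
            if is_accepted(text) != expected]
```

An empty list from `run_cases()` means every case behaved as expected. If the allowlist were removed, the first five names would appear in the result, which is the kind of regression the commit aims to catch.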
What
Allow common English typographic punctuation (em-dash, en-dash, curly quotes, ellipsis, NBSP) in PR/issue text validated by pr-language-guard. CJK and other non-Latin scripts remain blocked.
Why
The prior "any non-ASCII" check rejected unambiguously English typography along with non-Latin scripts, forcing rewrites of natural prose to ASCII hyphens and straight quotes. Recurring friction documented in kcenon/claude-docker#254. Closes #583.
How
Test Plan
Notes
Closes #583