Skip to content

fix(hooks): allow English typographic punctuation in pr-language-guard#585

Merged
kcenon merged 2 commits into
developfrom
fix/issue-583-pr-language-guard-typography
May 3, 2026
Merged

fix(hooks): allow English typographic punctuation in pr-language-guard#585
kcenon merged 2 commits into
developfrom
fix/issue-583-pr-language-guard-typography

Conversation

@kcenon

@kcenon kcenon commented May 3, 2026

Copy link
Copy Markdown
Owner

What

Allow common English typographic punctuation (em-dash, en-dash, curly quotes, ellipsis, NBSP) in PR/issue text validated by pr-language-guard. CJK and other non-Latin scripts remain blocked.

Why

The prior "any non-ASCII" check rejected unambiguously English typography along with non-Latin scripts, forcing rewrites of natural prose to ASCII hyphens and straight quotes. Recurring friction documented in kcenon/claude-docker#254. Closes #583.

How

  • Strip an allowlist of English typographic codepoints (U+2014, U+2013, U+201C/D, U+2018/9, U+2026, U+00A0) before the ASCII grep.
  • Replace generic "non-ASCII" error with a script-category-aware message (Korean, CJK, Cyrillic, Greek, or "non-Latin script").
  • Apply the same change to both the canonical validator (hooks/lib/validate-language.sh) and the inline fallback (global/hooks/pr-language-guard.sh) to preserve byte-equivalent behavior.

Test Plan

  • bash tests/hooks/test-pr-language-guard-env.sh -- 31 passed, 0 failed (10 new typography cases added).
  • bash tests/hooks/test-language-validator.sh -- 48 passed, 0 failed (no regressions).
  • Manual: em-dash in this PR body itself was authored as ASCII hyphens because the OLD hook is still active for this PR creation; once merged the new hook permits typographic punctuation.

Notes

Closes #583

kcenon added 2 commits May 3, 2026 18:27
Em-dash, en-dash, curly quotes, and ellipsis are unambiguously English
typography but the prior non-ASCII check rejected them along with CJK
and other non-Latin scripts. This forced contributors and LLMs to
rewrite natural English prose to ASCII hyphens and straight quotes,
adding silent friction to PR/issue authoring (e.g., kcenon/claude-docker#254).

Strip an allowlist of English typographic codepoints (U+2014, U+2013,
U+201C/D, U+2018/9, U+2026, U+00A0) before the ASCII grep, and replace
the generic 'non-ASCII' error with one that names the offending script
category (Korean, CJK, Cyrillic, Greek, or 'non-Latin script' when no
specific category matches). CJK and other non-Latin scripts continue
to be rejected.

Both the canonical validator (hooks/lib/validate-language.sh) and the
inline fallback in global/hooks/pr-language-guard.sh are updated in
lockstep to preserve byte-equivalent behavior.

Closes #583
Add 10 regression cases under the english policy:
- Em-dash, en-dash, curly double/single quotes, ellipsis allowed
  in both --body and --title positions.
- Korean (Hangul), Japanese (Hiragana, Kanji), and Cyrillic remain
  blocked even when typographic punctuation is also present in the
  same input.

These pin down the issue #583 contract so a future regression that
re-broadens the rejection set or removes the allowlist will be
caught immediately by the test suite.

Relates to #583
@kcenon kcenon merged commit cea99cc into develop May 3, 2026
3 checks passed
@kcenon kcenon deleted the fix/issue-583-pr-language-guard-typography branch May 3, 2026 09:29
@kcenon kcenon mentioned this pull request Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant