Skip to content

[core][cjk] Implement CJK-aware help formatting#15

Merged
forfudan merged 5 commits intomainfrom
update
Mar 7, 2026
Merged

[core][cjk] Implement CJK-aware help formatting#15
forfudan merged 5 commits intomainfrom
update

Conversation

@forfudan
Copy link
Owner

@forfudan forfudan commented Mar 6, 2026

This PR adds CJK-aware terminal display-width calculations to ArgMojo’s help generator so option/positional/subcommand descriptions stay column-aligned when names contain CJK (2-column) characters.

Changes:

  • Introduce _display_width() / _is_wide_codepoint() utilities and use them for help-section padding instead of len().
  • Add help-formatting tests covering CJK and mixed ASCII/CJK alignment.
  • Update docs/changelog and add a CJK-heavy demo example (yu), wiring it into the build.

本拉取請求增加了對於漢字及其他寬度爲2的字符的顯示支持,並添加一實例(宇浩輸入法編碼查詢工具)來展示此新功能。

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CJK-aware terminal display-width calculations to ArgMojo’s help generator so option/positional/subcommand descriptions stay column-aligned when names contain CJK (2-column) characters.

Changes:

  • Introduce _display_width() / _is_wide_codepoint() utilities and use them for help-section padding instead of len().
  • Add help-formatting tests covering CJK and mixed ASCII/CJK alignment.
  • Update docs/changelog and add a CJK-heavy demo example (yu), wiring it into the build.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/argmojo/utils.mojo Adds Unicode display-width helpers used by help formatting.
src/argmojo/command.mojo Switches help padding computations to _display_width() for CJK alignment.
tests/test_help.mojo Adds regression tests asserting aligned help columns with CJK text.
docs/user_manual.md Documents CJK-aware help alignment and provides examples.
docs/changelog.md Notes the new CJK-aware alignment behavior in unreleased changes.
docs/argmojo_overall_planning.md Marks CJK-aware help formatting as done and updates planning notes.
examples/yu.mojo Adds a CJK-heavy demo app showcasing alignment in --help.
pixi.toml Builds the new yu example binary.
.gitignore Ignores the yu build artifact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

pixi.toml Outdated
"""

# clean build artifacts
clean = "rm -f argmojo.mojopkg mgrep mgit demo"
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build task now produces a yu binary, but the clean task still only removes mgrep, mgit, and demo. Consider updating clean to remove yu as well so pixi run clean stays accurate.

Suggested change
clean = "rm -f argmojo.mojopkg mgrep mgit demo"
clean = "rm -f argmojo.mojopkg mgrep mgit demo yu"

Copilot uses AI. Check for mistakes.
examples/yu.mojo Outdated
For full character tables, see https://shurufa.app

This demo app supports three Yuhao IME variants:
- 宇浩靈明 (--ling) — default
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example’s docstring lists a --ling option, but the CLI only defines --joy, --star, and --all (Lingming is implicit default). Either add an explicit --ling flag or adjust the docstring so the documented flags match the implemented arguments.

Suggested change
- 宇浩靈明 (--ling) — default
- 宇浩靈明 — default (used when no variant flag is given)

Copilot uses AI. Check for mistakes.
Comment on lines +612 to +616
- [x] Implement `_display_width(s: String) -> Int` in `utils.mojo`, traversing each code point:
- CJK Unified Ideographs, CJK Ext-A/B/C/D/E/F/G/H/I/J, fullwidth forms → width 2
- Other visible characters → width 1
- Zero-width joiners, combining marks → width 0
- [ ] Replace `len()` with `_display_width()` in all help formatting padding calculations (`_help_positionals_section`, `_help_options_section`, `_help_commands_section`)
- [ ] Add tests with mixed CJK/ASCII help text verifying column alignment
- [x] Replace `len()` with `_display_width()` in all help formatting padding calculations (`_help_positionals_section`, `_help_options_section`, `_help_commands_section`)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section claims _display_width treats “zero-width joiners, combining marks” as width 0, but the current implementation counts them as width 1. Either update _display_width to special-case these codepoints, or adjust this documentation bullet to reflect the actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +204 to +209
# Determine display width of this codepoint.
if _is_wide_codepoint(codepoint):
width += 2
else:
width += 1
return width
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_display_width currently treats every non-wide codepoint as width 1. This makes combining marks / variation selectors / zero-width joiners count toward column width, which contradicts the documented intent (e.g., planning doc notes they should be width 0) and can still misalign help for composed glyphs (accented characters, emoji sequences). Consider explicitly returning width 0 for known zero-width ranges (combining marks, VS15/VS16, ZWJ, etc.) or updating the documentation to match the implemented behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +176
var width = 0
var i = 0
var n = len(s)
while i < n:
var b0 = Int(s.as_bytes()[i])
# Skip ANSI escape sequences: ESC [ ... final_byte.
if b0 == 0x1B and i + 1 < n and Int(s.as_bytes()[i + 1]) == 0x5B:
i += 2 # skip ESC [
while i < n:
var c = Int(s.as_bytes()[i])
i += 1
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _display_width, s.as_bytes() is called repeatedly inside the loop (including in the inner ANSI-skip loop). Since width computation may run many times during help generation, it would be more efficient and clearer to store let bytes = s.as_bytes() once and index that buffer throughout.

Copilot uses AI. Check for mistakes.
Comment on lines +2990 to +2991
if _display_width(pos_plains[k]) > pos_max:
pos_max = _display_width(pos_plains[k])
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max-width loop computes _display_width(pos_plains[k]) twice per iteration (in the comparison and again in the assignment). Store the width in a local variable to avoid redundant UTF-8 scanning, especially as this scales with number of args.

Suggested change
if _display_width(pos_plains[k]) > pos_max:
pos_max = _display_width(pos_plains[k])
var width = _display_width(pos_plains[k])
if width > pos_max:
pos_max = width

Copilot uses AI. Check for mistakes.

ArgMojo's help formatter uses **display width** (East Asian Width) to compute padding, so help descriptions stay aligned even when option names, positional names, subcommand names, or help text contain CJK characters.

See [Unicode v17.0](https://www.unicode.org/charts/PDF/Unicode-17.0/) for details on CJK character ranges and properties.
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual links to “Unicode v17.0”, but the implementation comments reference Unicode 16.0 for the East Asian Width ranges. To avoid confusion (and potential stale/invalid links), consider aligning the referenced Unicode version across docs and code, or using a versionless link to the Unicode East Asian Width specification.

Suggested change
See [Unicode v17.0](https://www.unicode.org/charts/PDF/Unicode-17.0/) for details on CJK character ranges and properties.
See the [Unicode East Asian Width specification](https://www.unicode.org/reports/tr11/) for details on CJK character ranges and properties.

Copilot uses AI. Check for mistakes.
Comment on lines 3227 to 3236
var local_max: Int = 0
var global_max: Int = 0
for k in range(len(opt_plains)):
if opt_persistent[k]:
if len(opt_plains[k]) > global_max:
global_max = len(opt_plains[k])
if _display_width(opt_plains[k]) > global_max:
global_max = _display_width(opt_plains[k])
else:
if len(opt_plains[k]) > local_max:
local_max = len(opt_plains[k])
if _display_width(opt_plains[k]) > local_max:
local_max = _display_width(opt_plains[k])
var local_pad = local_max + 4
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local/global max-width loop calls _display_width(opt_plains[k]) multiple times per element. Consider computing it once per iteration (e.g., w = _display_width(...)) and using w for the comparison/assignment to reduce repeated scans.

Copilot uses AI. Check for mistakes.
Comment on lines 3326 to 3330
var cmd_max: Int = 0
for k in range(len(cmd_plains)):
if len(cmd_plains[k]) > cmd_max:
cmd_max = len(cmd_plains[k])
if _display_width(cmd_plains[k]) > cmd_max:
cmd_max = _display_width(cmd_plains[k])
var cmd_pad = cmd_max + 4
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commands-section padding loop recomputes _display_width(cmd_plains[k]) twice per iteration. Caching the computed width would avoid redundant work and keep the loop consistent with other two-pass formatting logic.

Copilot uses AI. Check for mistakes.
@forfudan forfudan merged commit 2d247e3 into main Mar 7, 2026
2 checks passed
@forfudan forfudan deleted the update branch March 7, 2026 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants