Conversation
There was a problem hiding this comment.
Pull request overview
Adds CJK-aware terminal display-width calculations to ArgMojo’s help generator so option/positional/subcommand descriptions stay column-aligned when names contain CJK (2-column) characters.
Changes:
- Introduce
_display_width()/_is_wide_codepoint()utilities and use them for help-section padding instead oflen(). - Add help-formatting tests covering CJK and mixed ASCII/CJK alignment.
- Update docs/changelog and add a CJK-heavy demo example (
yu), wiring it into the build.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
src/argmojo/utils.mojo |
Adds Unicode display-width helpers used by help formatting. |
src/argmojo/command.mojo |
Switches help padding computations to _display_width() for CJK alignment. |
tests/test_help.mojo |
Adds regression tests asserting aligned help columns with CJK text. |
docs/user_manual.md |
Documents CJK-aware help alignment and provides examples. |
docs/changelog.md |
Notes the new CJK-aware alignment behavior in unreleased changes. |
docs/argmojo_overall_planning.md |
Marks CJK-aware help formatting as done and updates planning notes. |
examples/yu.mojo |
Adds a CJK-heavy demo app showcasing alignment in --help. |
pixi.toml |
Builds the new yu example binary. |
.gitignore |
Ignores the yu build artifact. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
pixi.toml
Outdated
| """ | ||
|
|
||
| # clean build artifacts | ||
| clean = "rm -f argmojo.mojopkg mgrep mgit demo" |
There was a problem hiding this comment.
The build task now produces a yu binary, but the clean task still only removes mgrep, mgit, and demo. Consider updating clean to remove yu as well so pixi run clean stays accurate.
| clean = "rm -f argmojo.mojopkg mgrep mgit demo" | |
| clean = "rm -f argmojo.mojopkg mgrep mgit demo yu" |
examples/yu.mojo
Outdated
| For full character tables, see https://shurufa.app | ||
|
|
||
| This demo app supports three Yuhao IME variants: | ||
| - 宇浩靈明 (--ling) — default |
There was a problem hiding this comment.
The example’s docstring lists a --ling option, but the CLI only defines --joy, --star, and --all (Lingming is implicit default). Either add an explicit --ling flag or adjust the docstring so the documented flags match the implemented arguments.
| - 宇浩靈明 (--ling) — default | |
| - 宇浩靈明 — default (used when no variant flag is given) |
| - [x] Implement `_display_width(s: String) -> Int` in `utils.mojo`, traversing each code point: | ||
| - CJK Unified Ideographs, CJK Ext-A/B/C/D/E/F/G/H/I/J, fullwidth forms → width 2 | ||
| - Other visible characters → width 1 | ||
| - Zero-width joiners, combining marks → width 0 | ||
| - [ ] Replace `len()` with `_display_width()` in all help formatting padding calculations (`_help_positionals_section`, `_help_options_section`, `_help_commands_section`) | ||
| - [ ] Add tests with mixed CJK/ASCII help text verifying column alignment | ||
| - [x] Replace `len()` with `_display_width()` in all help formatting padding calculations (`_help_positionals_section`, `_help_options_section`, `_help_commands_section`) |
There was a problem hiding this comment.
This section claims _display_width treats “zero-width joiners, combining marks” as width 0, but the current implementation counts them as width 1. Either update _display_width to special-case these codepoints, or adjust this documentation bullet to reflect the actual behavior.
| # Determine display width of this codepoint. | ||
| if _is_wide_codepoint(codepoint): | ||
| width += 2 | ||
| else: | ||
| width += 1 | ||
| return width |
There was a problem hiding this comment.
_display_width currently treats every non-wide codepoint as width 1. This makes combining marks / variation selectors / zero-width joiners count toward column width, which contradicts the documented intent (e.g., planning doc notes they should be width 0) and can still misalign help for composed glyphs (accented characters, emoji sequences). Consider explicitly returning width 0 for known zero-width ranges (combining marks, VS15/VS16, ZWJ, etc.) or updating the documentation to match the implemented behavior.
| var width = 0 | ||
| var i = 0 | ||
| var n = len(s) | ||
| while i < n: | ||
| var b0 = Int(s.as_bytes()[i]) | ||
| # Skip ANSI escape sequences: ESC [ ... final_byte. | ||
| if b0 == 0x1B and i + 1 < n and Int(s.as_bytes()[i + 1]) == 0x5B: | ||
| i += 2 # skip ESC [ | ||
| while i < n: | ||
| var c = Int(s.as_bytes()[i]) | ||
| i += 1 |
There was a problem hiding this comment.
In _display_width, s.as_bytes() is called repeatedly inside the loop (including in the inner ANSI-skip loop). Since width computation may run many times during help generation, it would be more efficient and clearer to store let bytes = s.as_bytes() once and index that buffer throughout.
src/argmojo/command.mojo
Outdated
| if _display_width(pos_plains[k]) > pos_max: | ||
| pos_max = _display_width(pos_plains[k]) |
There was a problem hiding this comment.
The max-width loop computes _display_width(pos_plains[k]) twice per iteration (in the comparison and again in the assignment). Store the width in a local variable to avoid redundant UTF-8 scanning, especially as this scales with number of args.
| if _display_width(pos_plains[k]) > pos_max: | |
| pos_max = _display_width(pos_plains[k]) | |
| var width = _display_width(pos_plains[k]) | |
| if width > pos_max: | |
| pos_max = width |
docs/user_manual.md
Outdated
|
|
||
| ArgMojo's help formatter uses **display width** (East Asian Width) to compute padding, so help descriptions stay aligned even when option names, positional names, subcommand names, or help text contain CJK characters. | ||
|
|
||
| See [Unicode v17.0](https://www.unicode.org/charts/PDF/Unicode-17.0/) for details on CJK character ranges and properties. |
There was a problem hiding this comment.
The manual links to “Unicode v17.0”, but the implementation comments reference Unicode 16.0 for the East Asian Width ranges. To avoid confusion (and potential stale/invalid links), consider aligning the referenced Unicode version across docs and code, or using a versionless link to the Unicode East Asian Width specification.
| See [Unicode v17.0](https://www.unicode.org/charts/PDF/Unicode-17.0/) for details on CJK character ranges and properties. | |
| See the [Unicode East Asian Width specification](https://www.unicode.org/reports/tr11/) for details on CJK character ranges and properties. |
| var local_max: Int = 0 | ||
| var global_max: Int = 0 | ||
| for k in range(len(opt_plains)): | ||
| if opt_persistent[k]: | ||
| if len(opt_plains[k]) > global_max: | ||
| global_max = len(opt_plains[k]) | ||
| if _display_width(opt_plains[k]) > global_max: | ||
| global_max = _display_width(opt_plains[k]) | ||
| else: | ||
| if len(opt_plains[k]) > local_max: | ||
| local_max = len(opt_plains[k]) | ||
| if _display_width(opt_plains[k]) > local_max: | ||
| local_max = _display_width(opt_plains[k]) | ||
| var local_pad = local_max + 4 |
There was a problem hiding this comment.
The local/global max-width loop calls _display_width(opt_plains[k]) multiple times per element. Consider computing it once per iteration (e.g., w = _display_width(...)) and using w for the comparison/assignment to reduce repeated scans.
| var cmd_max: Int = 0 | ||
| for k in range(len(cmd_plains)): | ||
| if len(cmd_plains[k]) > cmd_max: | ||
| cmd_max = len(cmd_plains[k]) | ||
| if _display_width(cmd_plains[k]) > cmd_max: | ||
| cmd_max = _display_width(cmd_plains[k]) | ||
| var cmd_pad = cmd_max + 4 |
There was a problem hiding this comment.
The commands-section padding loop recomputes _display_width(cmd_plains[k]) twice per iteration. Caching the computed width would avoid redundant work and keep the loop consistent with other two-pass formatting logic.
This PR adds CJK-aware terminal display-width calculations to ArgMojo’s help generator so option/positional/subcommand descriptions stay column-aligned when names contain CJK (2-column) characters.
Changes:
_display_width()/_is_wide_codepoint()utilities and use them for help-section padding instead oflen().yu), wiring it into the build.本拉取請求增加了對於漢字及其他寬度爲2的字符的顯示支持,並添加一實例(宇浩輸入法編碼查詢工具)來展示此新功能。