Commit 2d247e3

[core][cjk] Implement CJK-aware help formatting (#15)
This PR adds CJK-aware terminal display-width calculations to ArgMojo's help generator so that option, positional, and subcommand descriptions stay column-aligned when names contain CJK (2-column) characters.

**Changes:**

- Introduce the `_display_width()` and `_is_wide_codepoint()` utilities and use them for help-section padding instead of `len()`.
- Add help-formatting tests covering CJK and mixed ASCII/CJK alignment.
- Update the docs and changelog, and add a CJK-heavy demo example (`yu`), wiring it into the build.

---

This pull request adds display support for Chinese characters and other characters of width 2, and adds an example (a Yuhao Input Method code lookup tool) to showcase the new feature.
1 parent 378e15f commit 2d247e3

9 files changed: 656 additions and 70 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ logs/
 /mgrep
 /mgit
 /demo
+/yu
 
 # Local notes (not tracked)
 local/

docs/argmojo_overall_planning.md

Lines changed: 67 additions & 36 deletions
Large diffs are not rendered by default.

docs/changelog.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Comment out unreleased changes here. This file will be edited just before each r
 5. Add `.remainder()` builder method on `Argument`. A remainder positional consumes **all** remaining tokens (including ones starting with `-`), similar to argparse `nargs=REMAINDER` or clap `trailing_var_arg`. At most one remainder positional is allowed per command and it must be the last positional (PR #13).
 6. Add `parse_known_arguments()` method on `Command`. Like `parse_arguments()`, but unrecognised options are collected into the result instead of raising an error. Access them via `result.get_unknown_args()`. Useful for forwarding unknown flags to another program (PR #13).
 7. Add `.allow_hyphen_values()` builder method on `Argument`. When set on a positional, values starting with `-` are accepted without requiring `--` (e.g., `-` for stdin). Remainder positionals have this enabled automatically (PR #13).
+8. **CJK-aware help alignment.** Help output now computes column padding using terminal display width instead of byte length. CJK ideographs and fullwidth characters are correctly treated as 2-column-wide, so help descriptions stay aligned when option names, positional names, or subcommand names contain Chinese, Japanese, or Korean characters. ANSI escape sequences are skipped during width calculation. No API changes — this is automatic (PR #14).
 
 ### 🔧 Fixes and API changes
 
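
The changelog entry above says ANSI escape sequences are skipped during width calculation. A minimal Python sketch of that idea follows. It is an assumption about the approach, not ArgMojo's code: the `visible_width` name and the regular expression are hypothetical, and the pattern only covers SGR colour and style sequences of the form `ESC [ ... m`.

```python
import re
import unicodedata

# Matches SGR sequences such as "\x1b[1;36m" and "\x1b[0m" (assumed scope).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_width(text: str) -> int:
    """Terminal columns used by `text`, ignoring colour escape codes."""
    plain = ANSI_SGR.sub("", text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in plain
    )


coloured = "\x1b[1;36m--編碼\x1b[0m"
print(visible_width(coloured))  # 6, the same as the uncoloured "--編碼"
```

Measuring the stripped text keeps the padding identical whether or not colour is enabled.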

docs/user_manual.md

Lines changed: 52 additions & 20 deletions
@@ -64,6 +64,7 @@ from argmojo import Argument, Command
 - [Auto-generated Help](#auto-generated-help)
 - [Custom Tips](#custom-tips)
 - [Version Display](#version-display)
+- [CJK-Aware Help Alignment](#cjk-aware-help-alignment)
 - [Parsing Behaviour](#parsing-behaviour)
 - [Negative Number Passthrough](#negative-number-passthrough)
 - [Long Option Prefix Matching](#long-option-prefix-matching)
@@ -2207,9 +2208,7 @@ Options:
 -V, --version Show version
 ```
 
-Help text columns are **dynamically aligned**: the padding between the option
-names and the description text adjusts automatically based on the longest
-option line, so everything stays neatly aligned regardless of option length.
+Help text columns are **dynamically aligned**: the padding between the option names and the description text adjusts automatically based on the longest option line, so everything stays neatly aligned regardless of option length.
 
 ---
 
@@ -2236,9 +2235,7 @@ var help_plain = command._generate_help(color=False) # no ANSI codes
 
 **Custom Colours**
 
-The **header colour**, **argument-name colour**, **deprecation warning
-colour**, and **parse error colour** are all customisable. Section headers
-always keep the **bold + underline** style; only the colour changes.
+The **header colour**, **argument-name colour**, **deprecation warning colour**, and **parse error colour** are all customisable. Section headers always keep the **bold + underline** style; only the colour changes.
 
 ```mojo
 var command = Command("myapp", "My app")
@@ -2264,9 +2261,7 @@ Available colour names (case-insensitive):
 
 An unrecognised colour name raises an `Error` at runtime.
 
-Padding calculation is always based on the **plain-text width** (without
-escape codes), so columns remain correctly aligned regardless of whether
-colour is enabled.
+Padding calculation is always based on the **plain-text width** (without escape codes), so columns remain correctly aligned regardless of whether colour is enabled.
 
 **What controls the output:**
 
@@ -2302,8 +2297,7 @@ This takes priority over the `color=True` default but does **not** override an e
 
 **Show Help When No Arguments Provided**
 
-Use `help_on_no_arguments()` to automatically display help when the user invokes
-the command with no arguments (like `git`, `docker`, or `cargo`):
+Use `help_on_no_arguments()` to automatically display help when the user invokes the command with no arguments (like `git`, `docker`, or `cargo`):
 
 ```mojo
 var command = Command("myapp", "My application")
@@ -2317,14 +2311,11 @@ myapp # prints help and exits
 myapp --file x # normal parsing
 ```
 
-This is particularly useful for commands that require arguments — instead of
-showing an obscure "missing required argument" error, the user sees the
-full help text.
+This is particularly useful for commands that require arguments — instead of showing an obscure "missing required argument" error, the user sees the full help text.
 
 ### Custom Tips
 
-Add custom **tip lines** to the bottom of your help output with `add_tip()`.
-This is useful for documenting common patterns, gotchas, or examples.
+Add custom **tip lines** to the bottom of your help output with `add_tip()`. This is useful for documenting common patterns, gotchas, or examples.
 
 ```mojo
 var command = Command("calc", "A calculator")
@@ -2352,10 +2343,7 @@ Tip: Use quotes if you use spaces in expressions.
 
 ---
 
-**Smart default tip** — when positional arguments are defined, ArgMojo automatically adds a
-built-in tip explaining the `--` separator. The example in this default tip adapts
-based on whether negative numbers are auto-detected: if they are, it uses
-`-my-value`; otherwise, it uses `-10.18`.
+**Smart default tip** — when positional arguments are defined, ArgMojo automatically adds a built-in tip explaining the `--` separator. The example in this default tip adapts based on whether negative numbers are auto-detected: if they are, it uses `-my-value`; otherwise, it uses `-10.18`.
 
 User-defined tips appear **below** the built-in tip.
 
@@ -2386,6 +2374,50 @@ var command = Command("myapp", "Description", version="1.0.0")
 
 After printing the version, the program exits cleanly with exit code 0.
 
+### CJK-Aware Help Alignment
+
+ArgMojo automatically handles CJK (Chinese, Japanese, Korean) characters in help output. CJK ideographs and fullwidth characters occupy **two terminal columns** instead of one, so naïve byte- or codepoint-based padding would cause misaligned help columns.
+
+ArgMojo's help formatter uses **display width** (East Asian Width) to compute padding, so help descriptions stay aligned even when option names, positional names, subcommand names, or help text contain CJK characters.
+
+See the [Unicode East Asian Width specification](https://www.unicode.org/reports/tr11/) for details on CJK character ranges and properties.
+
+**Example — mixed ASCII and CJK options:**
+
+```mojo
+var command = Command("工具", "一個命令行工具")
+command.add_argument(
+    Argument("output", help="Output path").long("output").short("o")
+)
+command.add_argument(
+    Argument("編碼", help="設定編碼").long("編碼")
+)
+```
+
+```txt
+Options:
+-o, --output <output> Output path
+--編碼 <編碼> 設定編碼
+```
+
+**Example — CJK subcommands:**
+
+```mojo
+var app = Command("工具", "一個命令行工具")
+var init_cmd = Command("初始化", "建立新項目")
+app.add_subcommand(init_cmd^)
+var build_cmd = Command("構建", "編譯項目")
+app.add_subcommand(build_cmd^)
+```
+
+```txt
+Commands:
+初始化 建立新項目
+構建 編譯項目
+```
+
+No configuration is needed — CJK-aware alignment is always active.
+
 ## Parsing Behaviour
 
 ### Negative Number Passthrough
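
To make the padding rule in the new manual section concrete, here is a hedged Python sketch of column alignment driven by display width. It is not ArgMojo's formatter: `format_rows`, the two-space indent, and the four-space gap are all assumptions chosen for illustration.

```python
import unicodedata


def display_width(text: str) -> int:
    """Terminal columns used by `text` (2 for wide/fullwidth characters)."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def format_rows(rows, indent=2, gap=4):
    """Pad each name so that every description starts in the same column."""
    widest = max(display_width(name) for name, _ in rows)
    lines = []
    for name, desc in rows:
        # Pad by columns still missing, not by missing codepoints.
        pad = widest - display_width(name) + gap
        lines.append(" " * indent + name + " " * pad + desc)
    return lines


rows = [
    ("-o, --output <output>", "Output path"),
    ("    --編碼 <編碼>", "設定編碼"),
]
for line in format_rows(rows):
    print(line)
```

In a terminal with a monospaced CJK-capable font, both descriptions start in the same column even though the second name has fewer codepoints than columns.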

examples/yu.mojo

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
+"""Example: Yuhao Input Method character code lookup.
+
+例:宇浩輸入法單字編碼查詢
+
+A CJK-heavy demo that showcases ArgMojo's CJK-aware help alignment.
+The purpose of the app is to look up the encoding of Chinese characters in the
+Yuhao Input Method (宇浩輸入法).
+
+In Yuhao Input Method, each Chinese character is represented by a 4-letter code
+based on its components and radicals. For example, the character "字" is encoded
+as "khvi" in the Lingming variant.
+
+Yuhao Input Method has several variants. The app supports looking up any variant
+individually or all three side by side.
+
+For full character tables, see https://shurufa.app
+
+This demo app supports three Yuhao IME variants:
+- 宇浩靈明 — default (used when no variant flag is given)
+- 宇浩卿雲 (--joy)
+- 宇浩星陳 (--star)
+
+Try these (build first with: `pixi run build`):
+
+    ./yu --help
+    ./yu 字
+    ./yu 宇浩靈明
+    ./yu --joy 字根
+    ./yu --star 你好
+    ./yu --all 宇浩
+    ./yu --version
+"""
+
+from argmojo import Argument, Command
+
+
+fn _build_ling_table() -> Dict[String, String]:
+    """Build 宇浩靈明 lookup table (20 high-frequency characters)."""
+    var d: Dict[String, String] = {
+        "": "d",
+        "": "fi",
+        "": "i",
+        "": "u",
+        "": "a",
+        "": "ne",
+        "": "o",
+        "": "mvu",
+        "": "me",
+        "": "jse",
+        "": "rwo",
+        "": "ju",
+        "": "ka",
+        "": "rla",
+        "": "kva",
+        "": "yda",
+        "": "di",
+        "": "khvi",
+        "": "kfjo",
+        "": "vmdo",
+        "": "ja",
+        "": "fhi",
+    }
+    return d^
+
+
+fn _build_joy_table() -> Dict[String, String]:
+    """Build 宇浩卿雲 lookup table (20 high-frequency characters)."""
+    var d: Dict[String, String] = {
+        "": "d",
+        "": "f",
+        "": "j",
+        "": "n",
+        "": "l",
+        "": "ur",
+        "": "w",
+        "": "xl",
+        "": "x",
+        "": "e",
+        "": "ruc",
+        "": "ebog",
+        "": "o",
+        "": "cl",
+        "": "uo",
+        "": "md",
+        "": "k",
+        "": "il",
+        "": "ife",
+        "": "npk",
+        "": "eo",
+        "": "wlz",
+    }
+    return d^
+
+
+fn _build_star_table() -> Dict[String, String]:
+    """Build 宇浩星陳 lookup table (20 high-frequency characters)."""
+    var d: Dict[String, String] = {
+        "": "d",
+        "": "f",
+        "": "j",
+        "": "v",
+        "": "k",
+        "": "r",
+        "": "g",
+        "": "eu",
+        "": "ew",
+        "": "eo",
+        "": "bocy",
+        "": "ewj",
+        "": "jv",
+        "": "all",
+        "": "dm",
+        "": "o",
+        "": "l",
+        "": "ikz",
+        "": "ifk",
+        "": "npl",
+        "": "e",
+        "": "c",
+    }
+    return d^
+
+
+fn _lookup(table: Dict[String, String], ch: String) raises -> String:
+    if ch in table:
+        return table[ch]
+    return "(未收錄)"
+
+
+fn main() raises:
+    var app = Command(
+        "yu",
+        "宇浩輸入法單字編碼查詢。完整碼表請見 https://shurufa.app",
+        version="0.1.0",
+    )
+
+    app.add_argument(
+        Argument("漢字", help="要查詢的漢字(可以輸入多個漢字)").positional().required()
+    )
+    app.add_argument(
+        Argument("joy", help="使用卿雲編碼(預設為靈明)").long("joy").short("j").flag()
+    )
+    app.add_argument(
+        Argument("star", help="使用星陳編碼(預設為靈明)").long("star").short("s").flag()
+    )
+    app.add_argument(
+        Argument("all", help="同時顯示靈明、卿雲、星陳編碼").long("all").short("a").flag()
+    )
+
+    app.add_tip("完整碼表與教程請訪問 https://shurufa.app")
+
+    var args = app.parse()
+    var input = args.get_string("漢字")
+    var use_joy = args.get_flag("joy")
+    var use_star = args.get_flag("star")
+    var show_all = args.get_flag("all")
+
+    var ling = _build_ling_table()
+    var joy = _build_joy_table()
+    var star = _build_star_table()
+
+    # Extract individual codepoints from the UTF-8 input string.
+    var chars = List[String]()
+    var bytes = input.as_bytes()
+    var i = 0
+    var n = len(bytes)
+    while i < n:
+        var b0 = Int(bytes[i])
+        var seq_len: Int
+        if b0 < 0x80:
+            seq_len = 1
+        elif b0 < 0xE0:
+            seq_len = 2
+        elif b0 < 0xF0:
+            seq_len = 3
+        else:
+            seq_len = 4
+        chars.append(String(input[i : i + seq_len]))
+        i += seq_len
+
+    if show_all:
+        print("漢字\t靈明\t卿雲\t星陳")
+        print("────\t────\t────\t────")
+        for k in range(len(chars)):
+            var ch = chars[k]
+            print(
+                ch
+                + "\t"
+                + _lookup(ling, ch)
+                + "\t"
+                + _lookup(joy, ch)
+                + "\t"
+                + _lookup(star, ch)
+            )
+    elif use_star:
+        print("漢字\t星陳編碼")
+        print("────\t────────")
+        for k in range(len(chars)):
+            var ch = chars[k]
+            print(ch + "\t" + _lookup(star, ch))
+    elif use_joy:
+        print("漢字\t卿雲編碼")
+        print("────\t────────")
+        for k in range(len(chars)):
+            var ch = chars[k]
+            print(ch + "\t" + _lookup(joy, ch))
+    else:
+        print("漢字\t靈明編碼")
+        print("────\t────────")
+        for k in range(len(chars)):
+            var ch = chars[k]
+            print(ch + "\t" + _lookup(ling, ch))
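
The codepoint-splitting loop in `examples/yu.mojo` decides how many bytes each character spans by inspecting only the lead byte. The same technique can be sketched in Python as follows. The function names are hypothetical, and like the Mojo loop this assumes the input is valid UTF-8: a stray continuation byte (0x80 to 0xBF) in lead position would be misread as the start of a 2-byte sequence.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with `lead_byte`."""
    if lead_byte < 0x80:  # 0xxxxxxx: ASCII
        return 1
    if lead_byte < 0xE0:  # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte < 0xF0:  # 1110xxxx: 3-byte sequence (most CJK ideographs)
        return 3
    return 4              # 11110xxx: 4-byte sequence


def split_codepoints(text: str) -> list[str]:
    """Split `text` into one string per codepoint by walking its UTF-8 bytes."""
    data = text.encode("utf-8")
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chars.append(data[i : i + n].decode("utf-8"))
        i += n
    return chars


print(split_codepoints("宇浩靈明"))  # ['宇', '浩', '靈', '明']
```

Each resulting one-character string can then be used as a key into a lookup table, which is how the demo handles an argument such as `宇浩靈明`.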

pixi.toml

Lines changed: 4 additions & 2 deletions
@@ -48,7 +48,9 @@ test = """\
 build = """pixi run package \
 && mojo build -I src examples/mgrep.mojo -o mgrep \
 && mojo build -I src examples/mgit.mojo -o mgit \
-&& mojo build -I src examples/demo.mojo -o demo"""
+&& mojo build -I src examples/demo.mojo -o demo \
+&& mojo build -I src examples/yu.mojo -o yu \
+"""
 
 # clean build artifacts
-clean = "rm -f argmojo.mojopkg mgrep mgit demo"
+clean = "rm -f argmojo.mojopkg mgrep mgit demo yu"
