Dictionary Build Process for This Schema
graphemecluster edited this page Nov 8, 2025
Note
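The upstream dictionary of this schema has moved to github.com/CanCLID/rime-cantonese-upstream, so parts of the content below may be outdated.
For the current dictionary build process, see the build branch of this repository.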
Install sgalal/opencc-python
$ git clone https://github.com/sgalal/opencc-python.git
$ cd opencc-python
$ python setup.py install
Install dependencies
$ pip install unihan-etl pandas sortedcontainers
$ unihan-etl -f kCantonese -F json --destination build/single_char/data/0-Unihan.json
$ build/build.py
- Export the Cantonese pronunciation data in kCantonese to build/single_char/data/0-Unihan.json
- Download the five data files mentioned above, process them, and save them to /build/single_char/data/0-*
- Sanitize the five data files and save the results to /build/single_char/data/1-*
- Generate the result according to the principles, then save it to the variable d_single_char
- Download the LSHK Word List to /build/word/data/香港語言學學會粵拼詞表.txt
- Read the file, discard the single characters in it, and save the remaining data to the variable d_word
- Write d_single_char and d_word to file
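The data flow of the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual build/build.py: the helper names (is_valid_jyutping, build_single_chars, build_words, render_rime_dict) are hypothetical; the unihan-etl export is assumed to be a list of records with "char" and "kCantonese" keys; the LSHK word list is assumed to hold one word and its Jyutping per line, separated by a tab; and dropping malformed Jyutping stands in for the sanitization step, whose real rules are not described here. The output follows the general Rime *.dict.yaml layout: a YAML header between `---` and `...`, then one `text<TAB>code` entry per line.

```python
import re
from collections import defaultdict

# Simplified Jyutping syllable pattern (an assumption, not an official grammar):
# optional initial + nucleus + optional coda, or a syllabic m/ng, then a tone 1-6.
# It is deliberately permissive and accepts some combinations that never occur.
JYUTPING_SYLLABLE = re.compile(
    r"(?:(?:ng|gw|kw|[bpmfdtnlgkhwzcsj])?(?:aa|oe|eo|yu|[aeiou])(?:ng|[iumnptk])?|m|ng)[1-6]"
)


def is_valid_jyutping(romanization):
    """Return True if every space-separated syllable matches the simplified pattern."""
    syllables = romanization.split()
    return bool(syllables) and all(JYUTPING_SYLLABLE.fullmatch(s) for s in syllables)


def build_single_chars(records):
    """Build d_single_char: character -> set of kCantonese readings.

    Hypothetical record shape: {"char": ..., "kCantonese": ...}, where the
    reading field may be a space-separated string or a list (both are handled).
    Readings that fail the simplified Jyutping check are dropped.
    """
    d_single_char = defaultdict(set)
    for rec in records:
        readings = rec.get("kCantonese") or []
        if isinstance(readings, str):
            readings = readings.split()
        for reading in readings:
            if is_valid_jyutping(reading):
                d_single_char[rec["char"]].add(reading)
    return d_single_char


def build_words(lines):
    """Build d_word from assumed 'word<TAB>jyutping' lines, discarding single characters."""
    d_word = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, romanization = parts[0].strip(), parts[1].strip()
        if len(word) > 1 and is_valid_jyutping(romanization):
            d_word[word].add(romanization)
    return d_word


def render_rime_dict(name, version, d_single_char, d_word):
    """Render both tables as Rime dictionary text: a minimal YAML header, then entries."""
    out = ["---", f"name: {name}", f'version: "{version}"', "sort: by_weight", "...", ""]
    for table in (d_single_char, d_word):
        for text in sorted(table):
            for code in sorted(table[text]):
                out.append(f"{text}\t{code}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Tiny made-up inputs, for illustration only.
    records = [{"char": "粵", "kCantonese": "jyut6"}, {"char": "拼", "kCantonese": ["ping3"]}]
    word_lines = ["粵\tjyut6\n", "粵拼\tjyut6 ping3\n"]
    d_single_char = build_single_chars(records)
    d_word = build_words(word_lines)
    print(render_rime_dict("example", "0.1", d_single_char, d_word), end="")
```

In a real run, records would come from json.load on build/single_char/data/0-Unihan.json and the lines from the downloaded word list opened with encoding="utf-8". Keeping the parsing and rendering functions free of file I/O makes each step easy to test on its own.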