Open
Description
Describe the bug
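While 開會 is correctly parsed as hoi1wui2, derived phrases in which the verb and object are separated, such as "開過幾多次會" or "開番個會", do not retain the wui2 tone change: 會 is parsed as wui5 instead.
The same happens with '捱咗好多晚夜', where 夜 is parsed as je6 rather than je2.
排隊 is not affected: 隊 keeps deoi2 even when the two characters are separated.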
To reproduce
>>> import pycantonese
>>> pycantonese.characters_to_jyutping('一陣記得要開會')
[('一陣', 'jat1zan2'), ('記得', 'gei3dak1'), ('要', 'jiu3'), ('開會', 'hoi1wui2')]
>>> pycantonese.characters_to_jyutping('一陣開番個會先')
[('一陣', 'jat1zan2'), ('開', 'hoi1'), ('番', 'faan1'), ('個', 'go3'), ('會', 'wui5'), ('先', 'sin1')]
>>> pycantonese.characters_to_jyutping('禁耐以嚟開過幾多次會?')
[('禁', 'gam3'), ('耐', 'noi6'), ('以嚟', 'ji5lai4'), ('開', 'hoi1'), ('過', 'gwo3'), ('幾多', 'gei2do1'), ('次', 'ci3'), ('會', 'wui5'), ('?', None)]
>>> pycantonese.characters_to_jyutping('捱咗好多晚夜')
[('捱', 'ngaai4'), ('咗', 'zo2'), ('好多', 'hou2do1'), ('晚', 'maan5'), ('夜', 'je6')]
>>> pycantonese.characters_to_jyutping('排十次隊都值呀')
[('排', 'paai4'), ('十', 'sap6'), ('次', 'ci3'), ('隊', 'deoi2'), ('都', 'dou1'), ('值', 'zik6'), ('呀', 'aa4')]
Expected behavior
It is impossible to include every variant ('開咗一次會', '開咗兩次會', '捱咗三晚夜', '捱咗四晚夜', etc.), so I guess we'll need ('會', 'wui2') and ('夜', 'je2') as standalone results too?
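To make the expected behavior concrete, here is a minimal sketch of one possible workaround, written as a post-processing step over the `(word, jyutping)` pairs shown above. It is not how pycantonese works internally. The table `SEPARABLE_COMPOUNDS`, the function `restore_changed_tones`, and the `max_gap` window are all hypothetical names introduced for illustration. The heuristic is deliberately naive: it re-tones the object whenever its verb appears a few tokens earlier, so it could misfire on sentences where 會 is the modal verb (wui5).

```python
# Hypothetical table (not part of pycantonese): (verb, object) -> changed-tone
# reading of the object when the two halves of the compound are separated.
SEPARABLE_COMPOUNDS = {
    ("開", "會"): "wui2",
    ("捱", "夜"): "je2",
}


def restore_changed_tones(pairs, max_gap=4):
    """Return a copy of `pairs` in which an object such as 會 gets its
    changed-tone reading if the matching verb occurs among the previous
    `max_gap` tokens. `pairs` is a list of (word, jyutping) tuples."""
    result = list(pairs)
    for i, (word, _) in enumerate(result):
        for (verb, obj), reading in SEPARABLE_COMPOUNDS.items():
            if word != obj:
                continue
            # Look back over a small window for the verb of the compound.
            start = max(0, i - max_gap)
            if any(w == verb for w, _ in result[start:i]):
                result[i] = (word, reading)
    return result


# Input copied from the reproduction output in this report.
parsed = [('一陣', 'jat1zan2'), ('開', 'hoi1'), ('番', 'faan1'),
          ('個', 'go3'), ('會', 'wui5'), ('先', 'sin1')]
fixed = restore_changed_tones(parsed)
print(fixed[4])  # ('會', 'wui2')
```

The same function could be applied directly to the return value of `pycantonese.characters_to_jyutping(...)`. A fix inside the library would more likely belong in word segmentation or the pronunciation lookup itself.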
System (please complete the following information):
System: Ubuntu 22.04.5 LTS
Output from pip list:
certifi 2025.10.5
charset-normalizer 3.4.3
idna 3.10
pip 22.0.2
pycantonese 3.4.0
pylangacq 0.16.2
python-dateutil 2.9.0.post0
requests 2.32.5
setuptools 59.6.0
six 1.17.0
tabulate 0.9.0
urllib3 2.5.0
wcwidth 0.2.14
wordseg 0.0.2