
Unhandled edge case: grammar-related tone changes #54

@henrymcl


Describe the bug
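While 開會 is correctly parsed as hoi1wui2, derived phrases that split the verb from its object, such as "開過幾多次會" or "開番個會", lose the wui2 tone change: 會 is tokenized on its own and comes out as wui5.

The same happens with '捱咗好多晚夜', where 夜 comes out as je6 instead of je2.

排隊 is not affected: 隊 keeps deoi2 even when 排 and 隊 are separated.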

To reproduce

>>> import pycantonese
>>> pycantonese.characters_to_jyutping('一陣記得要開會')
[('一陣', 'jat1zan2'), ('記得', 'gei3dak1'), ('要', 'jiu3'), ('開會', 'hoi1wui2')]
>>> pycantonese.characters_to_jyutping('一陣開番個會先')
[('一陣', 'jat1zan2'), ('開', 'hoi1'), ('番', 'faan1'), ('個', 'go3'), ('會', 'wui5'), ('先', 'sin1')]
>>> pycantonese.characters_to_jyutping('禁耐以嚟開過幾多次會?')
[('禁', 'gam3'), ('耐', 'noi6'), ('以嚟', 'ji5lai4'), ('開', 'hoi1'), ('過', 'gwo3'), ('幾多', 'gei2do1'), ('次', 'ci3'), ('會', 'wui5'), ('?', None)]
>>> pycantonese.characters_to_jyutping('捱咗好多晚夜')
[('捱', 'ngaai4'), ('咗', 'zo2'), ('好多', 'hou2do1'), ('晚', 'maan5'), ('夜', 'je6')]
>>> pycantonese.characters_to_jyutping('排十次隊都值呀')
[('排', 'paai4'), ('十', 'sap6'), ('次', 'ci3'), ('隊', 'deoi2'), ('都', 'dou1'), ('值', 'zik6'), ('呀', 'aa4')]

Expected behavior
It is impossible to list every variant ('開咗一次會', '開咗兩次會', '捱咗三晚夜', '捱咗四晚夜', etc.) as its own entry, so I guess we'll need ('會', 'wui2') and ('夜', 'je2') too?
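In the meantime, a possible workaround is to post-process the output. The sketch below is not part of pycantonese: the table `SEPARABLE_COMPOUNDS`, the function `apply_separable_tone_change` and the `max_gap` parameter are all hypothetical names made up for illustration. It takes the list of (word, jyutping) pairs that `characters_to_jyutping` returns and, when a standalone object such as 會 follows its verb within a few tokens, swaps in the changed-tone reading.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]

# Hypothetical table (not from pycantonese):
# (verb, object) -> changed-tone reading of the standalone object.
SEPARABLE_COMPOUNDS = {
    ("開", "會"): "wui2",
    ("捱", "夜"): "je2",
}


def apply_separable_tone_change(tokens: List[Token], max_gap: int = 4) -> List[Token]:
    """Return a copy of tokens with changed tones restored on split objects.

    An object token is rewritten only if its verb appears as a separate
    token with at most max_gap tokens in between.
    """
    result = list(tokens)
    for i, (word, _) in enumerate(result):
        for (verb, obj), changed in SEPARABLE_COMPOUNDS.items():
            if word != obj:
                continue
            # Look back over the preceding tokens for the matching verb.
            start = max(0, i - 1 - max_gap)
            if any(w == verb for w, _ in result[start:i]):
                result[i] = (word, changed)
    return result


# Input copied from the output reported above for '一陣開番個會先'.
parsed = [("一陣", "jat1zan2"), ("開", "hoi1"), ("番", "faan1"),
          ("個", "go3"), ("會", "wui5"), ("先", "sin1")]
print(apply_separable_tone_change(parsed))
```

This is only a heuristic: it would also rewrite 會 when it is the modal verb and merely happens to follow 開, so a proper fix probably needs part-of-speech information or handling inside the library itself.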


System information:

System: Ubuntu 22.04.5 LTS

Output from pip list:

certifi            2025.10.5
charset-normalizer 3.4.3
idna               3.10
pip                22.0.2
pycantonese        3.4.0
pylangacq          0.16.2
python-dateutil    2.9.0.post0
requests           2.32.5
setuptools         59.6.0
six                1.17.0
tabulate           0.9.0
urllib3            2.5.0
wcwidth            0.2.14
wordseg            0.0.2

