Describe the bug
講笑話 is ambiguous: it can be segmented as 講笑/話 or as 講/笑話. characters_to_jyutping returns the 講笑/話 reading even when 講/笑話 is intended:
>>> import pycantonese
>>> pycantonese.characters_to_jyutping('笑話')
[('笑話', 'siu3waa2')]
>>> pycantonese.characters_to_jyutping('佢好鐘意講笑話佢女朋友肥')
[('佢', 'keoi5'), ('好', 'hou2'), ('鐘意', 'zung1ji3'), ('講笑', 'gong2siu3'), ('話', 'waa6'), ('佢', 'keoi5'), ('女朋友', 'neoi2pang4jau5'), ('肥', 'fei4')]
>>> pycantonese.characters_to_jyutping('佢好鐘意講笑話')
[('佢', 'keoi5'), ('好', 'hou2'), ('鐘意', 'zung1ji3'), ('講笑', 'gong2siu3'), ('話', 'waa6')]
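One possible explanation for the output above, sketched under the assumption that segmentation is done by greedy left-to-right longest matching (this report does not confirm that). The function `longest_match_segment` and the word list `TOY_LEXICON` are hypothetical and exist only for illustration; they are not pycantonese's implementation or data. With both 講笑 and 笑話 in the lexicon, a greedy matcher commits to 講笑 first and leaves 話 on its own:

```python
def longest_match_segment(text, lexicon, max_word_length=5):
    """Greedy left-to-right longest-match segmentation (toy sketch)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(max_word_length, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon, NOT the word list pycantonese ships with.
TOY_LEXICON = {"鐘意", "講笑", "笑話", "女朋友"}

print(longest_match_segment("笑話", TOY_LEXICON))
print(longest_match_segment("佢好鐘意講笑話", TOY_LEXICON))
```

In this sketch 笑話 on its own is kept whole, but once 講 precedes it the matcher consumes 講笑 and never considers 笑話, which mirrors the behavior reported here.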
Expected behavior
>>> pycantonese.characters_to_jyutping('佢好鐘意講笑話')
[('佢', 'keoi5'), ('好', 'hou2'), ('鐘意', 'zung1ji3'), ('講', 'gong2'), ('笑話', 'siu3waa2')]
There probably aren't many other similar cases (perhaps none at all), since this one depends on the quoting word 話 interacting with 講笑 in multiple ways.
System information:
System: Ubuntu 22.04.5 LTS
Output from pip list:
certifi 2025.10.5
charset-normalizer 3.4.3
idna 3.10
pip 22.0.2
pycantonese 3.4.0
pylangacq 0.16.2
python-dateutil 2.9.0.post0
requests 2.32.5
setuptools 59.6.0
six 1.17.0
tabulate 0.9.0
urllib3 2.5.0
wcwidth 0.2.14
wordseg 0.0.2