Possible to add a custom lookup dict for characters_to_jyutping?  #37

@raymond00000

Describe the bug
I read https://pycantonese.org/jyutping.html and understand that characters_to_jyutping draws on two data sources:
(i) the HKCanCor corpus data included in the PyCantonese library, and (ii) the rime-cantonese data.

The issue I found is that at least one character, 到, seems to be converted to an incorrect Jyutping result, both on its own and inside 感到.

To reproduce
pycantonese.characters_to_jyutping('到')
[('到', 'dou2')]
pycantonese.characters_to_jyutping('感到')
[('感到', 'gam2dou2')]
pycantonese.characters_to_jyutping('到底')
[('到底', 'dou3dai2')]

Expected behavior
According to https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/, 到 should be dou3 (as it already is in 到底), so the expected results are:
pycantonese.characters_to_jyutping('到')
[('到', 'dou3')]
pycantonese.characters_to_jyutping('感到')
[('感到', 'gam2dou3')]
pycantonese.characters_to_jyutping('到底')
[('到底', 'dou3dai2')]

Is there any way to resolve this, for example by supplying a custom lookup dict, so that pycantonese.characters_to_jyutping returns dou3 for 到 and 感到?
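
In the meantime, the only workaround I can think of is to patch the output after the fact. Below is a minimal sketch, not anything PyCantonese provides: `with_overrides` and `OVERRIDES` are names I made up. It assumes the converter returns a list of (word, jyutping) tuples as in the output above, and it only replaces a segment whose text matches an override key exactly, so it depends on the segmentation staying the same.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Shape of the converter output as shown in this issue: [(word, jyutping), ...]
Pairs = List[Tuple[str, Optional[str]]]

# Hypothetical user-supplied corrections, keyed by the exact segmented word.
OVERRIDES: Dict[str, str] = {
    "到": "dou3",
    "感到": "gam2dou3",
}


def with_overrides(
    convert: Callable[[str], Pairs],
    overrides: Dict[str, str],
) -> Callable[[str], Pairs]:
    """Wrap a converter so that listed words get the overriding Jyutping.

    Segments that are not in `overrides` pass through unchanged.
    """

    def wrapped(text: str) -> Pairs:
        return [(word, overrides.get(word, jp)) for word, jp in convert(text)]

    return wrapped


# Intended usage (requires pycantonese to be installed):
#   import pycantonese
#   to_jyutping = with_overrides(pycantonese.characters_to_jyutping, OVERRIDES)
#   to_jyutping('感到')
```

This would not help when 到 appears inside some other multi-character segment that I have not listed, which is why a lookup dict supported by the library itself would be better.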
Thanks!
