Description
Feature you are interested in and your specific question(s):
I'm studying the word segmentation feature of PyCantonese (https://pycantonese.org/word_segmentation.html). Does the function also return the start and end positions of each segmented word?
What you are trying to accomplish with this feature or functionality:
I would like to achieve:
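import pycantonese
from pycantonese.word_segmentation import Segmenter
# A Segmenter with default settings (no customization).
segmenter = Segmenter()
# Segment an unsegmented Cantonese string into a list of words.
result = pycantonese.segment("廣東話容唔容易學?", cls=segmenter)
print(result)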
Current result:
['廣東話', '容', '唔', '容易', '學', '?']
Desired result (each word with its start and end positions in the input string):
[('廣東話', 0, 3), ('容', 3, 4), ('唔', 4, 5), ('容易', 5, 7), ('學', 7, 8), ('?', 8, 9)]
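In the meantime, a possible workaround is to derive the positions from the returned word list. The sketch below is not part of the PyCantonese API: `with_offsets` is a hypothetical helper, and it assumes the segmented words, joined in order, reproduce the input string exactly (no characters dropped or altered). The end position is exclusive, matching the desired output above.

```python
from typing import Iterable, List, Tuple


def with_offsets(words: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Attach (start, end) character offsets to each segmented word.

    Hypothetical helper, not a PyCantonese function. Assumes the words,
    concatenated in order, equal the original unsegmented string.
    """
    spans = []
    start = 0
    for word in words:
        end = start + len(word)  # end offset is exclusive
        spans.append((word, start, end))
        start = end  # the next word begins where this one ended
    return spans


# Word list copied from the "Current result" reported above.
words = ['廣東話', '容', '唔', '容易', '學', '?']
result = with_offsets(words)
print(result)
# [('廣東話', 0, 3), ('容', 3, 4), ('唔', 4, 5), ('容易', 5, 7), ('學', 7, 8), ('?', 8, 9)]
```

With PyCantonese installed, the same helper could presumably be applied directly, as in `with_offsets(pycantonese.segment("廣東話容唔容易學?"))`. If the segmenter ever strips whitespace or other characters, the offsets would drift and a search-based alignment against the original string would be needed instead.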
Thanks.