Skip to content

Does Word Segmentation give position of the vocabularies? #42

@shivanraptor

Description

@shivanraptor

Feature you are interested in and your specific question(s):
I'm studying Word Segmentation of PyCantonese (https://pycantonese.org/word_segmentation.html), does the function return also the start & end position of the vocabulary?

What you are trying to accomplish with this feature or functionality:
I would like to achieve:

import pycantonese
from pycantonese.word_segmentation import Segmenter
segmenter = Segmenter()
result = pycantonese.segment("廣東話容唔容易學?", cls=segmenter)
print(result)

Current result:

['廣東話', '容', '唔', '容易', '學', '?']

Would like to have the following result (with the start & end position):

[('廣東話', 0, 3), ('容', 3, 4), ('唔', 4, 5), ('容易', 5, 7), ('學', 7, 8), ('?', 8, 9)]

Thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions