[TODO] Word tokenization #1

@wannaphong

Thai currently has many word tokenization models, but training your own model is hard, so I think we should add a method that makes training easy.

Preprocessing

Take a pandas.DataFrame that holds the train/val/test data for the model:

| Domain | Type (train/val/test) | Text |
| --- | --- | --- |
| news | train | ผม\|กำลัง\|อยู่\|ที่\|ทำเนียบรัฐบาล |

The data is then split into train/val/test sets for each domain.
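A minimal sketch of the per-domain split, assuming the DataFrame columns are named `domain`, `type`, and `text` and that words in `text` are separated by `|`. The helper name `split_by_domain` and the sample rows are hypothetical, not part of any existing API.

```python
import pandas as pd


def split_by_domain(df: pd.DataFrame) -> dict:
    """Return {domain: {split_name: [list of tokens per sentence]}}.

    Assumes columns "domain", "type" and "text" (hypothetical names),
    with words in "text" delimited by "|".
    """
    splits = {}
    for (domain, split), group in df.groupby(["domain", "type"]):
        sentences = [line.split("|") for line in group["text"]]
        splits.setdefault(domain, {})[split] = sentences
    return splits


# Illustrative rows only; real data would come from the user's corpus.
df = pd.DataFrame(
    [
        ("news", "train", "ผม|กำลัง|อยู่"),
        ("news", "val", "ผม|อยู่"),
        ("news", "test", "กำลัง|อยู่"),
    ],
    columns=["domain", "type", "text"],
)
splits = split_by_domain(df)
```

Keeping the splits keyed by domain first makes it straightforward to train or evaluate one domain at a time.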

Model

Train with:

  • Deepcut

More model types can be added later.
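Character-level tokenizers such as Deepcut are typically trained to predict, for each character, whether it starts a new word. As a rough sketch of how the `|`-delimited text could be turned into that kind of training target (the function name `to_boundary_labels` is hypothetical, and the exact input format a given trainer expects is an assumption):

```python
def to_boundary_labels(segmented: str, sep: str = "|"):
    """Convert "word|word|..." into parallel lists of characters and labels.

    A label of 1 marks the first character of a word, 0 marks any other
    character. Empty segments (for example from a doubled separator) are
    skipped.
    """
    chars, labels = [], []
    for word in segmented.split(sep):
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)
    return chars, labels


chars, labels = to_boundary_labels("ผม|กำลัง")
# "".join(chars) gives back the unsegmented text, and sum(labels)
# equals the number of words in the input.
```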

Inference

Export the trained model to ONNX and use it to run inference.
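A sketch of the post-processing step at inference time, assuming the exported model returns one word-start probability per input character. That output format is an assumption, `decode_boundaries` is a hypothetical helper, and the probabilities below are hard-coded stand-ins for what an ONNX runtime session would return.

```python
def decode_boundaries(text: str, probs, threshold: float = 0.5):
    """Split text into tokens using per-character word-start probabilities.

    probs[i] is the (assumed) probability that text[i] begins a new word.
    The first character always starts a token.
    """
    if len(text) != len(probs):
        raise ValueError("text and probs must have the same length")
    tokens = []
    for ch, p in zip(text, probs):
        if not tokens or p >= threshold:
            tokens.append(ch)  # start a new token
        else:
            tokens[-1] += ch   # extend the current token
    return tokens


# Placeholder input and scores; in practice probs would come from the model.
tokens = decode_boundaries("abcde", [0.9, 0.1, 0.8, 0.2, 0.1])
# tokens == ["ab", "cde"]
```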
