
tokenizer not correctly splitting some contractions #26

Open
@francolq


The tokenizer does not follow the standard contraction tokenization [0] that the Stanford POS tagger expects. Contractions are left as single tokens, but they should be split.

Also, the character ´, when used as an apostrophe, is not handled.

[0] http://www.cis.upenn.edu/~treebank/tokenization.html
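
To make the expected behaviour concrete, here is a minimal sketch of Treebank-style contraction splitting. It is not the project's tokenizer: the function name `split_contractions`, the constant `APOSTROPHE_VARIANTS`, and the set of characters normalized to `'` are assumptions made for illustration. It only covers the common clitics (`n't`, `'s`, `'m`, `'d`, `'re`, `'ve`, `'ll`) and does not handle punctuation or the other rules described in [0].

```python
import re

# Characters treated as apostrophes (assumed set, for illustration only):
# U+00B4 acute accent and U+2019 right single quotation mark.
APOSTROPHE_VARIANTS = "\u00b4\u2019"

# "n't" is split off before the "n" (e.g. "don't" -> "do n't").
_NT = re.compile(r"(?i)(\w)(n't)\b")
# The other clitics are split off before the apostrophe (e.g. "she's" -> "she 's").
_CLITIC = re.compile(r"(?i)(\w)('(?:s|m|d|re|ve|ll))\b")


def split_contractions(text):
    """Return whitespace-separated tokens with contractions split Treebank-style."""
    # Normalize apostrophe-like characters to the ASCII apostrophe first.
    for ch in APOSTROPHE_VARIANTS:
        text = text.replace(ch, "'")
    text = _NT.sub(r"\1 \2", text)
    text = _CLITIC.sub(r"\1 \2", text)
    return text.split()


print(split_contractions("I don't think she's ready"))
# ['I', 'do', "n't", 'think', 'she', "'s", 'ready']
```

With this sketch, `They´ll` (written with the acute accent) would come out as `They` and `'ll`, which is the kind of output the tagger expects for contractions.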
