How to enable discardPunctuation in Kuromoji Java #134

@yanghanxy

Hi,

With the Kuromoji-ES plugin, I can remove punctuation by setting "discard_punctuation": "true".

How can I get the same result with Kuromoji-Java?

For example, in Kuromoji-ES, 「浅草」駅 is tokenized as:

{
  "tokens" : [
    {
      "token" : "浅草",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "駅",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    }
  ]
}

Is there an equivalent option in Kuromoji-Java?
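For illustration, here is a minimal sketch of one possible workaround: tokenize as usual and then drop any token whose surface consists only of punctuation or symbol characters. It does not use a built-in Kuromoji-Java option. The class `PunctuationFilter` and its methods are hypothetical names, and the set of Unicode categories treated as punctuation is an assumption that may not match the ES plugin exactly. The sketch works on plain surface strings so that it runs without the Kuromoji dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Kuromoji: filters out punctuation-only surfaces.
public class PunctuationFilter {

    // True when the code point is in a Unicode punctuation, symbol,
    // separator, control, or format category (assumed category set).
    public static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // True when the surface is non-empty and every code point is punctuation.
    public static boolean isAllPunctuation(String surface) {
        return !surface.isEmpty()
                && surface.codePoints().allMatch(PunctuationFilter::isPunctuation);
    }

    // Returns a new list containing only the surfaces that are not pure punctuation.
    public static List<String> discardPunctuation(List<String> surfaces) {
        List<String> kept = new ArrayList<>();
        for (String surface : surfaces) {
            if (!isAllPunctuation(surface)) {
                kept.add(surface);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Surfaces written by hand here; with Kuromoji they would come from the tokenizer.
        List<String> surfaces = Arrays.asList("「", "浅草", "」", "駅");
        // Keeps only 浅草 and 駅.
        System.out.println(discardPunctuation(surfaces));
    }
}
```

With Kuromoji-Java, the surfaces would presumably come from calling `getSurface()` on each `Token` returned by `Tokenizer.tokenize(text)`. Filtering the `Token` objects themselves rather than strings should keep their original positions, which is what the offsets in the ES output above reflect.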
