Releases: KorAP/KorAP-Tokenizer

KorAP-Tokenizer 2.3.1

28 Jan 15:49

Changes in 2.3.1 [2026-01-28]

  • Fixed soft hyphens (U+00AD) being incorrectly treated as token boundaries (issue #131)
  • Updated dependencies
  • Improved compatibility with Java 25 (fixed deprecation warnings)
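To illustrate the soft-hyphen fix, here is a minimal sketch of why U+00AD can end up acting as a token boundary and how treating it as word-internal avoids that. The class `SoftHyphenDemo` and both regex patterns are hypothetical toy stand-ins, not KorAP-Tokenizer's actual JFlex rules; the sketch only relies on the fact that U+00AD is a format character, not a letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SoftHyphenDemo {
    // Hypothetical toy patterns for illustration only (not KorAP-Tokenizer's rules).
    // NAIVE_WORD matches runs of letters and digits; U+00AD is neither, so it ends a match.
    static final Pattern NAIVE_WORD = Pattern.compile("[\\p{L}\\p{N}]+");
    // SHY_AWARE_WORD additionally allows the soft hyphen inside a word.
    static final Pattern SHY_AWARE_WORD = Pattern.compile("[\\p{L}\\p{N}\\u00AD]+");

    // Collects every match of the given word pattern as one token.
    static List<String> tokenize(Pattern word, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = word.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Zu\u00ADcker ist gut";
        // The naive pattern splits the word at the soft hyphen.
        System.out.println(tokenize(NAIVE_WORD, text).size());
        // The soft-hyphen-aware pattern keeps the word as a single token.
        System.out.println(tokenize(SHY_AWARE_WORD, text).size());
    }
}
```

In this toy setup the naive pattern yields four tokens and the soft-hyphen-aware pattern yields three, with the soft hyphen preserved inside the first token.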

KorAP-Tokenizer 2.3.0

24 Dec 10:11

Changes in 2.3.0 [2024-12-23]

  • Fixed genderstern and omission asterisk breaking after hyphens (issue #115)
  • Added emoji complex support (issue #113)
  • Added Wikipedia emoji template support (issue #114)
  • Fixed the splitting of the most frequent hyphenated compound abbreviations in German (issue #116)
  • Updated dependencies

Maven Central: https://central.sonatype.com/artifact/de.ids-mannheim.korap.tokenizer/KorAP-Tokenizer/2.3.0

KorAP-Tokenizer-2.2.5

08 Sep 14:30

KorAP-Tokenizer 2.2.3

07 Sep 16:42

  • Updated dependencies
  • Minimum Java version raised to 17
  • Fixed group id in pom.xml
  • Removed compile dependency on Maven Surefire
  • Build artifacts in src/main/jflex are now ignored by git
  • java.io's ByteArrayOutputStream used instead of 3rd-party class

KorAP-Tokenizer v2.2.2

17 Jan 08:38

2.2.2

  • Bug fix: a single quotation mark at the beginning of a word
    is no longer interpreted as the beginning of an omission, but as a quotation mark token.
  • Dependencies updated

2.2.1 (unreleased)

  • "du." is no longer treated as an abbreviation.

KorAP-Tokenizer v2.2.0

29 Jul 07:43

Updates

  • Apostrophe- and hyphen-marked contractions and clitics in English (I've, isn't, Peter's, …) and French (j'ai, d'un, l'art, sont-elles, …) are now separated again.

KorAP-Tokenizer v2.1.0

29 Jun 09:53

Changes in v2.1.0

  • GitHub CI test workflow added
  • Dependencies updated
  • -Xss2m added to the Maven JVM config

Potentially breaking change

  • --sentence-boundaries|-s now prints sentence boundaries only if --positions|-p is also present
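The effect of this change on existing invocations can be sketched as a simple condition over the command-line flags. The class `SentenceBoundaryFlags` and its helper are hypothetical and only model the rule stated above; the flag names are the only part taken from the release note, and the tokenizer's real option handling may differ.

```java
import java.util.Arrays;
import java.util.List;

public class SentenceBoundaryFlags {
    // Hypothetical helper: models the rule from the release note, not the actual CLI code.
    // Sentence boundaries are printed only when both the sentence flag and the
    // positions flag are present on the command line.
    static boolean printsSentenceBoundaries(String... args) {
        List<String> given = Arrays.asList(args);
        boolean sentences = given.contains("-s") || given.contains("--sentence-boundaries");
        boolean positions = given.contains("-p") || given.contains("--positions");
        return sentences && positions;
    }

    public static void main(String[] args) {
        // -s alone no longer produces sentence boundaries under the new rule.
        System.out.println(printsSentenceBoundaries("-s"));
        // Adding -p restores them.
        System.out.println(printsSentenceBoundaries("-s", "-p"));
    }
}
```

Scripts that passed only -s and relied on sentence boundary output would need -p added to keep that output.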