28 Jan 15:49

kupietz

KorAP-Tokenizer-2.3.1

KorAP-Tokenizer 2.3.1 Latest

Changes in 2.3.1 [2026-01-28]

Fixed soft hyphens (U+00AD) being incorrectly treated as token boundaries (issue #131)
Updated dependencies
Improved compatibility with Java 25 (fixed deprecation warnings)
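The following is a minimal Java sketch of why a soft hyphen can end up as a token boundary. It assumes a toy tokenizer that ends a token at every character that is not a letter or digit. The class SoftHyphenSketch and its methods are hypothetical and are not part of the KorAP-Tokenizer API. The sketch relies only on the fact that U+00AD has the Unicode general category Cf (format), so Character.isLetterOrDigit returns false for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer, not KorAP-Tokenizer code: it only illustrates the soft hyphen issue.
final class SoftHyphenSketch {
    static final char SOFT_HYPHEN = '\u00AD';

    // Naive rule: every character that is not a letter or digit ends the current token.
    // U+00AD has general category Cf (format), so this rule splits words at soft hyphens.
    static List<String> naiveTokens(String text) {
        return tokenize(text, false);
    }

    // Same rule, but soft hyphens stay inside the word they belong to.
    static List<String> softHyphenAwareTokens(String text) {
        return tokenize(text, true);
    }

    private static List<String> tokenize(String text, boolean keepSoftHyphens) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean partOfWord = Character.isLetterOrDigit(c)
                    || (keepSoftHyphens && c == SOFT_HYPHEN);
            if (partOfWord) {
                current.append(c);
            } else if (current.length() > 0) {
                // Non-word characters are dropped; this toy does not emit punctuation tokens.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Donaudampfschiff" with invisible soft hyphens marking possible line-break points
        String text = "Donau\u00ADdampf\u00ADschiff f\u00E4hrt";
        System.out.println(Character.getType(SOFT_HYPHEN) == Character.FORMAT); // true
        System.out.println(naiveTokens(text).size());           // 4
        System.out.println(softHyphenAwareTokens(text).size()); // 2
    }
}
```

With the naive rule, the compound is split into three pieces; treating U+00AD as part of the word keeps it as a single token, which is the behavior the 2.3.1 fix describes.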

24 Dec 10:11

kupietz

KorAP-Tokenizer-2.3.0

KorAP-Tokenizer 2.3.0

Changes in 2.3.0 [2024-12-23]

Fixed gender stars (German "Genderstern") and omission asterisks breaking after hyphens (issue #115)
Added support for complex emoji (issue #113)
Added Wikipedia emoji template support (issue #114)
Fixed incorrect splitting of the most frequent German hyphenated compound abbreviations (issue #116)
Updated dependencies

Maven Central: https://central.sonatype.com/artifact/de.ids-mannheim.korap.tokenizer/KorAP-Tokenizer/2.3.0

08 Sep 14:30

kupietz

KorAP-Tokenizer-2.2.5

KorAP-Tokenizer 2.2.5

Released on Maven Central
Added more OSSRH sync metadata to the Maven POM
Minor code cleanups
Added some API documentation

07 Sep 16:42

kupietz

KorAP-Tokenizer-2.2.3

KorAP-Tokenizer 2.2.3

Updated dependencies
Minimum Java version raised to 17
Fixed group id in pom.xml
Removed compile dependency on Maven Surefire
Build artifacts in src/main/jflex are now ignored by git
java.io.ByteArrayOutputStream is now used instead of a third-party class

17 Jan 08:38

kupietz

KorAP-Tokenizer v2.2.2

2.2.2

Bug fix: a single quotation mark at the beginning of a word is no longer interpreted as the beginning of an omission, but as a quotation mark token.
Dependencies updated

2.2.1 (unreleased)

"du." is no longer treated as an abbreviation.

29 Jul 07:43

kupietz

KorAP-Tokenizer v2.2.0

Updates

Apostrophe and hyphen marked contractions and clitics in English (I've, isn't, Peter's, …) and French (j'ai, d'un, l'art, sont-elles, …) are now separated again.

29 Jun 09:53

kupietz

KorAP-Tokenizer v2.1.0

Changes in v2.1.0

GitHub CI test workflow added
Dependencies updated
-Xss2m added to the Maven JVM config

Potentially breaking change

--sentence-boundaries|-s now prints sentence boundaries only if --positions|-p is also present
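The changed option handling boils down to a simple rule. The following Java sketch models it with hypothetical boolean parameters standing in for the two command-line switches; the class SentenceBoundaryFlagSketch is illustrative only and is not the tokenizer's actual option-parsing code.

```java
// Hypothetical model of the option rule, not the tokenizer's actual option handling.
final class SentenceBoundaryFlagSketch {
    // Since 2.1.0, --sentence-boundaries|-s only has an effect together with --positions|-p.
    static boolean printsSentenceBoundaries(boolean sentenceBoundaries, boolean positions) {
        return sentenceBoundaries && positions;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table for the two switches
        for (boolean s : values) {
            for (boolean p : values) {
                System.out.printf("-s=%b -p=%b -> sentence boundaries printed: %b%n",
                        s, p, printsSentenceBoundaries(s, p));
            }
        }
    }
}
```

Following this rule, scripts that pass only -s will need to add -p to get sentence boundaries printed.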
