Releases: KorAP/KorAP-Tokenizer
KorAP-Tokenizer 2.3.1
Changes in 2.3.1 [2026-01-28]
- Fixed soft hyphens (U+00AD) being incorrectly treated as token boundaries (issue #131)
- Updated dependencies
- Improved compatibility with Java 25 (fixed deprecation warnings)
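A soft hyphen marks a possible line-break position and is invisible unless a break actually occurs there, so a word that contains one should still come out as a single token. The toy tokenizer below is a hypothetical sketch of that behavior. The class and method names are made up, and it is not how KorAP-Tokenizer's JFlex grammar is written. It keeps letters, digits and U+00AD together and emits every other non-space character as its own token.

```java
import java.util.ArrayList;
import java.util.List;

public class SoftHyphenDemo {

    // Hypothetical helper: true if c may appear inside a word token.
    // U+00AD (SOFT HYPHEN, Unicode category Cf) is allowed explicitly,
    // so it never ends a token.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '\u00AD';
    }

    // Splits text into word tokens and single-character punctuation tokens.
    // Whitespace separates tokens and is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < text.length() && isWordChar(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                tokens.add(String.valueOf(c)); // punctuation is its own token
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Die Sil\u00ADben\u00ADtren\u00ADnung.";
        // "Die", the soft-hyphenated word as one token, and "."
        System.out.println(tokenize(text).size()); // prints 3
    }
}
```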
KorAP-Tokenizer 2.3.0
Changes in 2.3.0 [2024-12-23]
- Fixed genderstern and omission asterisk breaking after hyphens (issue #115)
- Added emoji complex support (issue #113)
- Added Wikipedia emoji template support (issue #114)
- Fixed splitting of the most frequent German hyphenated compound abbreviations (issue #116)
- Updated dependencies
Maven Central: https://central.sonatype.com/artifact/de.ids-mannheim.korap.tokenizer/KorAP-Tokenizer/2.3.0
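An emoji complex is a sequence of several code points that renders as one symbol and therefore should not be split into separate tokens. The Java sketch below is hypothetical: the class and helper names are made up and not part of KorAP-Tokenizer's API. It also simplifies the Unicode emoji sequence rules, handling only ZERO WIDTH JOINER (U+200D) sequences, the emoji presentation selector U+FE0F and skin tone modifiers (U+1F3FB to U+1F3FF). Flags and keycaps are not covered. It operates on a run of emoji and groups each base emoji with the code points attached to it.

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiComplexDemo {

    static final int ZWJ = 0x200D;  // ZERO WIDTH JOINER
    static final int VS16 = 0xFE0F; // VARIATION SELECTOR-16 (emoji presentation)

    // Fitzpatrick skin tone modifiers U+1F3FB..U+1F3FF
    static boolean isSkinToneModifier(int cp) {
        return cp >= 0x1F3FB && cp <= 0x1F3FF;
    }

    // Hypothetical helper: splits a run of emoji into complexes, keeping
    // ZWJ-joined emoji, VS16 and skin tone modifiers attached to their base.
    static List<String> emojiComplexes(String s) {
        List<String> result = new ArrayList<>();
        int[] cps = s.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            StringBuilder sb = new StringBuilder();
            sb.appendCodePoint(cps[i++]); // base code point
            while (i < cps.length) {
                if (cps[i] == VS16 || isSkinToneModifier(cps[i])) {
                    sb.appendCodePoint(cps[i++]);
                } else if (cps[i] == ZWJ && i + 1 < cps.length) {
                    sb.appendCodePoint(cps[i++]); // the joiner
                    sb.appendCodePoint(cps[i++]); // the emoji it joins
                } else {
                    break; // next code point starts a new complex
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // family (man ZWJ woman ZWJ girl), thumbs up with medium skin tone,
        // red heart followed by VS16
        String s = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67"
                + "\uD83D\uDC4D\uD83C\uDFFD"
                + "\u2764\uFE0F";
        System.out.println(emojiComplexes(s).size()); // prints 3
    }
}
```

Working on code points rather than Java chars keeps surrogate pairs intact, which matters because most emoji lie outside the Basic Multilingual Plane.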
KorAP-Tokenizer 2.2.5
- Released on Maven Central
- Added more OSSRH sync data to the Maven POM
- Minor code cleanups
- Added some API documentation
KorAP-Tokenizer 2.2.3
- Updated dependencies
- Minimum Java version raised to 17
- Fixed group id in pom.xml
- Removed compile dependency on Maven Surefire
- Build artifacts in src/main/jflex are now ignored by git
- java.io.ByteArrayOutputStream is now used instead of a third-party class
KorAP-Tokenizer v2.2.2
2.2.2
- Bug fix: a single quotation mark at the beginning of a word is no longer interpreted as the beginning of an omission, but as a quotation mark token
- Dependencies updated
2.2.1 (unreleased)
- "du." is no longer treated as an abbreviation.
KorAP-Tokenizer v2.2.0
Updates
- Apostrophe and hyphen marked contractions and clitics in English (I've, isn't, Peter's, …) and French (j'ai, d'un, l'art, sont-elles, …) are now separated again.
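The kind of separation described above can be approximated with regular expressions. The Java sketch below is hypothetical: the class name, the pattern lists and the exact split points (for example "is" + "n't", "j'" + "ai", "sont" + "-elles") are assumptions modeled on common Penn-Treebank-style conventions. They are not taken from KorAP-Tokenizer's JFlex rules, and only the ASCII apostrophe is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CliticSplitDemo {

    // English (assumed convention): "n't" and apostrophe clitics such as
    // 've, 's, 're, 'll, 'd, 'm are split off the end of the word.
    static final Pattern EN_CLITIC =
            Pattern.compile("(?i)^(.+?)(n't|'ve|'s|'re|'ll|'d|'m)$");

    // French (assumed convention): an elided article or pronoun ending in an
    // apostrophe (j', d', l', qu', ...) is split from the following word.
    static final Pattern FR_ELISION =
            Pattern.compile("(?i)^(c|d|j|l|m|n|s|t|qu)'(.+)$");

    // French (assumed convention): a hyphenated subject pronoun after the
    // verb (sont-elles, a-il, ...) is split off, keeping its hyphen.
    static final Pattern FR_INVERSION =
            Pattern.compile("(?i)^(.+?)(-(?:je|tu|il|elle|on|nous|vous|ils|elles))$");

    // Hypothetical helper: splits one whitespace-free word into tokens.
    static List<String> split(String word) {
        List<String> out = new ArrayList<>();
        Matcher m = FR_ELISION.matcher(word);
        if (m.matches()) {
            out.add(m.group(1) + "'");
            out.addAll(split(m.group(2))); // the rest may contain more clitics
            return out;
        }
        m = EN_CLITIC.matcher(word);
        if (m.matches()) {
            out.add(m.group(1));
            out.add(m.group(2));
            return out;
        }
        m = FR_INVERSION.matcher(word);
        if (m.matches()) {
            out.add(m.group(1));
            out.add(m.group(2));
            return out;
        }
        out.add(word); // nothing to split
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"I've", "isn't", "Peter's", "j'ai", "d'un", "l'art", "sont-elles"};
        for (String w : words) {
            System.out.println(w + " -> " + split(w));
        }
    }
}
```

Checking the French elision pattern first keeps words such as "d'un" from being matched by the English rules; a real tokenizer would additionally need language detection or per-language rule sets.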
KorAP-Tokenizer v2.1.0
Changes in v2.1.0
- GitHub CI test workflow added
- Dependencies updated
- -Xss2m added to the Maven JVM config
Potentially breaking change
--sentence-boundaries|-s now prints sentence boundaries only if --positions|-p is also present