Commit 03e2b9e

Bump version to 2.2.2
Change-Id: I081b0ae75df07a9baa837e1a6cb6046e5e1109d4
1 parent 38ab15a commit 03e2b9e

3 files changed: 11 additions, 5 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Changelog
 
+## 2.2.2
+
+* Bug fix: a single quotation mark at the beginning of a word
+is no longer interpreted as a beginning of an omission, but as quotation mark token.
+* dependencies updated
+
 ## 2.2.1
 
 * "du." is no longer treated as an abbreviation.
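
The 2.2.2 changelog entry can be illustrated with a toy sketch. This is not the tokenizer's actual rule set, which this commit does not show; the pattern `TOKEN` and the function `toy_tokens` are hypothetical names made up for the illustration. The sketch only assumes what the changelog and the README example state: an apostrophe at the start of a word becomes a token of its own, while a word-internal one stays attached to the following letters, as in `'s`.

```python
import re

# Hypothetical toy pattern, NOT KorAP's grammar:
#   (?<!\w)'   an apostrophe that does not follow a word character (word-initial)
#   '\w+       an apostrophe inside a word, kept with the following letters
#   \w+        a plain word
#   [^\w\s]    any other single punctuation character
TOKEN = re.compile(r"(?<!\w)'|'\w+|\w+|[^\w\s]")


def toy_tokens(text):
    """Split text into tokens using the toy pattern above."""
    return TOKEN.findall(text)


print(toy_tokens("It's working."))
print(toy_tokens("'Hello world'"))
```

With this pattern, the opening mark in `'Hello world'` is emitted as a separate quotation mark token instead of being read as the start of an omission.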

Readme.md

Lines changed: 4 additions & 4 deletions
@@ -35,7 +35,7 @@ By default, KorAP tokenizer reads from standard input and writes to standard out
 
 #### Split English text into tokens
 ```
-$ echo "It's working." | java -jar target/KorAP-Tokenizer-2.2.0.9000-standalone.jar -l en
+$ echo "It's working." | java -jar target/KorAP-Tokenizer-2.2.2-standalone.jar -l en
 It
 's
 working
@@ -44,7 +44,7 @@ working
 #### Split French text into tokens and sentences
 ```
 $ echo "C'est une phrase. Ici, il s'agit d'une deuxième phrase." \
-| java -jar target/KorAP-Tokenizer-2.2.0.9000-standalone.jar -s -l fr
+| java -jar target/KorAP-Tokenizer-2.2.2-standalone.jar -s -l fr
 C'
 est
 une
@@ -69,7 +69,7 @@ With the `--positions` option, for example, the tokenizer prints all offsets of
 In order to end a text, flush the output and reset the character position, an EOT character (0x04) can be used.
 ```
 $ echo -n -e 'This is a text.\x0a\x04\x0aAnd this is another text.\n\x04\n' |\
-java -jar target/KorAP-Tokenizer-2.2.0.9000-standalone.jar --positions
+java -jar target/KorAP-Tokenizer-2.2.2-standalone.jar --positions
 This
 is
 a
@@ -87,7 +87,7 @@ text
 #### Print token and sentence offset
 ```
 echo -n -e ' This ist a start of a text. And this is a sentence!!! But what the hack????\x0a\x04\x0aAnd this is another text.' |\
-java -jar target/KorAP-Tokenizer-2.2.0.9000-standalone.jar --no-tokens --positions --sentence-boundaries
+java -jar target/KorAP-Tokenizer-2.2.2-standalone.jar --no-tokens --positions --sentence-boundaries
 1 5 6 9 10 11 12 17 18 20 21 22 23 27 27 28 29 32 33 37 38 40 41 42 43 51 51 54 55 58 59 63 64 67 68 72 72 76
 1 28 29 54 55 76
 0 3 4 8 9 11 12 19 20 24 24 25
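
The README context in the hunks above says that an EOT character (0x04) ends a text and resets the character position. The following is a minimal sketch of that idea, not KorAP's implementation. The names `offsets_per_text` and `TOKEN` are hypothetical, and the regular expression is a stand-in for the real tokenizer rules. It also assumes that the line break directly after an EOT is not counted, which is consistent with the second text starting at offset 0 in the README output.

```python
import re

EOT = "\x04"  # end-of-transmission character, as used in the README example

# Toy token pattern (an assumption, not KorAP's rules):
# a run of word characters, or a run of punctuation characters.
TOKEN = re.compile(r"\w+|[^\w\s]+")


def offsets_per_text(stream):
    """Return one list of (start, end) token offsets per EOT-delimited text."""
    results = []
    for i, chunk in enumerate(stream.split(EOT)):
        # Assumption: the line break directly after an EOT is not counted,
        # so offsets of the next text start again at 0.
        if i > 0 and chunk.startswith("\n"):
            chunk = chunk[1:]
        spans = [m.span() for m in TOKEN.finditer(chunk)]
        if spans:  # skip chunks without any token, e.g. after a final EOT
            results.append(spans)
    return results


sample = (" This ist a start of a text. And this is a sentence!!! "
          "But what the hack????\n\x04\nAnd this is another text.")
for spans in offsets_per_text(sample):
    print(" ".join(f"{start} {end}" for start, end in spans))
```

For this particular sample the toy pattern reproduces the two token-offset lines shown in the README output. The sentence-offset line (`1 28 29 54 55 76`) is not modelled here.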

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 <groupId>groupId</groupId>
 <artifactId>KorAP-Tokenizer</artifactId>
-<version>2.2.1</version>
+<version>2.2.2</version>
 
 <properties>
 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
