Commit 7b61a0b

Improve performance description
Change-Id: I8301b58503794cc709fe16c23c7c55eeb2b1b872
1 parent 7efa7e5 commit 7b61a0b

1 file changed: +5 -1 lines changed

Readme.md

Lines changed: 5 additions & 1 deletion
@@ -22,7 +22,11 @@ the german reference corpus.
 
 ![Speed comparison of german tokenizers](https://raw.githubusercontent.com/KorAP/Datok/master/misc/benchmarks.svg)
 
-Speed comparison of different tokenizers and sentence splitters for German.
+Chart showing speed comparison of different tokenizers and sentence splitters
+for German. `Effi` refers to tokenizing and/or sentence splitting of one
+issue of [Effi Briest](https://www.gutenberg.org/cache/epub/5323/pg5323.html).
+Datok is optimized for large batch sizes, while other tools may
+perform better in other scenarios.
 For further benchmarks, especially regarding the quality of tokenization,
 see Diewald/Kupietz/Lüngen (2022).
 
