@@ -22,7 +22,11 @@ the german reference corpus.
 
 ![Speed comparison of german tokenizers](https://raw.githubusercontent.com/KorAP/Datok/master/misc/benchmarks.svg)
 
-Speed comparison of different tokenizers and sentence splitters for German.
+Chart showing speed comparison of different tokenizers and sentence splitters
+for German. `Effi` refers to tokenizing and/or sentence splitting of one
+issue of [Effi Briest](https://www.gutenberg.org/cache/epub/5323/pg5323.html).
+Datok is optimized for large batch sizes, while other tools may
+perform better in other scenarios.
 For further benchmarks, especially regarding the quality of tokenization,
 see Diewald/Kupietz/Lüngen (2022).
 