Commit 36f8359
Fix KWIC capping
The properties krill.match.max.token and krill.context.max.token, along
with the corresponding variables and parameters such as
maxTokenMatchSize, were introduced to configure the maximum visible
token length of search hits with context ("KWICs") and exports, in order
to comply with copyright and license restrictions.
However, the implementation was flawed and apparently based on a
misunderstanding between linguists, lawyers and programmers.
The only point that matters legally is the total number of tokens shown
in a KWIC snippet (left context + match + right context). If an actual
match is longer than krill.kwic.max.token, it must be cut down to
krill.kwic.max.token. Otherwise, the remaining token budget should be
distributed between left and right context, either equally or in such a
way that the total number of capped words is minimized.
Change-Id: Ib0afd476fcd84144d4d9db18839ed8b9952f92e3
1 parent cd3fb7e commit 36f8359
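
The capping rule described in the commit message can be sketched as
follows. This is only an illustration under stated assumptions, not the
code from this commit: the class KwicCapSketch and the method cap are
hypothetical names, and maxKwicTokens stands in for the value of
krill.kwic.max.token. The sketch starts from an equal split of the
remaining budget and then hands any unused share to the other side, so
that as few tokens as possible are capped.

```java
import java.util.Arrays;

// Hypothetical illustration of the KWIC token budget; not Krill's actual code.
final class KwicCapSketch {

    /**
     * Returns {left, match, right} token counts whose sum never exceeds
     * maxKwicTokens (assumed to mirror krill.kwic.max.token).
     */
    static int[] cap(int left, int match, int right, int maxKwicTokens) {
        if (left < 0 || match < 0 || right < 0 || maxKwicTokens < 0) {
            throw new IllegalArgumentException("token counts must be non-negative");
        }
        // 1. The match itself may never exceed the total limit.
        int cappedMatch = Math.min(match, maxKwicTokens);
        // 2. Whatever is left over is the budget for both contexts together.
        int budget = maxKwicTokens - cappedMatch;
        // 3. Start with an equal split for the left side ...
        int cappedLeft = Math.min(left, budget / 2);
        // 4. ... give the right side everything the left side did not use ...
        int cappedRight = Math.min(right, budget - cappedLeft);
        // 5. ... and return any share the right side did not need to the left.
        cappedLeft = Math.min(left, budget - cappedRight);
        return new int[] { cappedLeft, cappedMatch, cappedRight };
    }

    public static void main(String[] args) {
        // Both contexts are too long: the budget of 7 is split 3 + 4.
        System.out.println(Arrays.toString(cap(10, 5, 10, 12))); // [3, 5, 4]
        // Short right context: the left context receives the unused share.
        System.out.println(Arrays.toString(cap(10, 5, 1, 12)));  // [6, 5, 1]
        // Oversized match: it is cut to the limit and no context remains.
        System.out.println(Arrays.toString(cap(10, 20, 10, 12))); // [0, 12, 0]
    }
}
```

With an odd budget the extra token goes to the right context in this
sketch; that tie-break is an arbitrary choice, since the message only
asks for an equal split or for minimal capping.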
File tree
10 files changed (+416 -157 lines changed)
- src
  - main
    - java/de/ids_mannheim/korap
      - response
      - util
    - resources
  - test/java/de/ids_mannheim/korap
    - index
    - response
    - search
Lines changed: 46 additions & 4 deletions