Commit 4d59ee4
Fix emoticon matching before letters (e.g., Wikipedia:Diskussionen)
Added trailing context to emoticon rule so :D only matches when NOT
followed by a letter. This prevents false emoticon matches in patterns
like Wikipedia:Diskussionen where the colon is a namespace separator.
Before: Wikipedia:Diskussionen → Wikipedia :D iskussionen
After: Wikipedia:Diskussionen → Wikipedia : Diskussionen
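
The effect of the trailing context can be sketched in plain Java. This is only an illustration, not the JFlex rule from this commit: the class name, the pattern names, and the helper `containsEmoticon` are hypothetical, and a `java.util.regex` negative lookahead stands in for JFlex trailing context.

```java
import java.util.regex.Pattern;

public class EmoticonLookahead {
    // Hypothetical stand-in for the old rule: ":D" matches anywhere.
    static final Pattern NAIVE = Pattern.compile(":D");

    // Hypothetical stand-in for the fixed rule: ":D" matches only when the
    // next character is not a letter (negative lookahead on \p{L}).
    static final Pattern GUARDED = Pattern.compile(":D(?!\\p{L})");

    // Returns true if the given rule finds an emoticon anywhere in the text.
    static boolean containsEmoticon(Pattern rule, String text) {
        return rule.matcher(text).find();
    }

    public static void main(String[] args) {
        String namespace = "Wikipedia:Diskussionen";
        String smiley = "Great :D";

        System.out.println(containsEmoticon(NAIVE, namespace));   // true (false positive)
        System.out.println(containsEmoticon(GUARDED, namespace)); // false
        System.out.println(containsEmoticon(GUARDED, smiley));    // true
    }
}
```

With the guard in place, the colon in `Wikipedia:Diskussionen` is no longer consumed as part of an emoticon, while a standalone `:D` still matches.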
Resolves #134
Change-Id: Ia9d6659e604eb514172e2182c94a206b5b45023f
1 parent 658e605, commit 4d59ee4
File tree
2 files changed: +21 -3 lines changed
- src
  - main/jpc/jflex/de/ids_mannheim/korap/tokenizer
  - test/java/de/ids_mannheim/korap/tokenizer

Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 910-912 | 910-912 | |
| 913 | | - |
| | 913 | + |
| 914-916 | 914-916 | |

Lines changed: 20 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1195-1197 | 1195-1197 | |
| 1198-1199 | | - |
| 1200 | 1198 | |
| | 1199-1218 | + |