Commit ecd6946

fix: use unicode character class for regex
Fixes #33
1 parent: 164d25d

1 file changed

Lines changed: 1 addition & 1 deletion

lib/src/main/java/com/knuddels/jtokkit/EncodingFactory.java

```diff
@@ -124,7 +124,7 @@ private static Encoding fromPredefinedParameters(
             final String fileName,
             final Map<String, Integer> specialTokens
     ) {
-        final Pattern regex = Pattern.compile(patternString);
+        final Pattern regex = Pattern.compile(patternString, Pattern.UNICODE_CHARACTER_CLASS);
         final GptBytePairEncodingParams params = new GptBytePairEncodingParams(name, regex, loadMergeableRanks(fileName), specialTokens);
         return fromParameters(params);
     }
```
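The diff does not show `patternString` itself, so the effect of the flag is easiest to see on Java's predefined character classes. Below is a minimal standalone sketch using only `java.util.regex`; the class `UnicodeCharacterClassDemo` and its helper methods are hypothetical and not part of jtokkit. With default flags, `\s` matches only ASCII whitespace and `\w` only `[a-zA-Z_0-9]`; with `Pattern.UNICODE_CHARACTER_CLASS`, both follow Unicode properties.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of jtokkit.
public class UnicodeCharacterClassDemo {

    // True if the entire input matches \s+ compiled with the given flags.
    static boolean isAllWhitespace(String input, int flags) {
        return Pattern.compile("\\s+", flags).matcher(input).matches();
    }

    // Returns the first run of \w characters, or null if there is none.
    static String firstWord(String input, int flags) {
        Matcher m = Pattern.compile("\\w+", flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String ideographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        String cafe = "caf\u00e9";          // "café" with U+00E9
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // Default flags: \s is [ \t\n\x0B\f\r] and \w is [a-zA-Z_0-9].
        System.out.println(isAllWhitespace(ideographicSpace, 0)); // false
        System.out.println(firstWord(cafe, 0));                    // caf

        // UNICODE_CHARACTER_CLASS: \s and \w follow Unicode properties.
        System.out.println(isAllWhitespace(ideographicSpace, unicode)); // true
        System.out.println(firstWord(cafe, unicode));                    // café
    }
}
```

Property escapes such as `\p{L}` and `\p{N}` already match non-ASCII characters without the flag; the flag changes the predefined classes (`\s`, `\w`, `\d`, `\b`) and POSIX classes such as `\p{Alpha}`. Per the `Pattern` Javadoc, the same mode can be enabled inline with `(?U)`, it implies `UNICODE_CASE` (so `(?i)` groups fold case beyond ASCII), and it may impose a performance penalty.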