Commit fbb535e

Author: Philip Müller
test: add tests for unicode inputs
1 parent ecd6946, commit fbb535e

4 files changed

Lines changed: 9 additions & 0 deletions

lib/src/test/resources/cl100k_base_encodings.csv

Lines changed: 2 additions & 0 deletions
@@ -420,3 +420,5 @@ Quel est votre caractère chinois préféré ? Et comment le dessiner ?,"[2232,
"Olá, como vai você?","[43819, 1995, 11, 8112, 40586, 25738, 30]","[43819, 1995, 11, 8112, 40586, 25738, 30]"
"Здравствуй, как поживаете?","[36551, 7094, 28086, 20812, 83680, 11, 52770, 5173, 21956, 28089, 28007, 1532, 30]","[36551, 7094, 28086, 20812, 83680, 11, 52770, 5173, 21956, 28089]"
"Hola, ¿cómo estás?","[69112, 11, 29386, 66, 72561, 1826, 7206, 30]","[69112, 11, 29386, 66, 72561, 1826, 7206, 30]"
+"  ", "[44529]", "[44529]"
+"  a", "[23249, 23249, 64]", "[23249, 23249, 64]"

lib/src/test/resources/p50k_base_encodings.csv

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -420,3 +420,5 @@ Quel est votre caractère chinois préféré ? Et comment le dessiner ?,"[48, 27
"Olá, como vai você?","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]"
"Здравствуй, как поживаете?","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072, 140, 117, 11, 12466, 118, 16142, 31583, 12466, 123, 25443, 114, 18849, 38857, 16142, 16843, 20375, 16843, 30]","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072]"
"Hola, ¿cómo estás?","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138, 30]","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138]"
+"  ", "[5099, 222, 5099, 222]", "[5099, 222, 5099, 222]"
+"  a", "[5099, 222, 5099, 222, 64]", "[5099, 222, 5099, 222, 64]"

lib/src/test/resources/p50k_edit_encodings.csv

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -420,3 +420,6 @@ Quel est votre caractère chinois préféré ? Et comment le dessiner ?,"[48, 27
"Olá, como vai você?","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]"
"Здравствуй, как поживаете?","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072, 140, 117, 11, 12466, 118, 16142, 31583, 12466, 123, 25443, 114, 18849, 38857, 16142, 16843, 20375, 16843, 30]","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072]"
"Hola, ¿cómo estás?","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138, 30]","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138]"
+"  ", "[5099, 222, 5099, 222]", "[5099, 222, 5099, 222]"
+"  a", "[5099, 222, 5099, 222, 64]", "[5099, 222, 5099, 222, 64]"
+

lib/src/test/resources/r50k_base_encodings.csv

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -420,3 +420,5 @@ Quel est votre caractère chinois préféré ? Et comment le dessiner ?,"[48, 27
"Olá, como vai você?","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]","[30098, 6557, 11, 401, 78, 410, 1872, 12776, 25792, 30]"
"Здравствуй, как поживаете?","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072, 140, 117, 11, 12466, 118, 16142, 31583, 12466, 123, 25443, 114, 18849, 38857, 16142, 16843, 20375, 16843, 30]","[140, 245, 43666, 21169, 16142, 38857, 21727, 20375, 38857, 35072]"
"Hola, ¿cómo estás?","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138, 30]","[39, 5708, 11, 1587, 123, 66, 10205, 5908, 1556, 40138]"
+"  ", "[5099, 222, 5099, 222]", "[5099, 222, 5099, 222]"
+"  a", "[5099, 222, 5099, 222, 64]", "[5099, 222, 5099, 222, 64]"
