Open
Description
The code on the front page that claims ICU is buggy is using ICU incorrectly. ICU reports boundaries as UTF-16 (or UTF-8) code unit offsets, not the "character" (code point) indices that Python string slicing expects. Here is a fix:
diff --git a/test.py b/test.py
index d040782..1318933 100644
--- a/test.py
+++ b/test.py
@@ -1,12 +1,13 @@
 import icu
 def iterate_breaks(text, break_iterator):
+    text = icu.UnicodeString(text)
     break_iterator.setText(text)
     lastpos = 0
     while True:
         next_boundary = break_iterator.nextBoundary()
         print(next_boundary)
         if next_boundary == -1: return
-        yield text[lastpos:next_boundary]
+        yield str(text[lastpos:next_boundary])
         lastpos = next_boundary
 bi = icu.BreakIterator.createCharacterInstance(icu.Locale.getRoot())
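
To illustrate why the original slicing goes wrong, here is a minimal standard-library sketch that does not use ICU at all. The helper names `utf16_offset_to_index` and `slice_by_utf16` are hypothetical and exist only for this example. It assumes the offsets are UTF-16 code unit offsets, as described above.

```python
def utf16_offset_to_index(text: str, offset: int) -> int:
    """Map a UTF-16 code unit offset to a Python str (code point) index.

    Hypothetical helper for illustration; assumes `offset` falls on a
    code point boundary.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= offset:
            return index
        # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


def slice_by_utf16(text: str, start: int, end: int) -> str:
    """Slice `text` using UTF-16 code unit offsets instead of str indices."""
    # In UTF-16-LE without a BOM, every code unit is exactly 2 bytes.
    data = text.encode("utf-16-le")
    return data[start * 2:end * 2].decode("utf-16-le")


# "a", U+1F600 (outside the BMP, so 2 UTF-16 code units), "b"
sample = "a\U0001F600b"

# UTF-16 offsets 1..3 cover only the emoji; Python indices 1..3 cover the emoji and "b".
naive = sample[1:3]
correct = slice_by_utf16(sample, 1, 3)
```

For text that stays inside the Basic Multilingual Plane the two index systems agree, which is why the mismatch only appears with characters such as emoji.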