See google#69
In this patch I'm trying to follow the approach taken by @judofyr,
but starting from the top of the script and working down, auditing
every place that performs string operations that split, index, or
otherwise operate at the character level, so that we can make sure
we don't split surrogate pairs.
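As an illustration of what this audit looks for, here is a minimal sketch in TypeScript (not the patch's actual code). It assumes JavaScript's UTF-16 string indexing and computes a common-prefix length, one of the operations that index into strings. If a plain code-unit comparison stops right after a shared high surrogate, the prefix backs off by one so the pair stays whole. `isHighSurrogate`, `isLowSurrogate`, `splitsSurrogatePair`, and `commonPrefixLength` are hypothetical names.

```typescript
// True if the UTF-16 code unit is a high (leading) surrogate: 0xD800..0xDBFF.
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

// True if the UTF-16 code unit is a low (trailing) surrogate: 0xDC00..0xDFFF.
function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Hypothetical helper: true if cutting `text` at `index` would separate
// a high surrogate from the low surrogate that follows it.
function splitsSurrogatePair(text: string, index: number): boolean {
  return (
    index > 0 &&
    index < text.length &&
    isHighSurrogate(text.charCodeAt(index - 1)) &&
    isLowSurrogate(text.charCodeAt(index))
  );
}

// Hypothetical surrogate-aware common-prefix length: compare code units,
// then back off by one if the prefix would end just after a high surrogate.
function commonPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) {
    i++;
  }
  // `a` and `b` agree on every unit before `i`, so checking `a` suffices.
  if (i > 0 && isHighSurrogate(a.charCodeAt(i - 1))) {
    i--;
  }
  return i;
}
```

For example, `commonPrefixLength("\uD83D\uDE00", "\uD83D\uDE01")` (😀 vs 😁) returns 0. A plain code-unit comparison would return 1, leaving a lone high surrogate in the shared prefix.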
This contrasts with [attempt one], where I created a custom iterator
for strings. Surprisingly, I found this more "ad-hoc" approach easier
to manage, since it doesn't create a split universe of string/Unicode.
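One reading of the "split universe" problem is that a code-point iterator and native string indexing disagree about what an index means. The TypeScript sketch below illustrates this. `toCodePoints` is a hypothetical stand-in for a custom iterator, not the code from attempt one.

```typescript
// Hypothetical code-point splitter standing in for a custom string iterator:
// each valid surrogate pair becomes one element, everything else one unit.
function toCodePoints(text: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN past the end
    if (code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      out.push(text.slice(i, i + 2));
      i++; // the low surrogate was consumed with its high surrogate
    } else {
      out.push(text.charAt(i));
    }
  }
  return out;
}

const sample = "\uD83D\uDE00a"; // "😀a"
const points = toCodePoints(sample);

// The same string now has two index spaces:
const unitLength = sample.length; // 3 UTF-16 code units
const pointLength = points.length; // 2 code points
const atUnit1 = sample.charAt(1); // "\uDE00", a lone low surrogate
const atPoint1 = points[1]; // "a"
```

Every index produced in one space has to be translated before it is used in the other. Auditing the native-index code paths avoids that translation layer.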
As of this commit I haven't audited the cleanup functions, but my own
tests are passing, so I'm inclined to believe they might be safe.
I still have my doubts that this is sound work: the middle-snake
algorithm might find the wrong snake when presented with variable-width
characters.
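One way to probe that doubt is to check diff output for lone surrogates, which appear when a snake boundary lands between the two halves of a pair. The TypeScript sketch below assumes diffs are plain `[operation, text]` tuples (-1 delete, 0 equal, 1 insert), loosely modeled on diff-match-patch's diff pairs. `DiffTuple` and `hasLoneSurrogate` are hypothetical names.

```typescript
// Assumed diff shape: [operation, text], with -1 = delete, 0 = equal, 1 = insert.
type DiffTuple = [number, string];

// Hypothetical check: true if any segment contains a high surrogate not
// followed by a low one, or a low surrogate not preceded by a high one.
function hasLoneSurrogate(diffs: DiffTuple[]): boolean {
  for (const [, text] of diffs) {
    for (let i = 0; i < text.length; i++) {
      const code = text.charCodeAt(i);
      if (code >= 0xd800 && code <= 0xdbff) {
        const next = text.charCodeAt(i + 1); // NaN past the end
        if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
        i++; // skip the low half of a valid pair
      } else if (code >= 0xdc00 && code <= 0xdfff) {
        return true; // low surrogate with no preceding high surrogate
      }
    }
  }
  return false;
}
```

A code-unit diff of 😀 against 😁 might keep the shared high surrogate as an equality, giving `[[0, "\uD83D"], [-1, "\uDE00"], [1, "\uDE01"]]`. This check flags that output, while `[[-1, "\uD83D\uDE00"], [1, "\uD83D\uDE01"]]` passes.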