Inconsistent handling of negative index slicing #7

@alwaysmpe

Howdy, I have my own implementation of graphemes and I'm adding this implementation to my benchmarks.

As part of this I've been checking consistency between this and my implementation. One issue I've found is that negative index slicing isn't supported consistently. In grapheme_slice it doesn't work:

    if startpos < 0:
        startpos = 0

but in gslice it does:

    if pos < 0:
        pos += sgl
        if pos < 0:
            pos = 0
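
For illustration, here is a minimal sketch of how both functions could share one normalization step so that negative positions behave the same everywhere. The names `normalize_pos` and `slice_clusters` are hypothetical and not part of this library; I'm also assuming `sgl` in the snippet above is the length of the string in grapheme clusters.

```python
def normalize_pos(pos, length):
    """Turn a possibly negative position into a non-negative one."""
    if pos < 0:
        pos += length      # count from the end, as in the gslice snippet
        if pos < 0:
            pos = 0        # still negative after the shift: clamp to the start
    return pos


def slice_clusters(clusters, start, end=None):
    """Slice a list of grapheme clusters, accepting negative positions.

    `clusters` is assumed to be an already-split list of cluster strings.
    """
    n = len(clusters)
    start = normalize_pos(start, n)
    end = n if end is None else normalize_pos(end, n)
    return clusters[start:end]
```

With this, `slice_clusters(list("abcde"), -2)` gives the last two items, which matches what plain Python list slicing does for the same arguments.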

In terms of benchmarks, this implementation looks to be between 2x and 10x faster than mine, so I might switch to it if I can't improve mine. My benchmark is pretty hacky, but if you're interested you're welcome to try it, or I can share some results. It generates random strings of ~1000 clusters from the example test cases and times each implementation, giving fine-grained results for different use cases.
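
Roughly, the benchmark has the shape sketched below. This is not the actual script: `SAMPLE_CLUSTERS`, `make_random_text` and `time_impl` are made-up names, and the sample list is a tiny stand-in for the real example test cases. Joining random samples only gives approximately the requested number of clusters, since neighbouring samples can in principle merge.

```python
import random
import timeit

# Hypothetical stand-in for clusters taken from the example test cases.
SAMPLE_CLUSTERS = [
    "a",
    "e\u0301",                  # letter plus combining acute accent
    "\U0001F1EC\U0001F1E7",     # pair of regional indicators
    "\r\n",
    "\u1100\u1161",             # Hangul leading consonant plus vowel
]


def make_random_text(samples, n_clusters=1000, seed=0):
    """Join randomly chosen sample clusters into one test string."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    return "".join(rng.choice(samples) for _ in range(n_clusters))


def time_impl(func, text, number=100):
    """Return the average time in seconds of one call to func(text)."""
    return timeit.timeit(lambda: func(text), number=number) / number
```

Each implementation under test would be passed as `func`, for example `time_impl(my_split, make_random_text(SAMPLE_CLUSTERS))`, where `my_split` is whatever cluster-splitting function is being compared.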

In my implementation I translate each code point into a single character representing its character category, then match with a regex. It's interesting to see the different approaches everyone takes. Mine was the fastest I'd found for most things until now, even though it's pure Python, no Cython. That said, I don't pre-parse any of the tables, so the first use has to parse the data files and cache the results, which is a bit slow but easier to read.
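
To give an idea of the translate-then-regex approach, here is a heavily simplified sketch. It is not my real code and it does not implement the full Unicode segmentation rules: it only knows CR, LF, a rough "extend" class (approximated here by the general categories Mn and Me from `unicodedata`) and "other". The names `category_char`, `CLUSTER_RE` and `split_clusters` are made up for this example.

```python
import re
import unicodedata


def category_char(ch):
    """Map one code point to a one-letter category code (simplified)."""
    if ch == "\r":
        return "R"
    if ch == "\n":
        return "L"
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "E"              # rough approximation of the Extend property
    return "O"                  # everything else


# Runs over the category string, not the text itself:
# CR LF stays together, CR or LF alone is one cluster,
# any other code point absorbs the extend characters that follow it.
CLUSTER_RE = re.compile(r"RL|R|L|[OE]E*")


def split_clusters(text):
    """Split text into clusters using the simplified rules above."""
    cats = "".join(category_char(ch) for ch in text)
    # The category string has one character per code point, so the match
    # spans can be used directly as indices into the original text.
    return [text[m.start():m.end()] for m in CLUSTER_RE.finditer(cats)]
```

For example, `split_clusters("e\u0301a")` keeps the combining accent attached to the `e` and returns two clusters. A full version would need one category letter per grapheme break property and a correspondingly larger pattern.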
