
Comparison to SortedContainers.SortedList #1

@grantjenks


Have you seen the Python SortedContainers module? It's a fast, pure-Python implementation of SortedList, SortedSet, and SortedDict data types. It also has an extensive performance comparison with modules related to yours.

I was interested in your performance claims and wanted to benchmark SortedContainers against your skip lists. A SortedList is not quite the same as a SkipList, though, so I wrote a small wrapper and ran a benchmark comparison using your perf_skiplist.py. Here are my results (the last column is how many times faster SortedContainers is):

Test                              PySkipList  SortedContainers  Speedup
skiplist_index_throughput_1000     152326.28         602629.89    3.96x
skiplist_index_throughput_10000    134544.94         485423.76    3.61x
skiplist_index_throughput_100000   100671.55         382475.61    3.80x
skiplist_insert_throughput_1000     77543.06         368406.15    4.75x
skiplist_insert_throughput_10000    57264.03         321809.49    5.62x
skiplist_insert_throughput_100000   42304.85         257105.53    6.08x
skiplist_remove_throughput_1000     88881.20         171370.95    1.93x
skiplist_remove_throughput_10000    66682.10         140854.81    2.11x
skiplist_remove_throughput_100000   50299.01         109368.50    2.17x
skiplist_search_throughput_1000    226229.99         346636.69    1.53x
skiplist_search_throughput_10000   159439.83         241837.23    1.52x
skiplist_search_throughput_100000  103455.39         181585.36    1.76x
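The speedup column is just the ratio of the two throughput numbers (SortedContainers divided by PySkipList), so anything above 1 means the SortedContainers wrapper had the higher throughput. A quick check of that arithmetic against three rows of the table, using plain Python (the `speedup` helper is only for illustration):

```python
# (PySkipList, SortedContainers) throughput pairs copied from three rows of the table.
results = {
    "skiplist_index_throughput_1000": (152326.28, 602629.89),
    "skiplist_insert_throughput_100000": (42304.85, 257105.53),
    "skiplist_search_throughput_10000": (159439.83, 241837.23),
}


def speedup(baseline, candidate):
    """How many times higher the candidate's throughput is than the baseline's."""
    return candidate / baseline


for name, (pyskiplist, sortedcontainers) in results.items():
    # Prints 3.96x, 6.08x and 1.52x respectively.
    print(f"{name}: {speedup(pyskiplist, sortedcontainers):.2f}x")
```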

I'm not sure if you published this because the SkipList data structure is awesome (which it is) or because you need this in a production environment where performance matters. In the latter case, I wanted to provide this wrapper for you and get your thoughts. Here's the wrapper source (Apache2 License):

from operator import itemgetter
from sortedcontainers import SortedListWithKey

# Sentinel meaning "no default was given". It lives at module level because a
# double-underscore class attribute is name-mangled and is not visible as a
# bare name inside method bodies (that would raise NameError).
_NOT_SET = object()


class PySkipList(object):
    """pyskiplist-style API backed by a SortedListWithKey of (key, value) pairs."""

    def __init__(self):
        # Pairs are ordered by key only; values never take part in comparisons.
        self._list = SortedListWithKey(key=itemgetter(0))

    def _find(self, key):
        # Index of the first pair whose key equals `key`, or -1 if absent.
        # bisect_key_left returns len(self._list) when `key` is larger than
        # every stored key, so check bounds before indexing.
        pos = self._list.bisect_key_left(key)
        if pos < len(self._list) and self._list[pos][0] == key:
            return pos
        return -1

    def insert(self, key, value):
        self._list.add((key, value))

    def replace(self, key, value):
        # Drop the first pair with this key (if any), then add the new pair.
        # Recent SortedList versions do not support item assignment. With
        # duplicate keys, the new pair lands after the remaining equal keys.
        pos = self._find(key)
        if pos >= 0:
            del self._list[pos]
        self._list.add((key, value))

    def clear(self):
        self._list.clear()

    def __len__(self):
        return len(self._list)

    def __iter__(self, start=None, stop=None):
        # Half-open key range [start, stop), assumed to match pyskiplist's
        # items(start, stop); None leaves that side unbounded.
        return self._list.irange_key(
            min_key=start, max_key=stop, inclusive=(True, False)
        )

    items = __iter__

    def keys(self, start=None, stop=None):
        return (pair[0] for pair in self.items(start, stop))

    def values(self, start=None, stop=None):
        return (pair[1] for pair in self.items(start, stop))

    def popitem(self):
        # Remove and return the pair with the smallest key.
        if not self._list:
            raise KeyError('popitem(): skiplist is empty')
        return self._list.pop(0)

    def search(self, key, default=None):
        # Return the value of the first pair with `key`, else `default`.
        pos = self._find(key)
        return self._list[pos][1] if pos >= 0 else default

    def remove(self, key):
        # Remove and return the first pair with `key`; KeyError if absent.
        pos = self._find(key)
        if pos < 0:
            raise KeyError(key)
        return self._list.pop(pos)

    def pop(self, key, default=_NOT_SET):
        # Remove the first pair with `key` and return its value.
        pos = self._find(key)
        if pos >= 0:
            return self._list.pop(pos)[1]
        if default is _NOT_SET:
            raise KeyError(key)
        return default

    def __contains__(self, key):
        return self._find(key) >= 0

    def index(self, key, default=_NOT_SET):
        # Position of the first pair with `key`.
        pos = self._find(key)
        if pos >= 0:
            return pos
        if default is _NOT_SET:
            raise KeyError(key)
        return default

    def count(self, key):
        # Number of pairs whose key equals `key`.
        start = self._list.bisect_key_left(key)
        end = self._list.bisect_key_right(key)
        return end - start

    def __getitem__(self, pos):
        if isinstance(pos, slice):
            # Lazy iterator over the slice; the slice step is ignored.
            return self._list.islice(pos.start, pos.stop)
        return self._list[pos]

    def __delitem__(self, pos):
        del self._list[pos]

    def __setitem__(self, pos, value):
        # Keep the key at `pos` and swap in the new value. Assigning back to
        # self._list[pos] after the pop would hit the *next* pair (or fail on
        # recent SortedList versions), so re-add the pair instead.
        pair = self._list.pop(pos)
        self._list.add((pair[0], value))

If you're interested in making this even faster, let me know. There are a few things that could be done.
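For anyone without sortedcontainers installed, the lookup pattern the wrapper relies on (bisect on the key, then bounds-check before comparing) can be sketched with the standard library's `bisect` module. `PairList` is a hypothetical name used only for illustration. It keeps a parallel list of keys so it also runs on Python versions before 3.10 (where `bisect` gained a `key=` parameter), and its inserts are O(n), so it is not a stand-in for SortedList's performance:

```python
from bisect import bisect_left, bisect_right


class PairList:
    """Hypothetical stdlib-only sketch: (key, value) pairs kept sorted by key."""

    def __init__(self):
        self._keys = []   # sorted keys, parallel to self._pairs
        self._pairs = []  # (key, value) tuples in key order

    def insert(self, key, value):
        # Insert after any existing equal keys, like SortedList.add.
        pos = bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._pairs.insert(pos, (key, value))

    def _find(self, key):
        # bisect_left can return len(self._keys) when key is larger than
        # every stored key, so check bounds before indexing.
        pos = bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return pos
        return -1

    def search(self, key, default=None):
        pos = self._find(key)
        return self._pairs[pos][1] if pos >= 0 else default

    def __contains__(self, key):
        return self._find(key) >= 0


pl = PairList()
for k, v in [(3, "c"), (1, "a"), (2, "b")]:
    pl.insert(k, v)
print(pl.search(2))   # b
print(pl.search(99))  # None (no IndexError for a key past the end)
print(99 in pl)       # False
```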
