Have you seen the Python SortedContainers module? It's a fast, pure-Python implementation of SortedList, SortedSet, and SortedDict data types. It also has an extensive performance comparison with modules related to yours.
I was interested in your performance claims and wanted to benchmark SortedContainers against your skip lists. But a SortedList is not quite the same as a SkipList, so I wrote a little wrapper and ran a benchmark comparison using your perf_skiplist.py. Here are my results:
| Test | PySkipList | SortedContainers | SortedContainers vs. PySkipList |
|---|---|---|---|
| skiplist_index_throughput_1000 | 152326.28 | 602629.89 | 3.95617808037 times faster |
| skiplist_index_throughput_10000 | 134544.94 | 485423.76 | 3.60789309505 times faster |
| skiplist_index_throughput_100000 | 100671.55 | 382475.61 | 3.79924228841 times faster |
| skiplist_insert_throughput_1000 | 77543.06 | 368406.15 | 4.75098803168 times faster |
| skiplist_insert_throughput_10000 | 57264.03 | 321809.49 | 5.61974925621 times faster |
| skiplist_insert_throughput_100000 | 42304.85 | 257105.53 | 6.07744809401 times faster |
| skiplist_remove_throughput_1000 | 88881.20 | 171370.95 | 1.92808996728 times faster |
| skiplist_remove_throughput_10000 | 66682.10 | 140854.81 | 2.11233314488 times faster |
| skiplist_remove_throughput_100000 | 50299.01 | 109368.50 | 2.17436685136 times faster |
| skiplist_search_throughput_1000 | 226229.99 | 346636.69 | 1.53223138099 times faster |
| skiplist_search_throughput_10000 | 159439.83 | 241837.23 | 1.51679307485 times faster |
| skiplist_search_throughput_100000 | 103455.39 | 181585.36 | 1.75520444126 times faster |
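The last column is the SortedContainers figure divided by the PySkipList figure. A minimal sketch of that arithmetic, plus one way a throughput number could be measured, follows. This is not perf_skiplist.py: the helper names `throughput` and `speedup` are made up for illustration, and it assumes the figures are operations per second.

```python
import time


def throughput(operation, count):
    """Run `operation` `count` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(count):
        operation()
    elapsed = time.perf_counter() - start
    return count / elapsed


def speedup(candidate, baseline):
    """Ratio of two throughput figures; values above 1 favor `candidate`."""
    return candidate / baseline


# First row of the table above: index throughput at 1000 items.
print(round(speedup(602629.89, 152326.28), 2))  # 3.96

# Hypothetical workload, only to show the call shape.
data = []
print(throughput(lambda: data.append(0), 10000) > 0)  # True
```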
I'm not sure if you published this because the SkipList data structure is awesome (which it is) or because you need this in a production environment where performance matters. In the latter case, I wanted to provide this wrapper for you and get your thoughts. Here's the wrapper source (Apache2 License):
```python
from operator import itemgetter

from sortedcontainers import SortedListWithKey

# Module-level sentinel. A double-underscore name defined in the class body
# is name-mangled and cannot be referenced from inside the methods.
_NOT_SET = object()


class PySkipList(object):
    """PySkipList-style API wrapping a SortedListWithKey of (key, value) pairs."""

    def __init__(self):
        self._list = SortedListWithKey(key=itemgetter(0))

    def _find(self, key):
        # Position of the first pair with this key, or -1 if absent. The bounds
        # check matters: bisect_key_left returns len(self._list) when key is
        # larger than every stored key.
        pos = self._list.bisect_key_left(key)
        if pos < len(self._list) and self._list[pos][0] == key:
            return pos
        return -1

    def insert(self, key, value):
        self._list.add((key, value))

    def replace(self, key, value):
        pos = self._find(key)
        if pos >= 0:
            # Delete then add, rather than assigning by index, which newer
            # SortedContainers releases do not support.
            del self._list[pos]
        self._list.add((key, value))

    def clear(self):
        self._list.clear()

    def __len__(self):
        return len(self._list)

    def __iter__(self, start=None, stop=None):
        return self._list.irange_key(
            min_key=start, max_key=stop, inclusive=(False, False)
        )

    items = __iter__

    def keys(self, start=None, stop=None):
        return (pair[0] for pair in self.items(start, stop))

    def values(self, start=None, stop=None):
        return (pair[1] for pair in self.items(start, stop))

    def popitem(self):
        if len(self):
            return self._list.pop(0)
        raise KeyError

    def search(self, key, default=None):
        pos = self._find(key)
        return self._list[pos] if pos >= 0 else default

    def remove(self, key):
        pos = self._find(key)
        if pos < 0:
            raise KeyError(key)
        return self._list.pop(pos)

    def pop(self, key, default=_NOT_SET):
        pos = self._find(key)
        if pos >= 0:
            return self._list.pop(pos)[1]
        if default is _NOT_SET:
            raise KeyError(key)
        return default

    def __contains__(self, key):
        return self._find(key) >= 0

    def index(self, key, default=_NOT_SET):
        pos = self._find(key)
        if pos >= 0:
            return pos
        if default is _NOT_SET:
            raise KeyError(key)
        return default

    def count(self, key):
        start = self._list.bisect_key_left(key)
        end = self._list.bisect_key_right(key)
        return end - start

    def __getitem__(self, pos):
        if isinstance(pos, slice):
            return self._list.islice(pos.start, pos.stop)
        return self._list[pos]

    def __delitem__(self, pos):
        del self._list[pos]

    def __setitem__(self, pos, value):
        # Keep the key at this position and swap in the new value.
        key = self._list[pos][0]
        del self._list[pos]
        self._list.add((key, value))
```

If you're interested in making this even faster, let me know. There are a few things that could be done.
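As a rough illustration of the lookup pattern the wrapper leans on (bisect to the leftmost position, then check the key found there), here is a sketch using only the standard library's `bisect` module. The names `find_first` and `pairs` are hypothetical and exist only for this example, and the plain list stands in for the sorted container.

```python
from bisect import bisect_left


def find_first(pairs, key):
    """Return the index of the first pair whose key equals `key`, or -1."""
    # `pairs` must already be sorted by pair[0].
    keys = [pair[0] for pair in pairs]  # O(n) copy, acceptable for a demo only
    pos = bisect_left(keys, key)
    # bisect_left returns len(keys) when `key` exceeds every stored key,
    # so the bounds check has to come before the index access.
    if pos < len(pairs) and pairs[pos][0] == key:
        return pos
    return -1


pairs = [(1, "a"), (3, "b"), (3, "c"), (7, "d")]
print(find_first(pairs, 3))  # 1
print(find_first(pairs, 4))  # -1
print(find_first(pairs, 9))  # -1
```

The last call is the case to watch: without the bounds check, indexing at the returned position would raise an IndexError.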