Releases: ashvardanian/StringZilla

v2.0.2

04 Nov 17:19

2.0.2 (2023-11-04)

Docs

Make

  • Adding .npmignore (ebc8843)
  • Deprecate old benchmarks (832a532)
  • Match directory structure of SimSIMD (6d812ef)

v2.0.1

10 Oct 16:32

2.0.1 (2023-10-10)

Docs

  • Linking issues and refreshing C part (9a6297c)

Fix

Make

  • NumPy test dependencies (f4c2efe)

Refactor

v2: 5x swifter CPython bindings and first NodeJS bindings

10 Oct 04:36

Python

So why would anyone replace the easy-to-use PyBind11 with almost 2,000 lines of pure CPython bindings?! Of course, to lower the latency! PyBind11 wraps every C++ object with a smart pointer, puts a hash table next to it, and addresses function pointers with std::string key lookups 🤯

Let's see where that gets us when benchmarking on the "Leipzig1M" dataset, with 🐍 standing for native Python and 🦖 for StringZilla. The bandwidth-oriented functions are just as fast as in the past:

  • Hashing the dataset: 77 ms for 🐍 vs 16 ms for 🦖 ~ 4.8x faster
  • Counting the number of "the": 151 ms for 🐍 vs 45 ms for 🦖 ~ 3.3x faster
  • Split all whitespace-delimited words: 782 ms for 🐍 vs 338 ms for 🦖 ~ 2.3x faster
  • Split around every "the": 240 ms for 🐍 vs 48 ms for 🦖 ~ 5x faster

What about the latency-oriented ones?

  • Find the first whitespace: 1 µs for 🐍 vs 3 µs for 🦖 ~ 3x slower, where previously it was 15 µs and 15x slower
  • Partition around the first whitespace: 73 ms for 🐍 vs 33 µs for 🦖 ~ 2212x faster 🥳
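To reproduce this kind of comparison on your own data, a minimal timing sketch might look like the one below. It is not the benchmark used for the numbers above: the `time_call` and `compare` helpers are hypothetical names, the sample text is a stand-in for "Leipzig1M", and the StringZilla part only runs if `stringzilla` is installed, assuming `Str` exposes `count` and `find` with `str`-like signatures.

```python
import timeit


def time_call(fn, repeat=5, number=1):
    """Return the best observed wall-clock time, in seconds, of one fn() call."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def compare(text, wrap=None):
    """Time identical operations on a native str and, optionally, on wrap(text).

    `wrap` is any callable type that builds a string-like object from a str,
    for example stringzilla.Str (an assumption: it must offer count and find).
    """
    subjects = {"str": text}
    if wrap is not None:
        subjects[wrap.__name__] = wrap(text)

    operations = {
        "count 'the'": lambda s: s.count("the"),      # bandwidth-oriented
        "find first space": lambda s: s.find(" "),    # latency-oriented
    }

    results = {}
    for op_name, op in operations.items():
        for subject_name, subject in subjects.items():
            results[(op_name, subject_name)] = time_call(lambda: op(subject))
    return results


if __name__ == "__main__":
    # Stand-in corpus; replace with the contents of a real dataset file.
    text = "the quick brown fox jumps over the lazy dog " * 100_000
    try:
        from stringzilla import Str
    except ImportError:
        Str = None  # fall back to timing native str only
    for (op_name, subject_name), seconds in compare(text, Str).items():
        print(f"{op_name:>18} | {subject_name:>6} | {seconds * 1e6:12.1f} µs")
```

Taking the minimum over several repeats reduces noise from the scheduler, which matters most for the microsecond-scale, latency-oriented calls.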

JavaScript

In an effort to bring faster string operations to JavaScript, together with @nairihar, we have started the NodeJS binding. It's just a skeleton and has poor performance for now, but you can use it as a starting point to help us implement a faster Str class for JavaScript 🤗

v1.2.2

18 Sep 19:18

1.2.2 (2023-09-18)

Fix

  • Use different functions depending on arch (4f1414e)

v1.2.1

18 Sep 19:11

1.2.1 (2023-09-18)

Fix

  • strzl_sort_config_t symbol (78a2e80)

v1.2.0

18 Sep 19:08

1.2.0 (2023-09-18)

Add

Make

v1.1.3

31 Aug 11:23

1.1.3 (2023-08-31)

Make

  • Explicitly UTF-8 encoding on Windows (50d78ca)

v1.1.2

31 Aug 11:14

1.1.2 (2023-08-31)

Docs

Make

v1.1.1

29 Aug 19:02

1.1.1 (2023-08-29)

Docs

Improve

  • Loading large files into memory (0cf1388)
  • Test reproducibility of the shuffle (50ac20a)

Make

v1.1.0

06 Aug 16:53

1.1.0 (2023-08-06)

New Functionality

Do you want to work with large arrays of separate strings? There is a way! The following code is now valid:

from stringzilla import Str, File, Strs

text: Str = Str('... very large string or file ...')
lines: Strs = text.split(separator='\n')
lines.sort()
lines.shuffle(seed=42)

sorted_copy: Strs = lines.sorted()
shuffled_copy: Strs = lines.shuffled(seed=42)

lines.append(shuffled_copy.pop(0))
lines.append('Pythonic string')
lines.extend(shuffled_copy)

Performance

You can expect even those trivial operations to be 8x faster than native Python 🤯
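For reference, the same workload written against built-in `list` and `str` could look like the sketch below. This is only an assumed baseline, not necessarily the code the 8x figure was measured against; `native_pipeline` is a hypothetical helper, and `random.Random(seed)` stands in for the seeded shuffle.

```python
import random


def native_pipeline(text, seed=42):
    """Mirror the Strs example using only built-in Python containers."""
    lines = text.split("\n")
    lines.sort()
    random.Random(seed).shuffle(lines)          # in-place, reproducible shuffle

    sorted_copy = sorted(lines)                 # counterpart of lines.sorted()
    shuffled_copy = lines.copy()                # counterpart of lines.shuffled(seed=42)
    random.Random(seed).shuffle(shuffled_copy)

    lines.append(shuffled_copy.pop(0))
    lines.append("Pythonic string")
    lines.extend(shuffled_copy)
    return lines, sorted_copy


if __name__ == "__main__":
    lines, sorted_copy = native_pipeline("pear\napple\nfig\nkiwi")
    print(len(lines), sorted_copy)
```

Every element in the native version is a separate `str` object, so sorting and shuffling move pointers to heap-allocated strings around, which is the overhead a dedicated string collection can try to avoid.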

Add

  • Collection-level append, extend (9a2b357)
  • random shuffle for strings collections (36c1a58)

Fix

  • static_cast for Clang builds (bd0a671)
  • Counting substrings with allowoverlap (5234e8a)