-
Notifications
You must be signed in to change notification settings - Fork 0
Description
I've been playing around with various succinct algorithms recently and when I saw this project on Twitter my immediate thought was "oh, let's see how it compares with Elias-Fano". In case you (or another reader of this issue) don't know it already: Elias-Fano is an algorithm for compressing ordered integers. It's quite well-known and typically used in search engine indexes (e.g. Lucene). There's not that many resources about it, but I've written about it here: https://t.holmium.no/dia/elias-fano/.
Note that Elias-Fano has one feature which ncrlite doesn't provide: Constant access to a number at a specific index. With EF you could access the n-th number without having to decompress the previous numbers. This is actually what makes EF so useful. But I still think it's interesting to compare ncrlite against EF.
Soo I wrote a command-line tool which uses my implementation of Elias-Fano and reports the size: judofyr/zini@0d27aae.
Here's the result:
Reading attic/seq/le.csv
Compressing 512652 numbers (4898608 bytes)...
The data would compress to: 806552 bytes
Reading attic/seq/primes.csv
Compressing 1000000 numbers (8245905 bytes)...
The data would compress to: 826448 bytes
Reading attic/seq/sigs.csv
Compressing 9 numbers (44 bytes)...
The data would compress to: 96 bytes
ncrlite is beating it for every file! That's very cool to see! I think I need to look a bit more into the algorithm of ncrlite and understand it further.
(Feel free to just close this issue. I guess this message is closer to a "Discussion", but this seemed to be the most suite place to write it.)