Skip to content

Comparison with Elias-Fano #2

@judofyr

Description

@judofyr

I've been playing around with various succinct algorithms recently and when I saw this project on Twitter my immediate thought was "oh, let's see how it compares with Elias-Fano". In case you (or another reader of this issue) don't know it already: Elias-Fano is an algorithm for compressing ordered integers. It's quite well-known and typically used in search engine indexes (e.g. Lucene). There's not that many resources about it, but I've written about it here: https://t.holmium.no/dia/elias-fano/.

Note that Elias-Fano has one feature which ncrlite doesn't provide: Constant access to a number at a specific index. With EF you could access the n-th number without having to decompress the previous numbers. This is actually what makes EF so useful. But I still think it's interesting to compare ncrlite against EF.

Soo I wrote a command-line tool which uses my implementation of Elias-Fano and reports the size: judofyr/zini@0d27aae.

Here's the result:

Reading attic/seq/le.csv
Compressing 512652 numbers (4898608 bytes)...
The data would compress to: 806552 bytes

Reading attic/seq/primes.csv
Compressing 1000000 numbers (8245905 bytes)...
The data would compress to: 826448 bytes

Reading attic/seq/sigs.csv
Compressing 9 numbers (44 bytes)...
The data would compress to: 96 bytes

ncrlite is beating it for every file! That's very cool to see! I think I need to look a bit more into the algorithm of ncrlite and understand it further.

(Feel free to just close this issue. I guess this message is closer to a "Discussion", but this seemed to be the most suite place to write it.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions