Factors to consider in deciding which algorithm variant to use

Before I begin, I would like to say that this is a welcome and overdue implementation of the TLSH algorithm that respects Java's conventions. Thank you for putting work into writing an implementation that is more efficient and well-documented.

I have but one suggestion regarding the documentation: I think it would be worth describing, in general terms, the benefits and drawbacks of the different window sizes and digest lengths in the context of TLSH. Does a sliding window size larger than 5 offer greater accuracy when comparing hashes for similarity? Should the choice be influenced by the size of the files in a dataset?

These questions sprang to mind as I reviewed the table. I am not fully familiar with the theory behind TLSH, so a paragraph on these trade-offs would offer valuable insight.
