Before I begin, I would like to say that this is a welcome and overdue implementation of the TLSH algorithm that respects Java's conventions. Thank you for putting work into writing an implementation that is more efficient and well-documented.
I have one suggestion regarding the documentation: I think it would be worth describing, in general terms, the benefits and drawbacks of the different window sizes and digest lengths in the context of TLSH. Does a sliding window size larger than 5 offer greater accuracy when comparing hashes for similarity? Should the choice be influenced by the size of the files in a dataset?
These questions came to mind as I reviewed the table. I am not fully familiar with the theory behind TLSH, so a paragraph about it would offer valuable insight.