This package contains several various rolling hashes. The API design philosophy is to stick as closely as possible to the interface provided by the builtin hash package (the hashes implemented here are effectively drop-in replacements for their builtin counterparts), while providing simultaneously the highest speed and simplicity.
A
rollinghash.Hash
is just a hash.Hash which
implements the
rollinghash.Roller
interface. Here is how it is typically used:
data := []byte("here is some data to roll on")
h := buzhash64.New()
n := 16 // window size
// Initialize the rolling window
h.Write(data[:n])
for _, c := range(data[n:]) {
// Slide the window and update the hash
h.Roll(c)
// Get the updated hash value
fmt.Println(h.Sum64())
}A
rollinghash.Hash
maintains a copy of the rolling window in order to keep track of the value
of the byte exiting the window. It can be accessed through the
io.Reader interface of the hash.
var buf bytes.Buffer
// The error is always nil for a bytes.Buffer.
h.WriteWindow(&buf)
window := buf.Bytes()The rolling window MUST be initialized by calling Write first (which
saves a copy). The byte leaving the rolling window is inferred from the
internal copy of the rolling window, which is updated with every call to
Roll.
If you want your code to run at the highest speed, do NOT cast the result
of a New() as a rollinghash.Hash. Instead, use the native type returned
by New(). This is because the go compiler cannot inline calls from an
interface. When later you call Roll(), the native type call will be
inlined by the compiler, but not the casted type call.
var h1 rollinghash.Hash
h1 = buzhash32.New()
h2 := buzhash32.New()
[...]
h1.Roll(b) // Not inlined (slow)
h2.Roll(b) // inlined (fast)On master (unreleased):
-
Simplified the internals of rabinkarp64 (rabinkarp64.Pol.Deg())
-
Extended the test suite to improve coverage
-
Setup continuous performance tracking
-
Setup vulnerability checking
-
Setup dependency checking
In v4.0.0:
-
Writehas become fully consistent withhash.Hash. As opposed to previous versions, where writing data would reinitialize the window, it now appends this data to the existing window. In order to reset the window, one should instead use theResetmethod. -
Calling
Rollon an empty window is considered a bug, and now triggers a panic.
Brief reminder of the behaviors in previous versions:
-
From v0.x.x to v2.x.x:
Rollreturns an error for an empty window.Writereinitializes the rolling window. -
v3.x.x :
Rolldoes not return anything.Writestill reinitializes the rolling window. The rolling window always has a minimum size of 1, which yields wrong results when using roll before having initialized the window.
The RabinKarp64 rollinghash does not yield consistent results before
go1.7. This is because it uses Rand.Read() from the builtin math/rand.
This function was fixed in go
1.7 to produce a consistent
stream of bytes that is independant of the size of the input buffer. If
you depend on this hash, it is strongly recommended to stick to versions
of go superior to 1.7.
This code is delivered to you under the terms of the MIT public license,
except the rabinkarp64 subpackage, which has been adapted from
restic (BSD 2-clause "Simplified").
This library is used by a wide variety of tools, for production and scientific purposes.
- syncthing, a decentralized synchronisation solution
- muscato, a genome analysis tool
- kopia, a backup tool
- pachyderm, a data science platform
If you are using succesfully, let me know and I will happily put a link here!