When chunking and deduplicating the Linux kernel source tarballs, we
observed that for that specific data set the optimal ratio between the
minimum and maximum chunk size was somewhere close to 4x. We therefore
recommend using this ratio as a starting point.

### Relationship to RDC FilterMax

Microsoft's [Remote Differential Compression algorithm](https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdc)
uses a content-defined chunking algorithm named FilterMax. Just like
MaxCDC, it attempts to insert cutting points at positions where the hash
value of a rolling hash function is a local maximum. The main difference
is that this is only checked within a small region that the algorithm
calls the horizon. This results in a chunk size distribution that is
geometric, similar to traditional Rabin fingerprinting implementations.

We tested this construct in combination with the Gear hash function,
using the same methodology as described above. Deduplication yielded
398,967 unique chunks with a combined size of 4,031,959,354 bytes. This
is 4.11% worse than FastCDC8KB and 6.38% worse than MaxCDC. The average
chunk size was 10,105 bytes, which is similar to what was used for the
previous tests.
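The local-maximum-within-a-horizon idea can be illustrated with a small
sketch. Note that the function name, the stand-in per-byte hash, and the
symmetric window check below are our own simplifications for
illustration, not the FilterMax construction from the MS-RDC
specification, which pairs the horizon check with a real rolling hash:

```python
def filtermax_cut_points(data: bytes, horizon: int = 64) -> list[int]:
    """Toy local-maximum chunker: position i becomes a cut point when its
    hash value is the unique maximum among all positions within `horizon`
    bytes on either side of i."""
    # Hypothetical stand-in hash: an odd-constant multiply mixes each byte
    # value; a real implementation would use a rolling hash such as Gear.
    hashes = [(b * 2654435761) & 0xFFFFFFFF for b in data]
    cuts = []
    for i in range(horizon, len(data) - horizon):
        window = hashes[i - horizon : i + horizon + 1]
        # Require a strict (unique) maximum so ties never produce
        # adjacent cut points within one horizon.
        if hashes[i] == max(window) and window.count(hashes[i]) == 1:
            cuts.append(i)
    return cuts
```

Intuitively, each position is the maximum of its (2 × horizon + 1)-wide
window with roughly equal probability, so cut points behave like a
Bernoulli process over positions, which is why the resulting chunk sizes
follow a geometric distribution.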