
Speedup scan speed for '--patch-from' via rolling hashes #2189

Open
@RubenKelevra

Description


IPFS can deduplicate blocks across different files, even files of different types. This functionality is based on a rolling hash algorithm.

In IPFS you can select either Rabin or buzhash for this task. Rabin is fairly slow, but buzhash is quite fast.
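Part of why buzhash is cheap is that sliding the window forward by one byte costs only a rotation and two XORs against a byte-indexed table. The following is a minimal sketch in Python, assuming a 32-bit hash and a seeded random table. The class name `BuzHash`, the window size and the table are illustrative choices, not the parameters IPFS actually uses.

```python
import random


class BuzHash:
    """Minimal 32-bit buzhash over a fixed-size byte window (illustrative sketch)."""

    MASK = 0xFFFFFFFF

    def __init__(self, window: int, seed: int = 0):
        self.window = window
        rng = random.Random(seed)
        # Hypothetical table: one random 32-bit value per byte value.
        self.table = [rng.getrandbits(32) for _ in range(256)]

    def rotl(self, x: int, n: int) -> int:
        """Rotate a 32-bit value left by n bits."""
        n %= 32
        return ((x << n) | (x >> (32 - n))) & self.MASK

    def digest(self, data: bytes) -> int:
        """Hash a whole window from scratch (used to seed and to cross-check)."""
        h = 0
        for b in data:
            h = self.rotl(h, 1) ^ self.table[b]
        return h

    def roll(self, h: int, out_byte: int, in_byte: int) -> int:
        """Slide the window one byte: drop out_byte, append in_byte."""
        # After rotl(h, 1) the outgoing byte has been rotated `window` times,
        # so XOR-ing the same rotated table entry cancels it out.
        return self.rotl(h, 1) ^ self.rotl(self.table[out_byte], self.window) ^ self.table[in_byte]
```

Because `roll` only touches the outgoing and incoming bytes, scanning a file costs a constant amount of work per byte, independent of the window size.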

A rolling hash would let us 'prescan' both files, obtain cut marks, and then run a fast cryptographic hash such as BLAKE2b over the resulting chunks.
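The prescan step can be sketched as follows. The sketch assumes a buzhash over a 48-byte window and places a cut wherever the low 12 bits of the hash are zero, which gives roughly 4 KiB chunks on random data. `cut_points` and `chunk_digests` are hypothetical helpers, and the table, window and mask are illustrative. Real chunkers, such as the ones in IPFS, also enforce minimum and maximum chunk sizes; this sketch omits them.

```python
import hashlib
import random

WINDOW = 48              # rolling-hash window in bytes (assumed)
MASK = (1 << 12) - 1     # cut when the low 12 bits are zero: ~4 KiB average (assumed)

_rng = random.Random(42)
TABLE = [_rng.getrandbits(32) for _ in range(256)]  # hypothetical buzhash table


def _rotl(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def cut_points(data: bytes) -> list[int]:
    """Return chunk end offsets chosen by a buzhash over a sliding window."""
    if not data:
        return []
    cuts = []
    h = 0
    for i, b in enumerate(data):
        h = _rotl(h, 1) ^ TABLE[b]
        if i >= WINDOW:
            # Remove the byte that just slid out of the window.
            h ^= _rotl(TABLE[data[i - WINDOW]], WINDOW)
        if i >= WINDOW - 1 and (h & MASK) == 0:
            cuts.append(i + 1)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))  # the tail always forms a final chunk
    return cuts


def chunk_digests(data: bytes) -> list[tuple[int, int, bytes]]:
    """Split data at the cut marks and hash each chunk with BLAKE2b."""
    out = []
    start = 0
    for end in cut_points(data):
        digest = hashlib.blake2b(data[start:end], digest_size=32).digest()
        out.append((start, end - start, digest))
        start = end
    return out
```

Cut marks depend only on the bytes inside the window. Inserting data near the start of a file therefore shifts the later chunks without changing their digests, which is what lets the same chunks be found in both A and B.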

I think both operations are much cheaper than pattern matching. This way, every pattern-matching attempt that falls inside a block known to be identical on both sides (A and B) can be skipped.
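Once both files are chunked, pairing identical chunks reduces to a dictionary lookup on their digests. The following minimal sketch assumes the (offset, length) chunk boundaries come from a rolling-hash prescan and treats equal BLAKE2b digests as equal content. `match_chunks` is a hypothetical name.

```python
import hashlib


def _digest(buf: bytes) -> bytes:
    return hashlib.blake2b(buf, digest_size=32).digest()


def match_chunks(src: bytes, src_chunks, dst: bytes, dst_chunks):
    """Pair identical chunks of src (A) and dst (B) by BLAKE2b digest.

    src_chunks / dst_chunks are (offset, length) pairs. Returns (copies, gaps):
    copies are (length, src_offset, dst_offset) triples for chunks of B found in A;
    gaps are merged (dst_offset, length) ranges of B that still need pattern matching.
    """
    index = {}
    for off, length in src_chunks:
        # Keep the first occurrence if A contains the same chunk more than once.
        index.setdefault(_digest(src[off:off + length]), off)

    copies, gaps = [], []
    for off, length in dst_chunks:
        src_off = index.get(_digest(dst[off:off + length]))
        if src_off is not None:
            copies.append((length, src_off, off))
        elif gaps and gaps[-1][0] + gaps[-1][1] == off:
            # Extend the previous gap instead of starting a new one.
            gaps[-1] = (gaps[-1][0], gaps[-1][1] + length)
        else:
            gaps.append((off, length))
    return copies, gaps
```

Only the `gaps` ranges of B would still need pattern matching. Every range listed in `copies` can be skipped.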

The first patching layer would just generate length + offset + move triples, which copy blocks from the original file into a sparse output file as the first patching operation.
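That first pass might look roughly like the following sketch. It assumes each triple is (length, offset in the original, offset in the output); reading "move" as the output offset is our interpretation. The output is sized with `truncate()`, which produces a sparse file on filesystems that support holes. `apply_copies` is a hypothetical name.

```python
def apply_copies(src_path: str, out_path: str, target_size: int, copies) -> None:
    """First patching pass: copy known-equal blocks from the original file.

    copies are (length, src_offset, dst_offset) triples (assumed layout).
    Regions that no triple touches read back as zero bytes until a later
    pass fills them in.
    """
    with open(src_path, "rb") as src, open(out_path, "wb") as out:
        # Extending via truncate() leaves holes instead of writing zeros
        # on filesystems with sparse-file support.
        out.truncate(target_size)
        for length, src_off, dst_off in copies:
            src.seek(src_off)
            block = src.read(length)
            if len(block) != length:
                raise ValueError(f"source too short for copy at offset {src_off}")
            out.seek(dst_off)
            out.write(block)
```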

The existing pattern-matching rules could then be applied on top of that, filling in the gaps in the output file.
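The remaining gaps can be derived from the copy triples, so that only those byte ranges are handed to the regular matcher. In this sketch, zlib with a preset dictionary (the tail of the original file) stands in for the real `--patch-from` match finder. `uncovered`, `encode_gaps` and `fill_gaps` are hypothetical helpers.

```python
import zlib

ZDICT_SIZE = 32 * 1024  # zlib can only look back 32 KiB into a preset dictionary


def uncovered(target_size: int, copies):
    """Return (offset, length) ranges of the output that no copy triple covers."""
    gaps, pos = [], 0
    for length, _src_off, dst_off in sorted(copies, key=lambda c: c[2]):
        if dst_off > pos:
            gaps.append((pos, dst_off - pos))
        pos = max(pos, dst_off + length)
    if pos < target_size:
        gaps.append((pos, target_size - pos))
    return gaps


def encode_gaps(original: bytes, new: bytes, gaps):
    """Encoder side: compress each gap of `new`, allowing matches into `original`.

    zlib's preset dictionary is only a stand-in for the real patch matcher.
    """
    out = []
    for off, length in gaps:
        c = zlib.compressobj(level=9, zdict=original[-ZDICT_SIZE:])
        out.append((off, c.compress(new[off:off + length]) + c.flush()))
    return out


def fill_gaps(buf: bytearray, original: bytes, encoded) -> None:
    """Decoder side: decompress each gap payload into its place in the output."""
    for off, payload in encoded:
        d = zlib.decompressobj(zdict=original[-ZDICT_SIZE:])
        data = d.decompress(payload) + d.flush()
        buf[off:off + len(data)] = data
```

Because the gaps are usually small compared to the copied blocks, most of the expensive matching work is avoided.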

Originally posted by @RubenKelevra in #2063 (comment)
