parallel operations workstream

**Is your feature request related to a problem? Please describe.**
Zip files always store an index (the central directory) separately from each entry's possibly-compressed data. This makes it possible to split and merge archives at a high level without de/recompressing file contents, which improves performance on benchmarks compared to serially iterating over each entry to extract, or serially iterating over each file to compress.
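
To make the "separate index" point concrete, here is a minimal sketch of locating that index by scanning backwards for the end-of-central-directory record. The `Eocd` struct and `find_eocd` function are hypothetical names for illustration, not part of the `zip` crate, and the sketch assumes a single-disk, non-zip64 archive held entirely in memory.

```rust
/// Summary of a zip's end-of-central-directory (EOCD) record.
/// Hypothetical type for illustration; not the `zip` crate's API.
#[derive(Debug, PartialEq)]
struct Eocd {
    /// Total number of entries listed in the central directory.
    total_entries: u16,
    /// Size of the central directory in bytes.
    cd_size: u32,
    /// Offset of the central directory from the start of the archive.
    cd_offset: u32,
}

/// "PK\x05\x06", the EOCD signature as it appears on disk.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Length of an EOCD record with an empty comment.
const EOCD_MIN_LEN: usize = 22;

fn read_u16(bytes: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([bytes[at], bytes[at + 1]])
}

fn read_u32(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Scan backwards from the end of the archive for the EOCD record.
/// Assumption: no zip64 extensions and no multi-disk archives.
fn find_eocd(archive: &[u8]) -> Option<Eocd> {
    if archive.len() < EOCD_MIN_LEN {
        return None;
    }
    let last_start = archive.len() - EOCD_MIN_LEN;
    (0..=last_start).rev().find_map(|start| {
        if archive[start..start + 4] != EOCD_SIGNATURE[..] {
            return None;
        }
        // The trailing comment must account for exactly the remaining bytes,
        // which filters out signature bytes that merely occur inside data.
        let comment_len = read_u16(archive, start + 20) as usize;
        if start + EOCD_MIN_LEN + comment_len != archive.len() {
            return None;
        }
        Some(Eocd {
            total_entries: read_u16(archive, start + 10),
            cd_size: read_u32(archive, start + 12),
            cd_offset: read_u32(archive, start + 16),
        })
    })
}
```

Nothing here touches entry data: the record alone says where the index lives and how many entries it lists, which is what lets split/merge operations work on the index without reading file contents.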

**Describe the solution you'd like**
Zip files can be extracted in parallel (see #72), and they can also be merged to create archives in parallel (see discussion in #73).
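
As a rough illustration of why extraction parallelizes: once the index gives each entry's location, entries can be handed to worker threads independently. The sketch below is not the approach from #72. `Entry` and `extract_parallel` are hypothetical names, the archive is an in-memory buffer, and entries are assumed to be stored uncompressed so that "extracting" is just a copy.

```rust
use std::thread;

/// Hypothetical index entry: where one stored (uncompressed) payload lives.
struct Entry {
    name: String,
    offset: usize,
    len: usize,
}

/// Extract every entry using up to `workers` threads, preserving entry order.
/// Assumption: payloads are stored uncompressed; a real extractor would
/// decompress each payload and write it to disk instead of returning it.
fn extract_parallel(
    archive: &[u8],
    entries: &[Entry],
    workers: usize,
) -> Vec<(String, Vec<u8>)> {
    if entries.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division, so every entry lands in exactly one chunk.
    let chunk_len = (entries.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn all workers before joining any of them.
        let handles: Vec<_> = entries
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|e| {
                            let payload = archive[e.offset..e.offset + e.len].to_vec();
                            (e.name.clone(), payload)
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the output in index order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Workers only ever read disjoint ranges of a shared immutable buffer, so no locking is needed. With a file-backed reader, each worker would instead need its own handle or positioned reads.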

**Describe alternatives you've considered**
While parallel zip extraction as in #72 has likely been implemented elsewhere, to my knowledge the parallel split/merge technique in #73 (researched for pex-tool/pex#2175 and prototyped in https://github.com/cosmicexplorer/medusa-zip) has not been discussed or implemented before in other zip tooling **(please let me know of any prior art for this!)**.

**Additional context**
TODO:
- [ ] refactor reader wrappers to use generic type params in #207 (this gets us `Send` bounds)
- [ ] parallel/pipelined extraction in #208 
- [ ] bulk copy (no de/recompression) with entry renaming as in pex-tool/pex#2175
    - as in that pex change, bulk copy with renaming enables reconstituting a "parent" zip file from an ordered sequence of "child" zips, which may be used to very quickly reconstruct large zip files from immutable cached components.
    - when renaming is *not* required, `ZipWriter::merge_contents()` already works with a single `io::copy()` call. bulk copy with rename still avoids de/recompression of file data, but it must edit each renamed local file header and therefore requires O(n) `io::copy()` calls for n entries.
- [ ] parallel split/merge for extremely fast creation as in https://github.com/cosmicexplorer/medusa-zip
    - this `zip` crate should probably *not* get into the weeds of crawling the filesystem, which keeps `medusa-zip` useful as a separate crate, and ensures we don't add too much extraneous code to this one.
    - however, the process of merging an ordered sequence of "child" zips with `ZipWriter::merge_contents()` *can* be parallelized, and this is something the `zip` crate should be able to do.
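
A minimal sketch of why merging an ordered sequence of child zips can be parallelized: each child's destination offset in the parent is just the sum of the preceding children's data lengths, so every copy target is known up front and the copies can run concurrently. This is not how `ZipWriter::merge_contents()` is implemented. `Child` and `merge_parallel` are hypothetical names, everything is held in memory, and rewriting the central directory itself is left out.

```rust
use std::thread;

/// Hypothetical "child" archive: its raw entry data (local headers plus
/// payloads) and the offset of each entry within that data.
struct Child {
    data: Vec<u8>,
    entry_offsets: Vec<usize>,
}

/// Concatenate the children's entry data in order, copying each child on its
/// own thread, and return the merged bytes together with every entry's offset
/// in the merged output (what the parent's index would need to record).
fn merge_parallel(children: &[Child]) -> (Vec<u8>, Vec<usize>) {
    let total: usize = children.iter().map(|c| c.data.len()).sum();
    let mut merged = vec![0u8; total];
    let mut merged_offsets = Vec::new();

    thread::scope(|scope| {
        let mut rest = merged.as_mut_slice();
        let mut base = 0usize;
        for child in children {
            // Carve this child's disjoint destination region off the front.
            let (dest, tail) = std::mem::take(&mut rest).split_at_mut(child.data.len());
            rest = tail;

            // Shift the child's entry offsets by the bytes written before it.
            merged_offsets.extend(child.entry_offsets.iter().map(|off| base + off));
            base += child.data.len();

            // Destinations never overlap, so the copies need no locking.
            scope.spawn(move || dest.copy_from_slice(&child.data));
        }
    });

    (merged, merged_offsets)
}
```

No file data is de/recompressed at any point: the only per-entry work is the offset arithmetic, which is cheap and done serially, while the bulk byte copies run in parallel.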

parallel operations workstream #193
