Description
This issue straddles System.IO.Compression and System.IO.Packaging, and is a behavioural regression between .NET and the .NET Framework.
We can use System.IO.Packaging to create a new Package and add PackageParts to it, specifying a CompressionOption on these PackageParts. When we write the Package to a ZIP file via ZipArchive, the new PackagePart is turned into a ZipArchiveEntry and the PackagePart's CompressionOption is mapped to a CompressionLevel. ZipArchiveEntry then passes that through to create the correct deflation stream and to set bits 1 & 2 in the file header's "general purpose bit flags". This behaviour works well.
The regression appears when we load a Package from an existing stream and inspect each PackagePart's CompressionOption. In .NET Framework, Package calls GetCompressionOptionFromZipFileInfo, which maps bits 1 & 2 to a CompressionOption. In .NET, Package also calls GetCompressionOptionFromZipFileInfo, but there it always returns CompressionOption.Normal, because ZipArchiveEntry doesn't expose its CompressionLevel publicly.
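To make the "bits 1 & 2" decoding concrete, here is a minimal sketch of how those flag bits could be turned into a CompressionOption. It follows the bit meanings given in the ZIP specification for deflated entries and is not the .NET Framework source. The class and method names are hypothetical, treating a stored (method 0) entry as NotCompressed is my assumption, and a reference to System.IO.Packaging is assumed.

```csharp
using System.IO.Packaging;

// Hypothetical helper, not part of any .NET library.
public static class ZipHeaderFlags
{
    // Compression method identifiers used in ZIP headers.
    private const ushort Stored = 0;
    private const ushort Deflate = 8;

    public static CompressionOption ToCompressionOption(
        ushort generalPurposeFlags, ushort compressionMethod)
    {
        // Assumption: an entry stored without compression reads back as NotCompressed.
        if (compressionMethod == Stored)
        {
            return CompressionOption.NotCompressed;
        }

        // Bits 1 & 2 only carry this meaning for deflate; fall back to Normal otherwise.
        if (compressionMethod != Deflate)
        {
            return CompressionOption.Normal;
        }

        // Mask out bit 1 (0x0002) and bit 2 (0x0004).
        return (generalPurposeFlags & 0x0006) switch
        {
            0x0000 => CompressionOption.Normal,    // bit 2 = 0, bit 1 = 0
            0x0002 => CompressionOption.Maximum,   // bit 2 = 0, bit 1 = 1
            0x0004 => CompressionOption.Fast,      // bit 2 = 1, bit 1 = 0
            _ => CompressionOption.SuperFast,      // bit 2 = 1, bit 1 = 1
        };
    }
}
```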
When the issue's closed, I'd like to be able to persist a Package containing a PackagePart with a given CompressionOption, then be able to read that CompressionOption back upon reading it.
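A minimal repro sketch of the round trip described above, assuming a reference to System.IO.Packaging. The class name, part URI and content type are placeholders of mine. It writes one part with a chosen CompressionOption into an in-memory package, reopens the package, and returns the CompressionOption that is read back.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical repro helper; names are illustrative only.
public static class RoundTripRepro
{
    public static CompressionOption WriteThenRead(CompressionOption written)
    {
        var partUri = new Uri("/part.xml", UriKind.Relative);
        using var stream = new MemoryStream();

        // Write a package containing a single part with the requested option.
        using (Package package = Package.Open(stream, FileMode.Create, FileAccess.ReadWrite))
        {
            PackagePart part = package.CreatePart(partUri, "application/xml", written);
            using var writer = new StreamWriter(part.GetStream());
            writer.Write("<root />");
        }

        // Reopen the same bytes and inspect what the part reports.
        stream.Position = 0;
        using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
        {
            return package.GetPart(partUri).CompressionOption;
        }
    }
}
```

Per the description above, calling `RoundTripRepro.WriteThenRead(CompressionOption.Maximum)` would be expected to return Maximum on .NET Framework and Normal on .NET as it stands today. I haven't pinned the output to a specific runtime version here.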
There are three parts to this issue:
- An API request: I'd like ZipArchiveEntry to provide a read-only CompressionLevel property, and for the CompressionLevel enumeration to include a new member named Fast;
- Confirmation on the mapping from System.IO.Compression's CompressionLevel to System.IO.Packaging's CompressionOption;
- Discussion on the proposed new CompressionLevel member, and one CompressionOption/CompressionLevel mapping which is ambiguous without it.
API request
This request is primarily for a read-only CompressionLevel property on ZipArchiveEntry. I've chosen not to make it read-write: PackagePart.CompressionOption is read-only, so a setter isn't needed for this specific use case. A setter might also imply that setting it flags the ZIP entry for decompression and recompression (or simply decompression). I'm not sure how ZipArchiveEntry would handle a situation where the entry's header specifies one compression level but the entry data is compressed with another; there'd definitely be a problem if its compression level was changed to NoCompression!
I've also rolled in a new enumeration member on CompressionLevel to address an ambiguous CompressionOption/CompressionLevel mapping.
namespace System.IO.Compression;
public class ZipArchiveEntry
{
+ public CompressionLevel CompressionLevel => _compressionLevel;
}
public enum CompressionLevel
{
+ Fast = 4
}
CompressionLevel -> CompressionOption mapping
This is different between .NET and .NET Framework, largely because .NET Framework has an extra member in its DeflateOptionEnum which doesn't map neatly to any current member of .NET's CompressionLevel.
Currently, the mapping from CompressionOption to CompressionLevel when writing a PackagePart is:
Specified CompressionOption | .NET CompressionLevel | .NET Standard CompressionLevel | .NET Framework DeflateOptionEnum |
---|---|---|---|
NotCompressed | NoCompression | NoCompression | None |
Normal | Optimal | Optimal | Normal |
Maximum | SmallestSize | Optimal | Maximum |
Fast | Fastest | Fastest | Fast |
SuperFast | Fastest | Fastest | SuperFast |
If the new CompressionLevel enumeration member is approved, I'd change CompressionOption.Fast to map to CompressionLevel.Fast.
.NET Standard has a different mapping for CompressionOption.Maximum to avoid introducing a breaking change for .NET Framework applications referencing System.IO.Packaging.
The mapping from CompressionLevel to CompressionOption when reading a Package needs to roughly correlate with the table above so that values roundtrip properly.
Read CompressionLevel | Resultant CompressionOption |
---|---|
Optimal | Normal |
Fast (if approved) | Fast |
Fastest | SuperFast |
NoCompression | NotCompressed |
SmallestSize | Maximum |
To continue to avoid a breaking change for .NET Framework applications, .NET Standard would continue to return Normal in all cases.
New CompressionLevel member, otherwise-ambiguous CompressionOption/CompressionLevel mapping
At present, creating a PackagePart with a CompressionOption of Fast or SuperFast results in a ZipArchiveEntry with a CompressionLevel of Fastest, setting general purpose bits 1 & 2. This ambiguity means that somebody could create a PackagePart with a CompressionOption of Fast, and it would subsequently be read back with a CompressionOption of SuperFast.
This ambiguity arises because the CompressionLevel enumeration doesn't have a member named Fast (or something similarly named, halfway between Optimal and Fastest). This isn't ideal, so my API request asks for this new member.
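The two mapping tables and the resulting ambiguity can be sketched as a pair of switch expressions. This is an illustration of the tables in this issue, not the System.IO.Packaging implementation. The class and method names are hypothetical, it targets the ".NET" column (so it assumes CompressionLevel.SmallestSize is available), and it deliberately omits the proposed Fast member because that doesn't exist yet.

```csharp
using System;
using System.IO.Compression;
using System.IO.Packaging;

// Hypothetical helper mirroring the tables above.
public static class CompressionMapping
{
    // Write side: CompressionOption -> CompressionLevel (current .NET column).
    public static CompressionLevel ToLevel(CompressionOption option) => option switch
    {
        CompressionOption.NotCompressed => CompressionLevel.NoCompression,
        CompressionOption.Normal => CompressionLevel.Optimal,
        CompressionOption.Maximum => CompressionLevel.SmallestSize,
        CompressionOption.Fast => CompressionLevel.Fastest,      // collides with SuperFast
        CompressionOption.SuperFast => CompressionLevel.Fastest,
        _ => throw new ArgumentOutOfRangeException(nameof(option)),
    };

    // Read side: CompressionLevel -> CompressionOption (proposed, without Fast).
    public static CompressionOption ToOption(CompressionLevel level) => level switch
    {
        CompressionLevel.Optimal => CompressionOption.Normal,
        CompressionLevel.Fastest => CompressionOption.SuperFast,
        CompressionLevel.NoCompression => CompressionOption.NotCompressed,
        CompressionLevel.SmallestSize => CompressionOption.Maximum,
        _ => throw new ArgumentOutOfRangeException(nameof(level)),
    };

    // True when an option survives a write followed by a read.
    public static bool RoundTrips(CompressionOption option) =>
        ToOption(ToLevel(option)) == option;
}
```

With these mappings, `RoundTrips` holds for every CompressionOption except Fast, which comes back as SuperFast. A distinct CompressionLevel.Fast would give Fast its own row in both switches.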
Adding a new CompressionLevel member also requires selecting the correct ZLib compression level. I'd guess that 4 would be a good compromise between Optimal (6) and Fastest (1), but I don't have anything quantitative to back that up. I'm also not sure what impact the pending switch of ZLib implementation would have on this.
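One way to gather some numbers would be to compress the same input at ZLib levels 1, 4 and 6 and compare the output sizes (and, with a Stopwatch around the call, the timings). The sketch below assumes .NET 9 or later, where ZLibCompressionOptions allows a numeric level to be passed to DeflateStream. The class name and the synthetic sample input are mine, and real package content would give more meaningful results.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical measurement helper; assumes .NET 9+ for ZLibCompressionOptions.
public static class ZLibLevelProbe
{
    // Deflates the input at the given ZLib level (0-9) and returns the compressed size.
    public static long CompressedSize(byte[] input, int zlibLevel)
    {
        using var output = new MemoryStream();
        var options = new ZLibCompressionOptions { CompressionLevel = zlibLevel };

        // Dispose the DeflateStream first so all buffered data is flushed to output.
        using (var deflate = new DeflateStream(output, options, leaveOpen: true))
        {
            deflate.Write(input, 0, input.Length);
        }

        return output.Length;
    }

    // Builds a repetitive XML-like payload as stand-in test data.
    public static byte[] SampleInput()
    {
        var builder = new StringBuilder();
        for (int i = 0; i < 20_000; i++)
        {
            builder.Append("<item id=\"").Append(i).Append("\">value ")
                   .Append(i % 97).Append("</item>\n");
        }
        return Encoding.UTF8.GetBytes(builder.ToString());
    }
}
```

Calling `ZLibLevelProbe.CompressedSize(input, level)` for each of 1, 4 and 6 on representative data would show whether 4 sits usefully between the other two. The same probe could be re-run after the ZLib implementation switch to see whether the choice still holds.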
cc @dotnet/area-system-io-compression @carlossanlop