[API Proposal] Round-trip PackagePart.CompressionOption, add CompressionLevel.Fast #102127

Open
@edwardneal

This issue straddles System.IO.Compression and System.IO.Packaging, and is a behavioural regression between .NET and the .NET Framework.

We can use System.IO.Packaging to create a new Package and add PackageParts to it, specifying a CompressionOption on these PackageParts. When we write the Package to a ZIP file via ZipArchive, the new PackagePart is turned into a ZipArchiveEntry and the PackagePart's CompressionOption is mapped to a CompressionLevel. ZipArchiveEntry then passes that through to create the correct deflation stream and to set bits 1 & 2 in the file header's "general purpose bit flags". This behaviour works well.
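For readers unfamiliar with the header bits mentioned above, here is a minimal sketch of how bits 1 & 2 of the general purpose bit flag can be decoded. It assumes the bit assignments given for Deflate in section 4.4.4 of the PKWARE APPNOTE; `DescribeDeflateOption` is a hypothetical helper, not an existing .NET API.

```csharp
using System;

// Hypothetical helper (not part of .NET): names the Deflate option stored in
// bits 1 and 2 (mask 0x0006) of a ZIP entry's general purpose bit flag.
string DescribeDeflateOption(ushort generalPurposeBitFlag) =>
    (generalPurposeBitFlag & 0x0006) switch
    {
        0x0000 => "Normal",    // bit 2 clear, bit 1 clear
        0x0002 => "Maximum",   // bit 2 clear, bit 1 set
        0x0004 => "Fast",      // bit 2 set,   bit 1 clear
        _ => "SuperFast",      // bit 2 set,   bit 1 set (0x0006)
    };

// Other flag bits (here bit 11, the UTF-8 name flag) do not affect the result.
Console.WriteLine(DescribeDeflateOption(0x0806));
```

The mask keeps the helper independent of unrelated flags such as encryption or the language encoding flag.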

The regression appears when we load a Package from an existing stream and inspect each PackagePart's CompressionOption. In .NET Framework, Package calls GetCompressionOptionFromZipFileInfo, which maps bits 1 & 2 to a CompressionOption. In .NET, Package also calls GetCompressionOptionFromZipFileInfo, but there it always returns CompressionOption.Normal, because ZipArchiveEntry doesn't expose its CompressionLevel publicly.

Once this issue is resolved, I'd like to be able to persist a Package containing a PackagePart with a given CompressionOption, then get the same CompressionOption back when the Package is loaded again.
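The round trip described above can be reproduced with a short sketch like the following. It assumes the System.IO.Packaging package is referenced; the part URI and content type are made up for illustration. Based on the description in this issue, .NET would be expected to report Normal for the value read back, while .NET Framework would report Fast.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Illustrative part name; any valid part URI would do.
Uri partUri = PackUriHelper.CreatePartUri(new Uri("/docs/sample.txt", UriKind.Relative));

using var stream = new MemoryStream();

// Write a package containing one part that requests CompressionOption.Fast.
using (Package package = Package.Open(stream, FileMode.Create, FileAccess.ReadWrite))
{
    PackagePart part = package.CreatePart(partUri, "text/plain", CompressionOption.Fast);
    using Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write);
    partStream.Write(new byte[] { 1, 2, 3 }, 0, 3);
}

// Reopen the same bytes and inspect what the part reports.
stream.Position = 0;
CompressionOption readBack;
using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
{
    readBack = package.GetPart(partUri).CompressionOption;
}

Console.WriteLine($"Written: {CompressionOption.Fast}, read back: {readBack}");
```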

There are three parts to this issue:

  1. An API request: I'd like ZipArchiveEntry to provide a read-only CompressionLevel property, and for the CompressionLevel enumeration to include a new member named Fast;
  2. Confirmation on the mapping from System.IO.Compression's CompressionLevel to System.IO.Packaging's CompressionOption;
  3. Discussion on the proposed new CompressionLevel member, and one CompressionOption/CompressionLevel mapping which is ambiguous without it.

API request

This request is primarily for a read-only CompressionLevel property on ZipArchiveEntry. I've chosen not to make it read-write - PackagePart.CompressionOption is read-only, so it's not needed for this specific use case. It might also imply that a ZIP file would be flagged for decompression and recompression (or simply decompressed) by setting it: I'm not sure how ZipArchiveEntry would handle a situation where the entry's header specifies one compression level, but the entry data is compressed with another; there'd definitely be a problem if its compression level was changed to NoCompression!

I've also rolled in a new enumeration member on CompressionLevel to address an ambiguous CompressionOption/CompressionLevel mapping.

namespace System.IO.Compression;

public class ZipArchiveEntry
{
+   public CompressionLevel CompressionLevel => _compressionLevel;
}

public enum CompressionLevel
{
+   Fast = 4
}

CompressionLevel -> CompressionOption mapping

This is different between .NET and .NET Framework, largely because .NET Framework has an extra member in its DeflateOptionEnum which doesn't map neatly to any member of .NET's current CompressionLevel.

Currently, the mapping from CompressionOption to CompressionLevel when writing a PackagePart is:

| Specified CompressionOption | .NET CompressionLevel | .NET Standard CompressionLevel | .NET Framework DeflateOptionEnum |
| --- | --- | --- | --- |
| NotCompressed | NoCompression | NoCompression | None |
| Normal | Optimal | Optimal | Normal |
| Maximum | SmallestSize | Optimal | Maximum |
| Fast | Fastest | Fastest | Fast |
| SuperFast | Fastest | Fastest | SuperFast |

If the new CompressionLevel enumeration member is approved, I'd change CompressionOption.Fast to map to CompressionLevel.Fast.

.NET Standard has a different mapping for CompressionOption.Maximum to avoid introducing a breaking change for .NET Framework applications referencing System.IO.Packaging.
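As a rough illustration of the current write-side mapping, the sketch below restates the .NET and .NET Standard columns as a switch. `ToCompressionLevel` and its `targetsNetStandard` parameter are hypothetical names used only here; this is not the library's actual code.

```csharp
using System;
using System.IO.Compression;
using System.IO.Packaging;

// Hypothetical helper restating the current CompressionOption -> CompressionLevel mapping.
CompressionLevel ToCompressionLevel(CompressionOption option, bool targetsNetStandard) =>
    option switch
    {
        CompressionOption.NotCompressed => CompressionLevel.NoCompression,
        CompressionOption.Normal => CompressionLevel.Optimal,
        // The only row where the .NET and .NET Standard columns differ.
        CompressionOption.Maximum => targetsNetStandard
            ? CompressionLevel.Optimal
            : CompressionLevel.SmallestSize,
        // Fast and SuperFast currently collapse onto the same level.
        CompressionOption.Fast => CompressionLevel.Fastest,
        CompressionOption.SuperFast => CompressionLevel.Fastest,
        _ => throw new ArgumentOutOfRangeException(nameof(option)),
    };

foreach (CompressionOption option in Enum.GetValues<CompressionOption>())
{
    Console.WriteLine(
        $"{option}: .NET = {ToCompressionLevel(option, false)}, " +
        $".NET Standard = {ToCompressionLevel(option, true)}");
}
```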

The mapping from CompressionLevel to CompressionOption when reading a Package needs to roughly correlate with the table above so that values roundtrip properly.

| Read CompressionLevel | Resultant CompressionOption |
| --- | --- |
| Optimal | Normal |
| Fast (if approved) | Fast |
| Fastest | SuperFast |
| NoCompression | NotCompressed |
| SmallestSize | Maximum |

To continue to avoid a breaking change for .NET Framework applications, .NET Standard would continue to return Normal in all cases.
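To check that the two .NET mappings round-trip, a sketch along these lines could be used. Both helpers are hypothetical restatements of the tables in this section. Because CompressionLevel.Fast does not exist yet, the proposed member is represented by casting its suggested value of 4, and the `fastApproved` flag switches between today's write mapping and the proposed one.

```csharp
using System;
using System.IO.Compression;
using System.IO.Packaging;

// Hypothetical write-side mapping; fastApproved selects the proposed behaviour.
CompressionLevel ToLevel(CompressionOption option, bool fastApproved)
{
    const CompressionLevel ProposedFast = (CompressionLevel)4; // assumed value from the API request
    return option switch
    {
        CompressionOption.NotCompressed => CompressionLevel.NoCompression,
        CompressionOption.Normal => CompressionLevel.Optimal,
        CompressionOption.Maximum => CompressionLevel.SmallestSize,
        CompressionOption.Fast => fastApproved ? ProposedFast : CompressionLevel.Fastest,
        CompressionOption.SuperFast => CompressionLevel.Fastest,
        _ => throw new ArgumentOutOfRangeException(nameof(option)),
    };
}

// Hypothetical read-side mapping taken from the table above.
CompressionOption ToOption(CompressionLevel level)
{
    const CompressionLevel ProposedFast = (CompressionLevel)4; // assumed value from the API request
    return level switch
    {
        CompressionLevel.Optimal => CompressionOption.Normal,
        ProposedFast => CompressionOption.Fast,
        CompressionLevel.Fastest => CompressionOption.SuperFast,
        CompressionLevel.NoCompression => CompressionOption.NotCompressed,
        CompressionLevel.SmallestSize => CompressionOption.Maximum,
        _ => throw new ArgumentOutOfRangeException(nameof(level)),
    };
}

foreach (bool fastApproved in new[] { false, true })
{
    foreach (CompressionOption option in Enum.GetValues<CompressionOption>())
    {
        CompressionOption readBack = ToOption(ToLevel(option, fastApproved));
        string verdict = readBack == option ? "round-trips" : $"comes back as {readBack}";
        Console.WriteLine($"fastApproved={fastApproved}: {option} {verdict}");
    }
}
```

Without the new member, the only option that does not survive is Fast, which comes back as SuperFast; with it, all five options round-trip.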

New CompressionLevel member, otherwise-ambiguous CompressionOption/CompressionLevel mapping

At present, creating a PackagePart with a CompressionOption of Fast or SuperFast will result in a ZipArchiveEntry with a CompressionLevel of Fastest, setting general purpose bits 1 & 2. This ambiguity means that somebody could create a PackagePart with a CompressionOption of Fast, and it would subsequently be read back with a CompressionOption of SuperFast.

This ambiguity arises because the CompressionLevel enumeration doesn't have a member named Fast (or something similarly named, halfway between Optimal and Fastest). This isn't ideal, so my API request asks for this new member.

Adding a new CompressionLevel also requires selecting the correct ZLib compression level, and while I'd guess that 4 would be a good compromise between Optimal (6) and Fastest (1), I don't have anything quantitative to back that up. I'm also not sure what impact the pending switch of ZLib implementation would have on this.
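One way to gather numbers for the level choice is a quick comparison like the sketch below. It assumes .NET 9 or later, where ZLibCompressionOptions allows a numeric ZLib level to be passed to DeflateStream. The synthetic input is an arbitrary stand-in, so results on real package content may differ considerably, and a proper benchmark harness would give more reliable timings.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

// Synthetic, moderately compressible input: each byte carries only 4 bits of entropy.
byte[] input = new byte[4 * 1024 * 1024];
var random = new Random(42);
for (int i = 0; i < input.Length; i++)
{
    input[i] = (byte)random.Next(0, 16);
}

// Compresses the data at the given ZLib level and returns the compressed size in bytes.
long CompressedSize(byte[] data, int zlibLevel)
{
    using var output = new MemoryStream();
    var options = new ZLibCompressionOptions { CompressionLevel = zlibLevel };
    using (var deflate = new DeflateStream(output, options, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    }
    return output.Length;
}

// 1 and 6 are the levels quoted above for Fastest and Optimal; 4 is the guess under discussion.
foreach (int level in new[] { 1, 4, 6 })
{
    var stopwatch = Stopwatch.StartNew();
    long size = CompressedSize(input, level);
    stopwatch.Stop();
    Console.WriteLine($"zlib level {level}: {size} bytes in {stopwatch.ElapsedMilliseconds} ms");
}
```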

cc @dotnet/area-system-io-compression @carlossanlop
