Description
I’ve encountered a significant limitation with the internal SubReadStream class in System.IO.Compression. When calling ZipArchiveEntry.Open() with a ZipArchive in ZipArchiveMode.Read, the returned SubReadStream instance is non-seekable. This behavior presents problems in specific scenarios, particularly when dealing with nested archives.
Problem Details
In my use case, I have an HttpStream class that inherits from Stream and uses HttpClient with byte range requests to fetch remote data. My task is to extract an AppxManifest.xml file from a .msix file stored within a .msixbundle file on an HTTP server. Both file types are essentially zip archives.
Currently, because SubReadStream is non-seekable, creating a new ZipArchive from the stream returned by ZipArchiveEntry.Open() causes the entire entry to be copied into a new MemoryStream. Over HTTP this is highly inefficient and defeats the whole purpose of using streams, especially when the target file is small (1 KB) and the containing archive is large.
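To observe the behavior locally without an HTTP server, something like the following probe can be used. It is only a sketch: the class name NestedZipProbe, the entry names, and the in-memory archives are made up for illustration, and the concrete stream type and CanSeek value it prints may differ between runtime versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of System.IO.Compression.
static class NestedZipProbe
{
    // Builds an in-memory zip archive with a single entry written with NoCompression.
    public static MemoryStream BuildArchive(string entryName, byte[] payload)
    {
        var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.NoCompression);
            using (Stream target = entry.Open())
            {
                target.Write(payload, 0, payload.Length);
            }
        }
        buffer.Position = 0;
        return buffer;
    }

    // Opens the nested entry and reports what kind of stream Open() returned.
    public static string Describe()
    {
        byte[] manifest = Encoding.UTF8.GetBytes("<Package />");
        MemoryStream inner = BuildArchive("AppxManifest.xml", manifest);
        MemoryStream outer = BuildArchive("inner.msix", inner.ToArray());

        using (var bundle = new ZipArchive(outer, ZipArchiveMode.Read))
        using (Stream entryStream = bundle.GetEntry("inner.msix").Open())
        {
            return $"{entryStream.GetType().Name}: CanSeek = {entryStream.CanSeek}";
        }
    }
}
```

Calling Console.WriteLine(NestedZipProbe.Describe()) prints the runtime type of the entry stream and whether it reports itself as seekable. When CanSeek is false, passing that stream to a new ZipArchive in Read mode triggers the full copy described above.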
Real-World Impact
Inefficiency
The entire stream must be read into memory, which can be resource-intensive and slow.
Performance
There’s a significant overhead when dealing with large files or archives, especially over network streams, because they must first be downloaded in full.
Workaround
I was able to bypass this limitation by using reflection to read private fields of SubReadStream and determine its bounds. With those bounds, I wrap the underlying super stream in my own seekable BoundStream and create the ZipArchive from that BoundStream instead. Here’s a snippet of the workaround:
private (long, long) ReadSubStreamBounds(Stream subStream)
{
    long startInSuperStream;
    long endInSuperStream;
    Type subReadStreamType = subStream.GetType();
    FieldInfo startField = subReadStreamType.GetField("_startInSuperStream", BindingFlags.NonPublic | BindingFlags.Instance);
    FieldInfo endField = subReadStreamType.GetField("_endInSuperStream", BindingFlags.NonPublic | BindingFlags.Instance);
    if (startField != null && endField != null)
    {
        startInSuperStream = (long)startField.GetValue(subStream);
        endInSuperStream = (long)endField.GetValue(subStream);
        return (startInSuperStream, endInSuperStream);
    }
    throw new Exception("The internal structure of the SubReadStream class has changed, and the expected fields are not present. " +
        "This might be due to a change in the Microsoft internal implementation. " +
        "Please review and update this code to align with the new internal details. " +
        "Using reflection to access private fields is inherently fragile and should be used with caution.");
}
private async Task<Stream> GetAppxManifestStreamFromMsixOrBundleStreamAsync(Stream stream, string filePath, CancellationToken cancellationToken)
{
    Exception innerException;
    string fileNameWithoutExtension = Path.GetFileNameWithoutExtension(filePath);
    string fileNameExtension = Path.GetExtension(filePath);
    if (fileNameExtension.Equals(".msix", StringComparison.OrdinalIgnoreCase))
    {
        return await GetAppxManifestStreamFromMsixStreamAsync(stream, cancellationToken).ConfigureAwait(false);
    }
    if (fileNameExtension.Equals(".msixbundle", StringComparison.OrdinalIgnoreCase) == false)
    {
        throw new InvalidOperationException($"Unsupported file type: {fileNameExtension}. Expected 'msix' or 'msixbundle'.");
    }
    // Construct the expected entry names by changing the extension of the bundle's file name
    string expectedMsixEntryName = $"{fileNameWithoutExtension}.msix";
    string expectedAppxEntryName = $"{fileNameWithoutExtension}.appx";
    try
    {
        using (ZipArchive bundleArchive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            // Try to find the MSIX or APPX entry with the expected name
            ZipArchiveEntry msixPackageEntry = bundleArchive.GetEntry(expectedMsixEntryName) ?? bundleArchive.GetEntry(expectedAppxEntryName);
            if (msixPackageEntry != null)
            {
                using (Stream msixPackageStream = msixPackageEntry.Open())
                {
                    // Note: msixPackageStream is an instance of the internal sealed class SubReadStream,
                    // which is conceptually similar to our BoundStream but is not seekable.
                    // When ZipArchive is given a non-seekable stream like this in Read mode, it copies
                    // the entire content into a MemoryStream, which forces the whole inner package to be read.
                    // Unfortunately, we don't have direct access to the bounds of the SubReadStream,
                    // which is why we employ a "Giga Chad" hack using reflection to extract these bounds.
                    // We then use our own BoundStream, which is seekable, allowing efficient access
                    // to just the needed part of the stream without reading the entire package.
                    (long startInSuperStream, long endInSuperStream) = ReadSubStreamBounds(msixPackageStream);
                    using (Stream boundStream = new BoundStream(stream, startInSuperStream, endInSuperStream, true))
                    {
                        return await GetAppxManifestStreamFromMsixStreamAsync(boundStream, cancellationToken).ConfigureAwait(false);
                    }
                }
            }
        }
        throw new InvalidOperationException($"{expectedMsixEntryName} or {expectedAppxEntryName} not found in the bundle.");
    }
    catch (Exception ex)
    {
        innerException = ex;
    }
    throw new InvalidOperationException($"Failed to read {appxManifestFileName} from the bundle stream", innerException);
}
private async Task<Stream> GetAppxManifestStreamFromMsixStreamAsync(Stream msixStream, CancellationToken cancellationToken)
{
    Exception innerException;
    try
    {
        // Open the MSIX stream as a ZIP archive.
        using (ZipArchive archive = new ZipArchive(msixStream, ZipArchiveMode.Read))
        {
            // Find the AppxManifest.xml entry.
            ZipArchiveEntry manifestEntry = archive.GetEntry(appxManifestFileName);
            if (manifestEntry != null)
            {
                using (Stream appxManifestStream = manifestEntry.Open())
                {
                    // Copy the small manifest into memory so it outlives the archive.
                    MemoryStream appxManifestResultStream = new MemoryStream();
#if NET45
                    await appxManifestStream.CopyToAsync(appxManifestResultStream).ConfigureAwait(false);
#else
                    await appxManifestStream.CopyToAsync(appxManifestResultStream, cancellationToken).ConfigureAwait(false);
#endif
                    // Rewind so the caller can read the manifest from the start.
                    appxManifestResultStream.Position = 0;
                    return appxManifestResultStream;
                }
            }
            else
            {
                throw new InvalidOperationException($"{appxManifestFileName} not found in the archive");
            }
        }
    }
    catch (Exception ex)
    {
        innerException = ex;
    }
    throw new InvalidOperationException($"Failed to read {appxManifestFileName} from MSIX stream", innerException);
}
While this workaround functions, it involves reflection to access private fields, which is inherently fragile and not a long-term solution.
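The BoundStream class used in the snippet is not shown above. For readers who want to reproduce the workaround, here is a minimal sketch of what such a class could look like. It is an assumption, not the actual implementation: it treats the end bound as exclusive, assumes the fourth constructor argument means "leave the base stream open", and requires the base stream to be readable and seekable.

```csharp
using System;
using System.IO;

// Hypothetical sketch: a read-only, seekable window over [start, end) of a seekable base stream.
public sealed class BoundStream : Stream
{
    private readonly Stream _baseStream;
    private readonly long _start;
    private readonly long _end;       // exclusive (assumption)
    private readonly bool _leaveOpen; // assumed meaning of the fourth argument
    private long _position;           // relative to _start

    public BoundStream(Stream baseStream, long start, long end, bool leaveOpen)
    {
        if (baseStream == null) throw new ArgumentNullException(nameof(baseStream));
        if (!baseStream.CanRead || !baseStream.CanSeek)
            throw new ArgumentException("Base stream must be readable and seekable.", nameof(baseStream));
        if (start < 0 || end < start) throw new ArgumentOutOfRangeException(nameof(start));
        _baseStream = baseStream;
        _start = start;
        _end = end;
        _leaveOpen = leaveOpen;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _end - _start;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = Length - _position;
        if (remaining <= 0) return 0;
        if (count > remaining) count = (int)remaining;
        // Reposition on every read because other readers may have moved the base stream.
        _baseStream.Position = _start + _position;
        int read = _baseStream.Read(buffer, offset, count);
        _position += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target;
        switch (origin)
        {
            case SeekOrigin.Begin: target = offset; break;
            case SeekOrigin.Current: target = _position + offset; break;
            case SeekOrigin.End: target = Length + offset; break;
            default: throw new ArgumentOutOfRangeException(nameof(origin));
        }
        Position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing && !_leaveOpen) _baseStream.Dispose();
        base.Dispose(disposing);
    }
}
```

Because the window reports CanSeek as true, ZipArchive can read the nested archive's central directory by seeking within the window, so only the byte ranges it actually touches are requested from the base stream. Note that the sketch repositions the base stream on each read, so it is not safe to use concurrently with other readers of the same base stream.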
Proposed Solution
I believe SubReadStream should be made seekable. Given its usage, I can’t identify a strong reason for it to remain non-seekable. Implementing seekability would align with the typical expectations of a stream and significantly improve performance and resource usage in scenarios with nested archives.