SevenZip / 7z IReader Implementation? #940

@mitchcapper

I wanted to iterate over the contents and metadata of some archives, and SharpCompress has largely worked great for that.

I was initially confused when ReaderFactory.Open(stream) didn't work on 7z.

The official docs do cover this, but I missed it.

I then saw that the docs also recommend using ExtractAllEntries for 7z and rar for performance. Although it seems to be aimed primarily at writing the files out, it looked like a good way to go for performance, and it gives an IReader back. It generally worked well, but I noticed that on some archives the memory usage was huge, and I eventually traced this to the dictionary size (a 350MB dictionary may result in about 800MB of memory usage). I also noticed that just accessing .Entries on the SevenZipArchive didn't have this penalty. I then realized my mistake: ExtractAllEntries calls LoadEntries rather than GetEntries, which causes the full dictionary load.
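For reference, the metadata-only path described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the file name `example.7z` is a placeholder, and it assumes a SharpCompress version where `SevenZipArchive.Open(Stream)` and the `Key`, `Size`, and `IsDirectory` entry properties are available.

```csharp
using System;
using System.IO;
using SharpCompress.Archives.SevenZip;

// "example.7z" is a hypothetical path used only for illustration.
using var stream = File.OpenRead("example.7z");
using var archive = SevenZipArchive.Open(stream);

// Enumerate Entries directly instead of calling ExtractAllEntries,
// so only the entry metadata is touched here.
foreach (var entry in archive.Entries) {
	if (entry.IsDirectory) {
		continue; // skip directory entries, list files only
	}
	Console.WriteLine($"{entry.Key}\t{entry.Size}");
}
```

Nothing is decompressed in this loop; entry data would only be read if OpenEntryStream or WriteTo were called on an entry.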

As I wanted to minimize the alternate code path for 7z, I eventually settled on a custom IReader implementation. It works well for me, but there may be some problem beyond performance that explains why 7z was specifically excluded from IReader.

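```csharp
using System;
using System.Collections.Generic;
using System.IO;
using SharpCompress.Archives;
using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;
using SharpCompress.Readers;

// Minimal IReader adapter that walks SevenZipArchive.Entries
// instead of going through ExtractAllEntries.
internal class Our7ZReader(SevenZipArchive archive) : IReader {
	private bool inited = false;
	private IEnumerator<SevenZipArchiveEntry> enumerator;

	public ArchiveType ArchiveType => archive.Type;
	public IEntry Entry => enumerator?.Current;
	public bool Cancelled { get; private set; }

	// Declared only to satisfy IReader; this adapter never raises them.
	public event EventHandler<ReaderExtractionEventArgs<IEntry>> EntryExtractionProgress;
	public event EventHandler<CompressedBytesReadEventArgs> CompressedBytesRead;
	public event EventHandler<FilePartExtractionBeginEventArgs> FilePartExtractionBegin;

	public bool MoveToNextEntry() {
		if (!inited) {
			inited = true;
			enumerator = archive.Entries?.GetEnumerator();
		}
		// Null-safe: returns false if Entries was null or the reader was disposed.
		return enumerator?.MoveNext() ?? false;
	}

	// Does not compile outside SharpCompress: EntryStream's constructor is internal.
	public EntryStream OpenEntryStream() => new EntryStream(this, enumerator?.Current.OpenEntryStream());
	public void WriteEntryTo(Stream writableStream) => enumerator?.Current.WriteTo(writableStream);
	public void Cancel() => Cancelled = true;

	public void Dispose() {
		enumerator?.Dispose();
		enumerator = null;
	}
}
```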

I then just add detection code before my use of ReaderFactory, and only call ReaderFactory if my IReader reader has not been set:

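```csharp
// Fragment from a larger if/else chain; stream, reader and toDispose are defined elsewhere.
// Ordinal comparison is used for the extension check so it does not depend on the current culture.
} else if (archivePath.EndsWith(".7z", StringComparison.OrdinalIgnoreCase)) {
	if (SharpCompress.Archives.SevenZip.SevenZipArchive.IsSevenZipFile(stream)) {
		// The signature check advances the stream, so rewind before opening.
		stream.Seek(0, SeekOrigin.Begin);
		var archive = SharpCompress.Archives.SevenZip.SevenZipArchive.Open(stream);
		toDispose.Add(archive);
		var ourReader = new Our7ZReader(archive);
		reader = ourReader;
		toDispose.Add(ourReader);
	}
}
```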
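Once `reader` is set, either by the 7z branch above or by ReaderFactory, the rest of the code can treat it uniformly. A minimal sketch of such a consumer follows; the `ReaderDump` class and `ListEntries` method are hypothetical names introduced only for illustration, and the sketch assumes the `IReader` and `IEntry` members used here (`MoveToNextEntry`, `Entry`, `Key`, `Size`, `IsDirectory`) are available in the SharpCompress version in use.

```csharp
using System;
using SharpCompress.Readers;

// Hypothetical helper, not part of SharpCompress.
static class ReaderDump {
	// Prints the key and uncompressed size of every file entry the reader exposes.
	// It relies only on the IReader interface, so it does not care whether the
	// reader came from ReaderFactory or from a custom adapter.
	public static void ListEntries(IReader reader) {
		while (reader.MoveToNextEntry()) {
			var entry = reader.Entry;
			if (entry.IsDirectory) {
				continue; // skip directory entries
			}
			Console.WriteLine($"{entry.Key}\t{entry.Size}");
		}
	}
}
```

Because only metadata is read, this loop never needs OpenEntryStream, which is the one member the custom reader cannot provide.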

Again, there is probably a reason IReader was avoided for 7z. Also, OpenEntryStream as written above is not actually possible to implement, because EntryStream's constructor is internal. I looked at deriving from AbstractReader, which would have avoided that issue, but its constructor is also internal, so that was moot. I didn't need OpenEntryStream for my purposes, but in theory the above may work apart from the access restrictions.
