Get unused data from end of DecompressionStream?

I'm not very familiar with streams and compression, but hopefully this is understandable.

For `deflate`, the spec states "It is an error if there is additional input data after the ADLER32 checksum."
For `gzip`, the spec says "It is an error if there is additional input data after the end of the "member"."

As expected, Chrome's current implementation throws a TypeError ("Junk found after end of compressed data.") when extra data is written to a DecompressionStream.

This error can be caught and ignored, but there doesn't seem to be a way of retrieving the already-written-but-not-used "junk" data. There seems to be an assumption here that developers already know the length of the compressed data, and can provide exactly that data and nothing more. On the contrary, this "junk" data can be very important in cases where the compressed data is embedded in another stream and you don't know the length of the compressed data.

A good example of this is Git's PackFile format, which only tells you the size of the uncompressed data, not the compressed size. In such a case you must rely on the decompressor to tell you when it's done decompressing data, and then handle the remaining data. 

My attempt at putting together an example:
```js
// A stream with two compressed items
// deflate("Hello World") + deflate("FooBarBaz")
const data = new Uint8Array([
    0x78, 0x9c, 0xf3, 0x48, 0xcd, 0xc9, 0xc9, 0x57, 0x08, 0xcf, 0x2f, 0xca, 0x49, 0x01, 0x00, 0x18, 0x0b, 0x04, 0x1d,
    0x78, 0x9c, 0x73, 0xcb, 0xcf, 0x77, 0x4a, 0x2c, 0x72, 0x4a, 0xac, 0x02, 0x00, 0x10, 0x3b, 0x03, 0x57,
]);

// Decompress the first item
const item1Stream = new DecompressionStream('deflate');
item1Stream.writable.getWriter().write(data).catch(() => { /* Rejects with a TypeError: Junk found after end of compressed data. */ });
console.log(await item1Stream.readable.getReader().read()); // { value, done }: value holds the bytes of "Hello World"

// How do I get the remaining data (the "junk") in order to decompress the second item?
// I've already written it to the previous stream, and there's nothing to tell me how much was used or what's left over.
const item2Stream = new DecompressionStream('deflate');
item2Stream.writable.getWriter().write(getRemainingDataSomehow());
console.log(await item2Stream.readable.getReader().read()); // { value, done }: value should hold the bytes of "FooBarBaz"
```

Now, as a workaround, I could write the data to my first stream one byte at a time, saving the most recently written byte and carrying it over when the writer throws that specific exception. But writing one byte at a time feels very inefficient and adds a lot of complexity, and checking for that specific error message seems fragile (it might change, and other implementations might use a different message).

Zlib itself provides a way to know which bytes weren't used (though I don't know the details of how).
Python's zlib API provides an [`unused_data`](https://docs.python.org/3/library/zlib.html#zlib.Decompress.unused_data) property that contains the unused bytes.
Node's zlib API provides a [`bytesWritten`](https://nodejs.org/api/zlib.html#zlib_zlib_byteswritten) property that can be used to calculate the unused data.
It would be great to have something similar available in the DecompressionStream API.
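
For comparison, here is a minimal sketch of what the Node approach looks like with the same two-item buffer. It assumes Node's synchronous zlib API with the `info: true` option, which returns the engine alongside the output so that `bytesWritten` can be read. The helper name `splitFirstDeflateItem` is made up for illustration and is not part of any API.

```js
const zlib = require('node:zlib');

// deflate("Hello World") + deflate("FooBarBaz"), same bytes as the example above
const data = Buffer.from([
    0x78, 0x9c, 0xf3, 0x48, 0xcd, 0xc9, 0xc9, 0x57, 0x08, 0xcf, 0x2f, 0xca, 0x49, 0x01, 0x00, 0x18, 0x0b, 0x04, 0x1d,
    0x78, 0x9c, 0x73, 0xcb, 0xcf, 0x77, 0x4a, 0x2c, 0x72, 0x4a, 0xac, 0x02, 0x00, 0x10, 0x3b, 0x03, 0x57,
]);

// Hypothetical helper: decompress the first zlib-wrapped item in `bytes`
// and return its output together with whatever input was left unused.
function splitFirstDeflateItem(bytes) {
    // With `info: true`, inflateSync returns { buffer, engine } instead of just the output.
    const { buffer, engine } = zlib.inflateSync(bytes, { info: true });
    const consumed = engine.bytesWritten; // number of input bytes the decompressor used
    return { output: buffer, consumed, rest: bytes.subarray(consumed) };
}

const item1 = splitFirstDeflateItem(data);
console.log(item1.output.toString('latin1'), item1.consumed);

// The leftover bytes are exactly the "junk" that DecompressionStream discards.
const item2 = splitFirstDeflateItem(item1.rest);
console.log(item2.output.toString('latin1'), item2.rest.length);
```

Something equivalent on DecompressionStream, whether a consumed-byte count or the leftover bytes themselves, would be enough to handle the PackFile case.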
