Avoid deep copy on lz4 decompression #7437
Conversation
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

22 files ±0, 22 suites ±0, 9h 57m 15s ⏱️ (-23m 5s)

For more details on these failures, see this check.

Results for commit 9b14c35. ± Comparison against base commit f3995b5.
In principle this seems fine. I did have a couple of small questions though.
```diff
 x = np.arange(1000000, dtype="int64")
 compression, payload = maybe_compress(x.data)
-assert compression == "lz4"
+assert compression in {"lz4", "snappy", "zstd", "zlib"}
```
I think I would be sad if we used zlib by default in any configuration. I'll bet that it's faster to just send data uncompressed over the network.
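For what it's worth, a quick way to sanity-check that claim (a throwaway sketch with made-up data, not part of this PR):

```python
# Rough comparison on hypothetical data: zlib round-trip time vs. a plain
# copy, where the copy stands in for "send the bytes uncompressed".
import time
import zlib

import numpy as np

data = np.arange(2_000_000, dtype="int64").tobytes()  # ~16 MiB, compressible

t0 = time.perf_counter()
compressed = zlib.compress(data)
t1 = time.perf_counter()
copied = bytes(data)
t2 = time.perf_counter()

print(f"zlib compress: {t1 - t0:.3f}s -> {len(compressed) / len(data):.1%} of original")
print(f"plain copy:    {t2 - t1:.3f}s")
```

Actual numbers depend on the machine and on how compressible the data is.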
Ah, you're right. I misread the code; default compression is lz4 -> snappy -> None.
I've amended the tests and added a specific test for the priority order.
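A test along these lines (a minimal sketch, not necessarily the exact test added in the PR) could condition the expected default on which libraries are importable:

```python
# Sketch of a priority-order test: with compressible data, maybe_compress
# should pick lz4 if installed, else snappy, else leave data uncompressed.
import importlib.util

import numpy as np

from distributed.protocol.compression import maybe_compress


def test_default_compression_priority():
    x = np.arange(1_000_000, dtype="int64")
    compression, _ = maybe_compress(x.data)
    if importlib.util.find_spec("lz4"):
        assert compression == "lz4"
    elif importlib.util.find_spec("snappy"):
        assert compression == "snappy"
    else:
        assert compression is None
```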
```diff
@@ -217,7 +217,6 @@ def test_itemsize(dt, size):


 def test_compress_numpy():
-    pytest.importorskip("lz4")
```
I'm curious, why this change? If we didn't have lz4, snappy, or zstandard installed (all of which are optional, I think), then I'd expect this to fail. The only compressor we have by default, I think, is zlib, and we don't compress with that by default.
Actually, if you have snappy but not lz4, it will succeed. zstandard does not install itself as a default compressor. I've amended the tests to reflect this.
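In other words, the selection behaves roughly like this (a simplified, hypothetical sketch, not distributed's actual code):

```python
# Hypothetical sketch of default-compressor selection: lz4 is preferred,
# snappy is the fallback, and zstandard is registered but opt-in only.
compressions = {}
default_compression = None

try:
    import lz4.block
    compressions["lz4"] = (lz4.block.compress, lz4.block.decompress)
    default_compression = "lz4"
except ImportError:
    try:
        import snappy
        compressions["snappy"] = (snappy.compress, snappy.decompress)
        default_compression = "snappy"
    except ImportError:
        pass  # no default; data goes over the wire uncompressed

try:
    import zstandard
    compressions["zstd"] = (
        zstandard.ZstdCompressor().compress,
        zstandard.ZstdDecompressor().decompress,
    )
except ImportError:
    pass  # zstd never becomes the default, even when installed
```

Under that logic, the test can pass with snappy installed even when lz4 is missing.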
Speed up deserialization when

a. lz4 is installed, and
b. the buffer is compressible, and
c. the buffer is smaller than 64 MiB (distributed.comm.shard; see the snippet below).

Note that the default chunk size in dask.array is 128 MiB.
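For reference, the shard limit is just a config value (a hypothetical illustration; importing distributed registers its config defaults):

```python
# Peek at the shard size that bounds this fast path. Frames larger than
# this are split before compression.
import dask
import distributed  # noqa: F401  (registers distributed.* config defaults)

print(dask.config.get("distributed.comm.shard"))  # "64MiB" by default
```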
Note that this does not prevent a memory flare, as there's an unnecessary deep copy upstream as well:
https://github.com/python-lz4/python-lz4/blob/79370987909663d4e6ef743762768ebf970a2383/lz4/block/_block.c#L256
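The shape of the fix is roughly this (a minimal sketch, assuming the avoided copy is the bytes-to-bytearray conversion after decompression; not necessarily the PR's exact diff):

```python
# Sketch: lz4.block.decompress returns immutable bytes by default, which
# callers that need a writeable buffer must then deep-copy into a bytearray.
# Asking python-lz4 for a bytearray up front skips that extra copy.
import lz4.block


def lz4_decompress(data):
    return lz4.block.decompress(data, return_bytearray=True)
```

The upstream copy linked above happens inside python-lz4's C code, so it is not addressed here.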