Skip to content

Add ZstdCompressor #180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 13, 2025
Merged

Add ZstdCompressor #180

merged 3 commits into from
May 13, 2025

Conversation

nhz2
Copy link
Member

@nhz2 nhz2 commented Mar 7, 2025

Alternative to #149

This implementation supports multithreaded compression and decompression, and also supports the checksum option.

ChunkCodecLibZstd is being added as a direct dependency instead of a package extension, because Zarr.jl already depends on zstd through blosc.

One thing to note is that ChunkCodecLibZstd needs Julia 1.11 1.10, and
the ChunkCodec API is still experimental. Any suggestions for improving the API would be helpful.

@coveralls
Copy link

coveralls commented Mar 7, 2025

Pull Request Test Coverage Report for Build 14317383070

Details

  • 5 of 13 (38.46%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 85.461%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/Compressors/zstd.jl 5 13 38.46%
Totals Coverage Status
Change from base Build 13680422725: -0.6%
Covered Lines: 917
Relevant Lines: 1073

💛 - Coveralls

@bjarthur
Copy link
Member

bjarthur commented Mar 13, 2025

very much looking forward to both Zstd and the multithreading it brings. are there tests we could add to this PR to ensure it is thread safe?

@nhz2
Copy link
Member Author

nhz2 commented Mar 16, 2025

#181 would add some basic round trip tests, which should cover all the code in this PR.

I'm not sure how to test if this is thread-safe, but in ChunkCodecLibZstd, there is no global state being mutated, and the underlying C library is supposed to be safe to use in multiple threads.

@mkitti
Copy link
Member

mkitti commented Mar 31, 2025

Since this requires Julia 1.11 anyways, could we make this into a package extension and an optional dependency instead of a hard dependency?

The main advantage for the merge strategy here is that we do not make Zarr.jl require Julia 1.11. I would at most be more comfortable making it require Julia 1.10.

@nhz2
Copy link
Member Author

nhz2 commented Mar 31, 2025

I'm happy to accept a PR to ChunkCodecLibZstd.jl to support Julia 1.10. Currently, the only 1.11 feature I am using is the public keyword. But is there a need to install the latest version of an in-development package on an old version of Julia?

@mkitti
Copy link
Member

mkitti commented Mar 31, 2025

If the in-development package is "Zarr.jl", then yes. Julia 1.10 is the current long-term-support release, and I would expect upcoming releases of Zarr.jl to support Julia 1.10 for some time. Making "ChunkCodecLibZstd.jl" a mandatory dependency of Zarr.jl would prevent that. I am less concerned about support for Julia versions prior to Julia 1.10.

For "ChunkCodecLibZstd.jl", dependence on Julia 1.11 is less of an issue as long as it is only an optional dependency of Zarr.jl.

Compat.jl could be used to address the Julia version dependency. However, I still prefer codecs as optional dependencies when possible. If a convenience package, ZarrUniverse.jl for example, is needed that loads Zarr.jl and all optional dependencies, that would not be hard to accomodate.

I will send a pull request.

@bjarthur
Copy link
Member

if it's just public then it's as simple as @compat public foo, bar instead of public foo, bar. unnecessarily restricting version compatibility is a p.i.t.a. please make this change!

@mkitti
Copy link
Member

mkitti commented Apr 1, 2025

PR for using Compat.jl for public: JuliaIO/ChunkCodecs.jl#31
PR for making ChunkCodecLibZstd an optional dependency: #183

@mkitti
Copy link
Member

mkitti commented Apr 1, 2025

I started to test the ZarrUniverse idea here:
https://github.com/mkitti/Zarr.jl/tree/mkitti-zarr-universe/lib/ZarrUniverse

using Pkg
Pkg.add(url="https://github.com/mkitti/Zarr.jl", rev="mkitti-zarr-universe", subdir="lib/ZarrUniverse")

or

] add https://github.com/mkitti/Zarr.jl#mkitti-zarr-universe:lib/ZarrUniverse

@nhz2
Copy link
Member Author

nhz2 commented Apr 7, 2025

I've updated the PR. It should work with Julia 1.10 now. Also, the new decode! function throws a DecodedSizeError if the decoded size is too small or large, which cleans up the error handling.

@meggart meggart marked this pull request as ready for review May 12, 2025 13:41
Copy link
Collaborator

@meggart meggart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, woudl be very interested to test if this works in multithreaded settings, but is defineitely not a merge-stopper since most other compressors break there as well.

@mkitti
Copy link
Member

mkitti commented May 12, 2025

This should be thread safe because it creates a new context for each compression and decompression call.

I understand that @bjarthur has tested these changes under a multithreaded context. It would be great to see a test for this in the test suite.

@mkitti mkitti self-requested a review May 12, 2025 17:14
Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Minor points:

  1. I am wondering about the need for ChunkCodecCore to be a direct dependency here
  2. Moving codecs out to package extensions can be done earlier. What Fabian discussed was perhaps making Zarr.jl the all encompassing package and then creating a core package that had optional dependencies.

@nhz2 nhz2 merged commit 196c34d into master May 13, 2025
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants