Skip to content

Conversation

@fneddy
Copy link
Contributor

@fneddy fneddy commented Apr 14, 2025

This adds hardware accelerated inflate and deflate for IBMs s390x platform.

This is part of the current ongoing attempt to merge IBMs patches for zlib and get them upstream.
More information about this in #1050 .

If this gets merged #410 may be discarded .

@fneddy fneddy marked this pull request as ready for review April 14, 2025 12:46
@fneddy fneddy mentioned this pull request Mar 24, 2025
10 tasks
@fneddy
Copy link
Contributor Author

fneddy commented Apr 14, 2025

i split up the intrusive code changes into two commits aaafad2 and 36a95f2. I hope reviewing will be easier through this.

@Neustradamus
Copy link

@fneddy: Good job!

@Neustradamus
Copy link

@madler: Have you seen this PR?

@fneddy
Copy link
Contributor Author

fneddy commented Jan 26, 2026

rebased on newest develop branch, read for merge.

CMakeLists.txt Outdated
#
# Add contrib code
#
add_subdirectory(contrib/s390x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've seen the new convenience functions in contrib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice! give me some minutes I'll rewrite our things to the new functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are included after all targets, so you all the target_link_libraries and this stuff can also go into your dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was wondering why all my compiletests where failing. There is a bug in the main CMakeLists.txt:

set(CMAKE_REQUIRED_FLAGS "-WError")

if has to be -Werror

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, strange it worked so far.

Would you mind moving lines 207, 208, 231 and 232 also into contrib/s390x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

there is also a little more work do be done by me. I need to split the crc and the dfltcc implementation into two distinct subdirectories so they can work independently with the new functions.

@fneddy
Copy link
Contributor Author

fneddy commented Jan 26, 2026

Ok I think this should be it.

@fneddy
Copy link
Contributor Author

fneddy commented Jan 26, 2026

BTW. If build with cmake no -O3 is passed. This results in a drastic slower result with cmake vs configure.

@Vollstrecker
Copy link
Contributor

Sure, because we only rely on the defaults cmake detects. If we make assumptions that don't fit what the user wants it's harder for them to get this out again compared to a user that sets CFLAGS and is done For such stuff things like preferences and toolchain files have been invented.

@madler
Copy link
Owner

madler commented Jan 28, 2026

@fneddy Please take a look at the check failures on Windows. Thanks.

fneddy and others added 3 commits January 28, 2026 15:35
Architecture specific code needs hooks into the current zlib at various
points. This commit adds a a hand-full of macros that platform specific
code can overwrite.
Architecture specific code may need to call this functions. So we
make them non-static and add the symbols to the header.
IBM Z mainframes starting from version z15 provide DFLTCC instruction,
which implements deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the
following build commands:

    # via configure
    $ ./configure --dfltcc
    $ make

    # via cmake
    $ cmake -DZLIB_DFLTCC=on ..
    $ make

When built like this, zlib would compress in hardware on level 1, and
in software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.,
to make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or export DFLTCC_LEVEL_MASK=0x7e at run
time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when the SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into separate files, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside
deflate and inflate states, The ZALLOC_STATE(), ZFREE_STATE() and
ZCOPY_STATE() macros are introduced in order to encapsulate the
allocation details. The same is true for window, for which
the ZALLOC_WINDOW() and TRY_FREE_WINDOW() macros are introduced.

Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(),
inflateSetDictionary() and inflateGetDictionary() need special
handling, which is triggered using the new
DEFLATE_SET_DICTIONARY_HOOK(), DEFLATE_GET_DICTIONARY_HOOK(),
INFLATE_SET_DICTIONARY_HOOK() and INFLATE_GET_DICTIONARY_HOOK()
macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK() and INFLATE_RESET_KEEP_HOOK()
macros.

The new DEFLATE_PARAMS_HOOK() macro switches between the hardware
and the software deflate implementations when the deflateParams()
arguments demand this.

The new INFLATE_PRIME_HOOK(), INFLATE_MARK_HOOK() and
INFLATE_SYNC_POINT_HOOK() macros make the respective unsupported
calls gracefully fail.

The algorithm implemented in the hardware has different compression
ratio than the one implemented in software. In order for
deflateBound() to return the correct results for the hardware
implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN() and
DEFLATE_NEED_CONSERVATIVE_BOUND() macros are introduced.

Actual compression and decompression are handled by the new
DEFLATE_HOOK() and INFLATE_TYPEDO_HOOK() macros. Since inflation
with DFLTCC manages the window on its own, calling updatewindow() is
suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.

In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
checksums, therefore, whenever it's used, the software checksumming is
suppressed using the new DEFLATE_NEED_CHECKSUM() and
INFLATE_NEED_CHECKSUM() macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits(), bi_reverse(), bi_windup() and
flush_pending() are promoted from local to ZLIB_INTERNAL.
Furthermore, since the block and the stream termination must be handled
in software as well, enum block_state is moved to deflate.h.

Since the first call to dfltcc_inflate() already needs the window,
and it might be not allocated yet, inflate_ensure_window() is
factored out of updatewindow() and made ZLIB_INTERNAL.

Co-authored-by: Eduard Stefes <[email protected]>

try fixing windows build
@fneddy
Copy link
Contributor Author

fneddy commented Jan 29, 2026

done

@madler
Copy link
Owner

madler commented Jan 29, 2026

Thanks! hooks.h does not belong in contrib. (I got rid of another contrib/hooks.h in your CRC code.) Please put it in contrib/dfltcc directory.

@fneddy
Copy link
Contributor Author

fneddy commented Jan 30, 2026

Done. Do you want me to squash the commits?

@madler
Copy link
Owner

madler commented Jan 30, 2026

No need to squash. I might do some when I bring this all in.

However, this isn't ready for that yet. I apologize for requesting more modifications. After looking in more detail at this pull request, I need to impose some requirements:

  1. There can be no dependencies on anything in contrib unless HAVE_S390X_DFLTCC is defined.
  2. There can be no changes to the behavior of inflate or deflate unless HAVE_S390X_DFLTCC is defined.

Example for # 1 is including a header file from contrib. Example for # 2 are changes to deflate in f2e15d6 .

It's ok to build the hooks macros into, say, deflate.h, so long as they don't change any behavior if HAVE_S390X_DFLTCC is not defined. So #ifdef HAVE_S390X_DFLTCC, then #include something from contrib, #else define the hook macros to do nothing, or do what they replaced in the original code. Something like that.

I am reluctant to add external symbols. If I do, I may need to rename them to reduce the chance of collisions. I am not clear on how those particular calls can even solve the stated problem. If you have to manually write an end-of-block symbol, how do you even know what the bits are if the dynamic header was generated internally by the machine instruction? I'd think that you would need to inflate the deflate stream up to that point to know. Also how do you get the machine instruction to spit out the last few bits before where the end-of-block needs to be appended?

I don't get why ZLIB_INTERNAL is nulled out in compress.c and uncompr.c. It's not even used there. In general, this pull request should not mess with how zlib is compiled if HAVE_S390X_DFLTCC is not defined. Not a requirement, but any such impacts on compilation are something I'd have to carefully consider.

Basically what's I'm asking is that this pull request be written to make it obvious that it can do no harm, and that it not depend on anything external if HAVE_S390X_DFLTCC is not defined.

Thanks again for your work! I would like to incorporate this into zlib. I am planning a release soon, but there is no rush, as this can wait for a later release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants