-
Notifications
You must be signed in to change notification settings - Fork 2.7k
s390x: Add accelerated inflate&deflate #1060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
@fneddy: Good job! |
|
@madler: Have you seen this PR? |
|
rebased on newest develop branch, read for merge. |
CMakeLists.txt
Outdated
| # | ||
| # Add contrib code | ||
| # | ||
| add_subdirectory(contrib/s390x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've seen the new convenience functions in contrib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice! give me some minutes I'll rewrite our things to the new functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are included after all targets, so you all the target_link_libraries and this stuff can also go into your dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was wondering why all my compiletests where failing. There is a bug in the main CMakeLists.txt:
Line 106 in 2209f63
| set(CMAKE_REQUIRED_FLAGS "-WError") |
if has to be -Werror
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, strange it worked so far.
Would you mind moving lines 207, 208, 231 and 232 also into contrib/s390x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
there is also a little more work do be done by me. I need to split the crc and the dfltcc implementation into two distinct subdirectories so they can work independently with the new functions.
|
Ok I think this should be it. |
|
BTW. If build with cmake no -O3 is passed. This results in a drastic slower result with cmake vs configure. |
|
Sure, because we only rely on the defaults cmake detects. If we make assumptions that don't fit what the user wants it's harder for them to get this out again compared to a user that sets CFLAGS and is done For such stuff things like preferences and toolchain files have been invented. |
|
@fneddy Please take a look at the check failures on Windows. Thanks. |
Architecture specific code needs hooks into the current zlib at various points. This commit adds a a hand-full of macros that platform specific code can overwrite.
Architecture specific code may need to call this functions. So we make them non-static and add the symbols to the header.
IBM Z mainframes starting from version z15 provide DFLTCC instruction,
which implements deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and ratio comparable with that of level 1.
This patch adds DFLTCC support to zlib. It can be enabled using the
following build commands:
# via configure
$ ./configure --dfltcc
$ make
# via cmake
$ cmake -DZLIB_DFLTCC=on ..
$ make
When built like this, zlib would compress in hardware on level 1, and
in software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.,
to make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or export DFLTCC_LEVEL_MASK=0x7e at run
time.
Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when the SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.
DFLTCC does not support every single zlib feature, in particular:
* inflate(Z_BLOCK) and inflate(Z_TREES)
* inflateMark()
* inflatePrime()
* inflateSyncPoint()
When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.
This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into separate files, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.
DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside
deflate and inflate states, The ZALLOC_STATE(), ZFREE_STATE() and
ZCOPY_STATE() macros are introduced in order to encapsulate the
allocation details. The same is true for window, for which
the ZALLOC_WINDOW() and TRY_FREE_WINDOW() macros are introduced.
Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(),
inflateSetDictionary() and inflateGetDictionary() need special
handling, which is triggered using the new
DEFLATE_SET_DICTIONARY_HOOK(), DEFLATE_GET_DICTIONARY_HOOK(),
INFLATE_SET_DICTIONARY_HOOK() and INFLATE_GET_DICTIONARY_HOOK()
macros.
deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK() and INFLATE_RESET_KEEP_HOOK()
macros.
The new DEFLATE_PARAMS_HOOK() macro switches between the hardware
and the software deflate implementations when the deflateParams()
arguments demand this.
The new INFLATE_PRIME_HOOK(), INFLATE_MARK_HOOK() and
INFLATE_SYNC_POINT_HOOK() macros make the respective unsupported
calls gracefully fail.
The algorithm implemented in the hardware has different compression
ratio than the one implemented in software. In order for
deflateBound() to return the correct results for the hardware
implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN() and
DEFLATE_NEED_CONSERVATIVE_BOUND() macros are introduced.
Actual compression and decompression are handled by the new
DEFLATE_HOOK() and INFLATE_TYPEDO_HOOK() macros. Since inflation
with DFLTCC manages the window on its own, calling updatewindow() is
suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.
In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
checksums, therefore, whenever it's used, the software checksumming is
suppressed using the new DEFLATE_NEED_CHECKSUM() and
INFLATE_NEED_CHECKSUM() macros.
DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits(), bi_reverse(), bi_windup() and
flush_pending() are promoted from local to ZLIB_INTERNAL.
Furthermore, since the block and the stream termination must be handled
in software as well, enum block_state is moved to deflate.h.
Since the first call to dfltcc_inflate() already needs the window,
and it might be not allocated yet, inflate_ensure_window() is
factored out of updatewindow() and made ZLIB_INTERNAL.
Co-authored-by: Eduard Stefes <[email protected]>
try fixing windows build
|
done |
|
Thanks! hooks.h does not belong in contrib. (I got rid of another contrib/hooks.h in your CRC code.) Please put it in contrib/dfltcc directory. |
|
Done. Do you want me to squash the commits? |
|
No need to squash. I might do some when I bring this all in. However, this isn't ready for that yet. I apologize for requesting more modifications. After looking in more detail at this pull request, I need to impose some requirements:
Example for # 1 is including a header file from contrib. Example for # 2 are changes to deflate in f2e15d6 . It's ok to build the hooks macros into, say, deflate.h, so long as they don't change any behavior if HAVE_S390X_DFLTCC is not defined. So #ifdef HAVE_S390X_DFLTCC, then #include something from contrib, #else define the hook macros to do nothing, or do what they replaced in the original code. Something like that. I am reluctant to add external symbols. If I do, I may need to rename them to reduce the chance of collisions. I am not clear on how those particular calls can even solve the stated problem. If you have to manually write an end-of-block symbol, how do you even know what the bits are if the dynamic header was generated internally by the machine instruction? I'd think that you would need to inflate the deflate stream up to that point to know. Also how do you get the machine instruction to spit out the last few bits before where the end-of-block needs to be appended? I don't get why ZLIB_INTERNAL is nulled out in compress.c and uncompr.c. It's not even used there. In general, this pull request should not mess with how zlib is compiled if HAVE_S390X_DFLTCC is not defined. Not a requirement, but any such impacts on compilation are something I'd have to carefully consider. Basically what's I'm asking is that this pull request be written to make it obvious that it can do no harm, and that it not depend on anything external if HAVE_S390X_DFLTCC is not defined. Thanks again for your work! I would like to incorporate this into zlib. I am planning a release soon, but there is no rush, as this can wait for a later release. |
This adds hardware accelerated inflate and deflate for IBMs s390x platform.
This is part of the current ongoing attempt to merge IBMs patches for zlib and get them upstream.
More information about this in #1050 .
If this gets merged #410 may be discarded .