
Commit f2e15d6

iii-ifneddy
authored and committed
Add support for IBM Z hardware-accelerated deflate
IBM Z mainframes starting from version z15 provide the DFLTCC instruction, which implements the deflate algorithm in hardware with estimated compression and decompression performance orders of magnitude faster than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the following build commands:

    # via configure
    $ ./configure --dfltcc
    $ make

    # via cmake
    $ cmake -DZLIB_DFLTCC=on ..
    $ make

When built like this, zlib compresses in hardware on level 1 and in software on all other levels. Decompression always happens in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e., to make it used by default), one can either configure with --dfltcc-level-mask=0x7e or export DFLTCC_LEVEL_MASK=0x7e at run time.

Two DFLTCC compression calls produce the same results only when both are made on machines of the same generation and when the respective buffers have the same offset relative to the start of the page. Therefore, care should be taken when using hardware compression if reproducible results are desired. One such use case, reproducible software builds, is handled explicitly: when the SOURCE_DATE_EPOCH environment variable is set, hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software or, in case this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way. All SystemZ-specific code is placed into separate files, but unfortunately there is still a noticeable amount of changes in the main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output buffer and a window. Since DFLTCC requires the parameter block to be doubleword-aligned, and it's reasonable to allocate it alongside the deflate and inflate states, the ZALLOC_STATE(), ZFREE_STATE() and ZCOPY_STATE() macros are introduced in order to encapsulate the allocation details. The same is true for the window, for which the ZALLOC_WINDOW() and TRY_FREE_WINDOW() macros are introduced.

Software and hardware window formats do not match; therefore, deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary() and inflateGetDictionary() need special handling, which is triggered using the new DEFLATE_SET_DICTIONARY_HOOK(), DEFLATE_GET_DICTIONARY_HOOK(), INFLATE_SET_DICTIONARY_HOOK() and INFLATE_GET_DICTIONARY_HOOK() macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC parameter block, which is allocated alongside the zlib state, using the new DEFLATE_RESET_KEEP_HOOK() and INFLATE_RESET_KEEP_HOOK() macros.

The new DEFLATE_PARAMS_HOOK() macro switches between the hardware and the software deflate implementations when the deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK(), INFLATE_MARK_HOOK() and INFLATE_SYNC_POINT_HOOK() macros make the respective unsupported calls gracefully fail.

The algorithm implemented in hardware has a different compression ratio than the one implemented in software. In order for deflateBound() to return the correct results for the hardware implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN() and DEFLATE_NEED_CONSERVATIVE_BOUND() macros are introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK() and INFLATE_TYPEDO_HOOK() macros. Since inflation with DFLTCC manages the window on its own, calling updatewindow() is suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes the CRC-32 and Adler-32 checksums; therefore, whenever it's used, software checksumming is suppressed using the new DEFLATE_NEED_CHECKSUM() and INFLATE_NEED_CHECKSUM() macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input data, thus in some cases it is necessary to do this manually. In order to achieve this, send_bits(), bi_reverse(), bi_windup() and flush_pending() are promoted from local to ZLIB_INTERNAL. Furthermore, since the block and the stream termination must be handled in software as well, enum block_state is moved to deflate.h.

Since the first call to dfltcc_inflate() already needs the window, and it might not be allocated yet, inflate_ensure_window() is factored out of updatewindow() and made ZLIB_INTERNAL.

Co-authored-by: Eduard Stefes <[email protected]>

try fixing windows build
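
The doubleword-alignment requirement mentioned in the commit message can be illustrated with a small sketch. This is not the code behind ZALLOC_STATE(); the helper names `align_up` and `param_block_offset` are hypothetical, and the layout (software state first, parameter block placed after it in the same allocation) is an assumption made only for illustration. It also assumes the allocation itself starts on an 8-byte boundary, which `malloc()` provides on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: round `addr` up to the next
 * multiple of `align`. `align` must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Hypothetical layout: a software state of `state_size` bytes followed by a
 * parameter block that must start on an 8-byte (doubleword) boundary.
 * Returns the offset of the parameter block from the start of the
 * allocation, assuming the allocation itself is 8-byte aligned. */
static size_t param_block_offset(size_t state_size)
{
    return (size_t)align_up((uintptr_t)state_size, 8);
}
```

With this layout, a single allocation of `param_block_offset(state_size)` plus the parameter block size would hold both objects, which is one way an allocation macro could hide the padding from the rest of the code.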
1 parent 35033b2 commit f2e15d6


14 files changed

+1446
-8
lines changed


Makefile.in

Lines changed: 17 additions & 1 deletion
@@ -168,6 +168,9 @@ crc32.o: $(SRCDIR)crc32.c
 crc32_vx.o: $(SRCDIR)contrib/crc32vx/crc32_vx.c
 	$(CC) $(CFLAGS) $(VGFMAFLAG) $(ZINC) -c -o $@ $(SRCDIR)contrib/crc32vx/crc32_vx.c
 
+dfltcc.o: $(SRCDIR)contrib/dfltcc/dfltcc.c
+	$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)contrib/dfltcc/dfltcc.c
+
 deflate.o: $(SRCDIR)deflate.c
 	$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)deflate.c
 
@@ -223,6 +226,11 @@ crc32_vx.lo: $(SRCDIR)contrib/crc32vx/crc32_vx.c
 	$(CC) $(SFLAGS) $(VGFMAFLAG) $(ZINC) -DPIC -c -o objs/crc32_vx.o $(SRCDIR)contrib/crc32vx/crc32_vx.c
 	-@mv objs/crc32_vx.o $@
 
+dfltcc.lo: $(SRCDIR)contrib/dfltcc/dfltcc.c
+	-@mkdir objs 2>/dev/null || test -d objs
+	$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/dfltcc.o $(SRCDIR)contrib/dfltcc/dfltcc.c
+	-@mv objs/dfltcc.o $@
+
 deflate.lo: $(SRCDIR)deflate.c
 	-@mkdir objs 2>/dev/null || test -d objs
 	$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/deflate.o $(SRCDIR)deflate.c
@@ -296,6 +304,9 @@ placebo $(SHAREDLIBV): $(PIC_OBJS) libz.a
 	ln -s $@ $(SHAREDLIBM)
 	-@rmdir objs
 
+crc32_test$(EXE): crc32_test.o $(STATICLIB)
+	$(CC) $(CFLAGS) -o $@ crc32_test.o $(TEST_LIBS)
+
 example$(EXE): example.o $(STATICLIB)
 	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ example.o $(TEST_LIBS)
 
@@ -308,6 +319,9 @@ examplesh$(EXE): example.o $(SHAREDLIBV)
 minigzipsh$(EXE): minigzip.o $(SHAREDLIBV)
 	$(CC) $(CFLAGS) -o $@ minigzip.o $(LDFLAGS) -L. $(SHAREDLIBV)
 
+crc32_test64$(EXE): crc32_test64.o $(STATICLIB)
+	$(CC) $(CFLAGS) -o $@ crc32_test64.o $(TEST_LIBS)
+
 example64$(EXE): example64.o $(STATICLIB)
 	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ example64.o $(TEST_LIBS)
 
@@ -416,6 +430,7 @@ inffast.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR
 inftrees.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h
 trees.o: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)trees.h
 crc32_vx.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)contrib/crc32vx/crc32_vx_hooks.h
+dfltcc.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)contrib/dfltcc/dfltcc_hooks.h $(SRCDIR)contrib/dfltcc/dfltcc.h
 
 adler32.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
 zutil.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)gzguts.h
@@ -427,4 +442,5 @@ infback.lo inflate.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftree
 inffast.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR)inflate.h $(SRCDIR)inffast.h
 inftrees.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h
 trees.lo: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)trees.h
-crc32_vx.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)contrib/crc32vx/crc32_vx_hooks.h
+crc32_vx.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)contrib/crc32vx/crc32_vx_hooks.h
+dfltcc.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)contrib/dfltcc/dfltcc_hooks.h $(SRCDIR)contrib/dfltcc/dfltcc.h

compress.c

Lines changed: 3 additions & 0 deletions
@@ -5,9 +5,12 @@
 
 /* @(#) $Id$ */
 
+#define ZLIB_INTERNAL
 #include "zlib.h"
 #include "contrib/hooks.h"
 
+#define ZLIB_WRAPLEN 6 /* zlib format overhead */
+
 /* ===========================================================================
 Compresses the source buffer into the destination buffer. The level
 parameter has the same meaning as in deflateInit. sourceLen is the byte

configure

Lines changed: 19 additions & 1 deletion
@@ -96,6 +96,7 @@ undefined=0
 insecure=0
 unknown=0
 enable_crcvx=1
+enable_dfltcc=0
 old_cc="$CC"
 old_cflags="$CFLAGS"
 OBJC='$(OBJZ) $(OBJG)'
@@ -123,7 +124,7 @@ case "$1" in
       echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
       echo ' [--insecure] [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
       echo ' [--includedir=INCLUDEDIR] [--archs="-arch i386 -arch x86_64"]' | tee -a configure.log
-      echo ' [--disable-crcvx]' | tee -a configure.log
+      echo ' [--disable-crcvx] [--dfltcc] [--dfltcc-level-mask=MASK]' | tee -a configure.log
         exit 0 ;;
     -p*=* | --prefix=*) prefix=`echo $1 | sed 's/.*=//'`; shift ;;
     -e*=* | --eprefix=*) exec_prefix=`echo $1 | sed 's/.*=//'`; shift ;;
@@ -153,6 +154,10 @@ case "$1" in
     --undefined) undefined=1; shift ;;
     --insecure) insecure=1; shift ;;
     --disable-crcvx) enable_crcvx=0; shift ;;
+    --dfltcc) enable_dfltcc=1; shift ;;
+    --dfltcc-level-mask=*)
+        CFLAGS="$CFLAGS -DDFLTCC_LEVEL_MASK=`echo $1 | sed 's/.*=//'`"
+        shift ;;
     *) unknown=1; echo "unknown option ignored: $1" | tee -a configure.log; shift;;
   esac
 done
@@ -955,6 +960,17 @@ EOF
   fi
 fi
 
+# enable ibm s390x dfltcc extension
+HAVE_S390X_DFLTCC=0
+if test $HAVE_S390X -eq 1 && test $enable_dfltcc -eq 1; then
+  HAVE_S390X_DFLTCC=1
+  echo "Enabling s390x dfltcc extension ... Yes." | tee -a configure.log
+  CFLAGS="$CFLAGS -DHAVE_S390X_DFLTCC"
+  SFLAGS="$SFLAGS -DHAVE_S390X_DFLTCC"
+  OBJC="$OBJC dfltcc.o"
+  PIC_OBJC="$PIC_OBJC dfltcc.lo"
+fi
+
 # show the results in the log
 echo >> configure.log
 echo ALL = $ALL >> configure.log
@@ -988,6 +1004,7 @@ echo sharedlibdir = $sharedlibdir >> configure.log
 echo uname = $uname >> configure.log
 echo HAVE_S390X = $HAVE_S390X >> configure.log
 echo HAVE_S390X_VX = $HAVE_S390X_VX >> configure.log
+echo HAVE_S390X_DFLTCC = $HAVE_S390X_DFLTCC >> configure.log
 echo VGFMAFLAG = $VGFMAFLAG >> configure.log
 
 # update Makefile with the configure results
@@ -1000,6 +1017,7 @@ sed < ${SRCDIR}Makefile.in "
 /^LDFLAGS *=/s#=.*#=$LDFLAGS#
 /^LDSHARED *=/s#=.*#=$LDSHARED#
 /^CPP *=/s#=.*#=$CPP#
+/^VGFMAFLAG *=/s#=.*#=$VGFMAFLAG#
 /^STATICLIB *=/s#=.*#=$STATICLIB#
 /^SHAREDLIB *=/s#=.*#=$SHAREDLIB#
 /^SHAREDLIBV *=/s#=.*#=$SHAREDLIBV#

contrib/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ zlib_add_contrib_feature("GVMAT64"
 
 zlib_add_contrib_feature(INFBACK9 "with support for method 9 deflate" infback9)
 zlib_add_contrib_feature(CRC32VX "with S390X-CRC32VX implementation" crc32vx On)
+zlib_add_contrib_feature(DFLTCC "with S390X-DFLTCC deflate acceleration" dfltcc)
 zlib_add_contrib_lib(ADA "Ada bindings" ada)
 zlib_add_contrib_lib(BLAST "blast binary" blast)
 zlib_add_contrib_lib(IOSTREAM3 "IOStream C++ bindings V3" iostream3)

contrib/README.contrib

Lines changed: 4 additions & 0 deletions
@@ -49,6 +49,10 @@ puff/ by Mark Adler <[email protected]>
 crc32vx/    by Ilya Leoshkevich <[email protected]>
         Hardware-accelerated CRC32 on IBM Z with Z13 VX extension.
 
+dfltcc/     by Ilya Leoshkevich <[email protected]>
+        Hardware-accelerated deflate on IBM Z with Z15 DEFLATE CONVERSION CALL
+        instruction.
+
 testzlib/   by Gilles Vollant <[email protected]>
         Example of the use of zlib
 

contrib/dfltcc/CMakeLists.txt

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# check if we compile for IBM s390x
+#
+CHECK_C_SOURCE_COMPILES("
+    #ifndef __s390x__
+    #error
+    #endif
+    int main() {return 0;}
+    " HAS_S390X_SUPPORT)
+
+#
+# Check for IBM S390X - DFLTCC extensions
+#
+if(ZLIB_WITH_DFLTCC AND HAS_S390X_SUPPORT)
+    # check if we have static_assert
+    check_c_source_compiles("
+        #include <assert.h>
+        static_assert(1==1,\"true\");
+        " HAS_STATIC_ASSERT)
+
+    # check if we have secure_getenv
+    set(CMAKE_REQUIRED_FLAGS -D_GNU_SOURCE=1)
+    check_c_source_compiles("
+        #include <stdlib.h>
+        int main() { char* _foo = secure_getenv(\"PWD\");return 0; }
+        " HAS_SECURE_GETENV)
+    unset(CMAKE_REQUIRED_FLAGS)
+
+    # check for specific headers
+    check_include_file(sys/sdt.h HAS_SYS_SDT_H)
+
+    set(definitions "-DHAVE_S390X_DFLTCC=1")
+    if(HAS_STATIC_ASSERT)
+        list(APPEND definitions "-DHAVE_STATIC_ASSERT=1")
+    endif()
+    if(HAS_SECURE_GETENV)
+        list(APPEND definitions "-DHAVE_SECURE_GETENV=1")
+        list(APPEND definitions "-D_GNU_SOURCE=1")
+    endif()
+    if(HAS_SYS_SDT_H)
+        list(APPEND definitions "-DHAVE_SYS_SDT_H=1")
+    endif()
+
+    # prepare compiling for s390x
+    if(ZLIB_BUILD_SHARED)
+        target_sources(zlib
+            PRIVATE
+                dfltcc.c
+                dfltcc_hooks.h
+                dfltcc_common.h)
+        target_compile_definitions(zlib PUBLIC ${definitions})
+    endif(ZLIB_BUILD_SHARED)
+    if(ZLIB_BUILD_STATIC)
+        target_sources(zlibstatic
+            PRIVATE
+                dfltcc.c
+                dfltcc_hooks.h
+                dfltcc_common.h)
+        target_compile_definitions(zlibstatic PUBLIC ${definitions})
+    endif(ZLIB_BUILD_STATIC)
+
+endif(ZLIB_WITH_DFLTCC AND HAS_S390X_SUPPORT)
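
The HAVE_SECURE_GETENV definition probed above is typically consumed by a small fallback in C. The following is only a sketch of that pattern, not the code in dfltcc.c, which is not shown on this page: `sketch_getenv` and `env_or_default` are hypothetical names, and the assumption is that a run-time setting such as DFLTCC_LEVEL_MASK is read from the environment with a compile-time value as the fallback.

```c
#ifdef HAVE_SECURE_GETENV
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* secure_getenv() is a GNU extension */
#endif
#endif

#include <stdlib.h>

/* Prefer secure_getenv() when the build system found it; it returns NULL
 * in setuid-like "secure execution" contexts. Otherwise use getenv(). */
#ifdef HAVE_SECURE_GETENV
#define sketch_getenv(name) secure_getenv(name)
#else
#define sketch_getenv(name) getenv(name)
#endif

/* Hypothetical helper: read an unsigned setting from the environment,
 * falling back to `fallback` when the variable is unset, empty, or not
 * a complete number. */
static unsigned long env_or_default(const char *name, unsigned long fallback)
{
    const char *text = sketch_getenv(name);
    char *end;
    unsigned long value;

    if (text == NULL || *text == '\0')
        return fallback;
    value = strtoul(text, &end, 0);   /* base 0 accepts "0x7e" and "126" */
    return (*end == '\0') ? value : fallback;
}
```

A caller could then write something like `env_or_default("DFLTCC_LEVEL_MASK", 0x2)`, where the fallback `0x2` is itself an assumed default chosen to match "hardware on level 1 only".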

contrib/dfltcc/README

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+IBM Z mainframes starting from version z13 provide vector instructions, which
+allow vectorization of crc32. This extension is built by default when targeting
+IBM s390x. However, this extension can be disabled if desired:
+
+# for configure build
+$ ./configure --disable-crcvx
+
+# for cmake build
+$ cmake .. -DZLIB_CRC32VX=off
+
+
+IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
+which implements the deflate algorithm in hardware with estimated
+compression and decompression performance orders of magnitude faster
+than the current zlib and a ratio comparable with that of level 1.
+
+This directory adds DFLTCC support. In order to enable it, the following
+build commands should be used:
+
+$ ./configure --dfltcc
+$ make
+
+When built like this, zlib compresses in hardware on level 1 and in
+software on all other levels. Decompression always happens in
+hardware. In order to enable DFLTCC compression for levels 1-6 (i.e., to
+make it used by default), one can either configure with
+--dfltcc-level-mask=0x7e or set the environment variable
+DFLTCC_LEVEL_MASK to 0x7e at run time.
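
The value 0x7e suggests how the level mask is read: in binary it is 1111110, so bits 1 through 6 are set. A minimal sketch of that reading follows, under the assumption (inferred from the README text, not confirmed by code shown here) that bit n of the mask enables hardware compression for level n. The function name `level_uses_hardware` is hypothetical.

```c
/* Assumption: bit n of the mask enables hardware compression for zlib
 * level n (levels 0..9). Returns 1 if the level would be handled in
 * hardware under that interpretation, 0 otherwise. */
static int level_uses_hardware(unsigned mask, int level)
{
    if (level < 0 || level > 9)
        return 0;
    return (int)((mask >> level) & 1u);
}
```

Under this interpretation a mask of 0x2 selects level 1 only, matching the described default behaviour, while 0x7e also covers level 6, which is the level zlib uses for Z_DEFAULT_COMPRESSION.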
