This is a design brief for adding the LZ4 high-compression encoder
(LZ4_compress_HC) and the fast-encoder acceleration parameter
(LZ4_compress_fast) to the HDF5 LZ4 filter plugin ID 32004, with full backward
compatibility for existing archives and existing reader deployments.
Background
The LZ4 plugin ID 32004 currently exposes only the default fast LZ4 encoder via
LZ4_compress_default. It does not provide access to the LZ4
high-compression encoder (LZ4_compress_HC) or to the acceleration
parameter of LZ4_compress_fast.
LZ4HC offers desirable new capabilities:
- It emits the same LZ4 Block format as fast LZ4, so existing decoders
and existing files are unaffected.
- It typically improves the compression ratio by 15–30% over fast LZ4.
- Decompression speed is identical to fast LZ4 (same bitstream, same
decoder).
The extension proposed here also exposes the acceleration parameter of
LZ4_compress_fast, so users who care about write throughput can
trade some ratio for additional encode speed. Both directions of the
speed/ratio frontier — HC for ratio, accelerated fast for speed — are
made available through the same cd_values[] mechanism.
The proposed changes are purely additive: no change to the on-disk
format, no change to the chunk wrapper, and no new filter ID.
New cd_values[] layout

| Index | Parameter | Status | Description |
|---|---|---|---|
| cd_values[0] | blockSize | Unchanged | 0 = plugin default |
| cd_values[1] | encoderMode | New | 0 = fast LZ4 (default); 1 = LZ4HC; 2+ = reserved |
| cd_values[2] | encoderParam | New | Meaning depends on encoderMode; see the table below |

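As a rough illustration of the writer side, the C sketch below fills the three slots in the order given by the table. The helper lz4_fill_cd_values and the CD_* / ENCODER_* constants are hypothetical names chosen for this sketch; they are not part of the existing plugin or of HDF5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot indices for the proposed cd_values[] layout. */
enum { CD_BLOCKSIZE = 0, CD_ENCODER_MODE = 1, CD_ENCODER_PARAM = 2, CD_NELMTS_FULL = 3 };

/* Proposed encoderMode values; 2 and above are reserved. */
enum { ENCODER_FAST = 0, ENCODER_HC = 1 };

/* Fill cd_values[] for a writer and return the cd_nelmts to pass with it.
 * Hypothetical helper, not an existing plugin API. */
static size_t lz4_fill_cd_values(unsigned int cd_values[CD_NELMTS_FULL],
                                 unsigned int block_size,
                                 unsigned int encoder_mode,
                                 unsigned int encoder_param)
{
    /* Writers should never emit a reserved encoderMode. */
    assert(encoder_mode <= ENCODER_HC);

    cd_values[CD_BLOCKSIZE]     = block_size;    /* 0 = plugin default          */
    cd_values[CD_ENCODER_MODE]  = encoder_mode;  /* 0 = fast LZ4, 1 = LZ4HC     */
    cd_values[CD_ENCODER_PARAM] = encoder_param; /* meaning depends on the mode */
    return CD_NELMTS_FULL;
}
```

For example, a writer asking for LZ4HC at level 12 with the plugin's default block size would call lz4_fill_cd_values(cd, 0, ENCODER_HC, 12) and pass the returned count (3) together with cd to H5Pset_filter for filter ID 32004.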
encoderParam reference by encoderMode:

| encoderMode | encoderParam value | Behavior |
|---|---|---|
| 0 (fast LZ4) | 0 | LZ4_compress_default behavior |
| 0 (fast LZ4) | >= 1 | Acceleration factor for LZ4_compress_fast |
| 1 (LZ4HC) | 0 | Default level (LZ4HC_CLEVEL_DEFAULT == 9) |
| 1 (LZ4HC) | 1..12 | Explicit level, clamped to [LZ4HC_CLEVEL_MIN, LZ4HC_CLEVEL_MAX] |

The valid fast-mode acceleration factor range is [1, LZ4_ACCELERATION_MAX].
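A minimal sketch of how a plugin might map encoderMode and encoderParam to an encoder call, following the table above. The limit macros are local stand-ins, not liblz4 definitions: a real plugin should take the HC limits from lz4hc.h (the minimum of 3 here is a placeholder), and because LZ4_ACCELERATION_MAX is defined in lz4.c rather than the public header, the value 65537 used here is an assumption to check against the linked liblz4. Rejecting reserved modes (encoderMode >= 2) is also this sketch's own choice; the brief only marks them as reserved. The enc_kind / enc_choice types and resolve_encoder are hypothetical names.

```c
#include <assert.h>

/* Stand-in limits (assumptions; take the real values from liblz4). */
#define SKETCH_HC_LEVEL_MIN     3
#define SKETCH_HC_LEVEL_DEFAULT 9
#define SKETCH_HC_LEVEL_MAX     12
#define SKETCH_ACCEL_MAX        65537u

typedef enum {
    ENC_FAST_DEFAULT, /* dispatch to LZ4_compress_default()               */
    ENC_FAST_ACCEL,   /* dispatch to LZ4_compress_fast(..., acceleration) */
    ENC_HC,           /* dispatch to LZ4_compress_HC(..., level)          */
    ENC_RESERVED      /* encoderMode >= 2: rejected by this sketch        */
} enc_kind;

typedef struct {
    enc_kind kind;
    int      value; /* acceleration for ENC_FAST_ACCEL, level for ENC_HC, else 0 */
} enc_choice;

/* Map (encoderMode, encoderParam) to an encoder choice per the table. */
static enc_choice resolve_encoder(unsigned int mode, unsigned int param)
{
    enc_choice c = { ENC_RESERVED, 0 };

    if (mode == 0) {
        if (param == 0) {
            c.kind = ENC_FAST_DEFAULT;
        } else {
            /* Clamp very large factors to the assumed upper bound. */
            c.kind  = ENC_FAST_ACCEL;
            c.value = (int)(param > SKETCH_ACCEL_MAX ? SKETCH_ACCEL_MAX : param);
        }
    } else if (mode == 1) {
        int level;
        if (param == 0)
            level = SKETCH_HC_LEVEL_DEFAULT;       /* 0 selects the default level */
        else if (param < SKETCH_HC_LEVEL_MIN)
            level = SKETCH_HC_LEVEL_MIN;           /* clamp up to the minimum     */
        else if (param > SKETCH_HC_LEVEL_MAX)
            level = SKETCH_HC_LEVEL_MAX;           /* clamp down to the maximum   */
        else
            level = (int)param;
        c.kind  = ENC_HC;
        c.value = level;
    }

    /* Postcondition: any HC level handed to the encoder is in range. */
    assert(c.kind != ENC_HC ||
           (c.value >= SKETCH_HC_LEVEL_MIN && c.value <= SKETCH_HC_LEVEL_MAX));
    return c;
}
```

Returning a tagged choice instead of calling liblz4 directly keeps the decision logic testable without linking LZ4; the filter's compress path would then dispatch on kind to LZ4_compress_default, LZ4_compress_fast, or LZ4_compress_HC.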
Backward compatibility rules
The plugin interprets cd_values[] based on cd_nelmts:
- cd_nelmts == 0: default blockSize, fast LZ4, default acceleration.
- cd_nelmts == 1: blockSize from cd_values[0]; otherwise as above.
- cd_nelmts == 2: also honor encoderMode from cd_values[1]; encoderParam defaults to 0.
- cd_nelmts >= 3: all three slots honored; any trailing slots beyond cd_values[2] are reserved and ignored.
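The cd_nelmts rules amount to reading each slot only when it is present. The sketch below assumes the parameter types of HDF5's filter callback (size_t cd_nelmts, const unsigned int cd_values[]); lz4_filter_config and parse_cd_values are hypothetical names, not existing plugin code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded filter configuration; 0 means "default" in every field. */
typedef struct {
    unsigned int block_size;    /* 0 = plugin default block size       */
    unsigned int encoder_mode;  /* 0 = fast LZ4, 1 = LZ4HC             */
    unsigned int encoder_param; /* acceleration or HC level; 0 = default */
} lz4_filter_config;

/* Apply the cd_nelmts rules: a slot is honored only if it is present,
 * and anything beyond cd_values[2] is reserved and never read. */
static lz4_filter_config parse_cd_values(size_t cd_nelmts,
                                         const unsigned int cd_values[])
{
    lz4_filter_config cfg = { 0, 0, 0 }; /* cd_nelmts == 0: all defaults */

    assert(cd_nelmts == 0 || cd_values != NULL);
    if (cd_nelmts >= 1) cfg.block_size    = cd_values[0];
    if (cd_nelmts >= 2) cfg.encoder_mode  = cd_values[1];
    if (cd_nelmts >= 3) cfg.encoder_param = cd_values[2];
    return cfg;
}
```

A legacy file written with cd_nelmts == 1 therefore resolves to fast LZ4 with default acceleration, with only its blockSize taken from the stored cd_values[0].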
Compatibility properties
- Files written by the old plugin (typically cd_nelmts <= 1) are read by
the new plugin identically: slots 1..2 are absent, so defaults apply.
- Files written by the new plugin in fast mode with default acceleration
(encoderMode == 0, encoderParam == 0) contain chunk payloads that are
byte-identical to those written by the old plugin at the same blockSize.
- Files written by the new plugin in HC mode (encoderMode == 1) contain
standard LZ4 Block payloads that the old plugin's decoder
(LZ4_decompress_safe) reads correctly without modification.
cc: @michaelrissi @captainkirk99 @nhz2