Add LZ4HC and fast-encoder acceleration parameter to LZ4 plugin 32004 #244

@ajelenak

This is a design brief for adding the LZ4 high-compression encoder
(LZ4_compress_HC) and the fast-encoder acceleration parameter
(LZ4_compress_fast) to the HDF5 LZ4 filter plugin ID 32004, with full backward
compatibility for existing archives and existing reader deployments.

Background

The LZ4 plugin ID 32004 currently exposes only the default fast LZ4 encoder via
LZ4_compress_default. It does not provide access to the LZ4
high-compression encoder (LZ4_compress_HC) or to the acceleration
parameter of LZ4_compress_fast.

LZ4HC has several desirable properties:

  • It produces the same LZ4 Block format on the wire as fast LZ4, so
    existing decoders and existing files are unaffected.
  • It typically improves the compression ratio by 15–30% over fast LZ4.
  • Decompression speed is identical to fast LZ4 (same bitstream, same
    decoder).

The extension proposed here also exposes the acceleration parameter of
LZ4_compress_fast, so users who care about write throughput can
trade some ratio for additional encode speed. Both directions of the
speed/ratio frontier — HC for ratio, accelerated fast for speed — are
made available through the same cd_values[] mechanism.

The changes proposed here are purely additive: no breaking changes to the
on-disk format, no change to the chunk wrapper, and no new filter ID.

New cd_values[] layout

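cd_values[]:

| Index        | Parameter    | Status    | Description                                      |
|--------------|--------------|-----------|--------------------------------------------------|
| cd_values[0] | blockSize    | Unchanged | 0 = plugin default                               |
| cd_values[1] | encoderMode  | New       | 0 = fast LZ4 (default), 1 = LZ4HC, 2+ = reserved |
| cd_values[2] | encoderParam | New       | Meaning depends on encoderMode (see table below) |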

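encoderParam reference by encoderMode:

| encoderMode  | encoderParam value | Behavior                                                       |
|--------------|--------------------|----------------------------------------------------------------|
| 0 (fast LZ4) | 0                  | LZ4_compress_default behavior                                  |
| 0 (fast LZ4) | >= 1               | Acceleration factor for LZ4_compress_fast                      |
| 1 (LZ4HC)    | 0                  | Default level (LZ4HC_CLEVEL_DEFAULT == 9)                      |
| 1 (LZ4HC)    | 1..12              | Explicit level (clamped to [LZ4HC_CLEVEL_MIN, LZ4HC_CLEVEL_MAX]) |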

In fast mode, the valid range of the acceleration factor is
[1, LZ4_ACCELERATION_MAX].
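The mapping from (encoderMode, encoderParam) to an effective encoder setting could be sketched as below. This is an illustration only, not the plugin's code: resolve_encoder, encoder_choice, and the SKETCH_* constants are hypothetical names, the numeric stand-ins for LZ4HC_CLEVEL_MIN and LZ4_ACCELERATION_MAX are assumptions that a real build would replace with the library's own definitions, and rejecting reserved modes is one possible policy that the brief does not prescribe.

```c
/* Stand-in constants so the sketch builds without the LZ4 headers.
 * Assumption: a real plugin would use LZ4HC_CLEVEL_MIN, LZ4HC_CLEVEL_DEFAULT
 * and LZ4HC_CLEVEL_MAX from lz4hc.h, and the library's own acceleration cap. */
#define SKETCH_HC_LEVEL_MIN     3u      /* assumed; check lz4hc.h */
#define SKETCH_HC_LEVEL_DEFAULT 9u      /* value stated in the brief */
#define SKETCH_HC_LEVEL_MAX     12u     /* upper bound stated in the brief */
#define SKETCH_ACCEL_MAX        65537u  /* assumed stand-in for LZ4_ACCELERATION_MAX */

typedef struct {
    int ok;      /* 0 when encoderMode is a reserved value */
    int use_hc;  /* 1 = LZ4HC encoder, 0 = fast encoder */
    int value;   /* HC level, or fast-mode acceleration factor */
} encoder_choice;

static unsigned clamp_u(unsigned v, unsigned lo, unsigned hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: turn the two new slots into an encoder setting. */
static encoder_choice resolve_encoder(unsigned encoder_mode, unsigned encoder_param)
{
    encoder_choice c = {1, 0, 0};

    switch (encoder_mode) {
    case 0: /* fast LZ4: 0 means default acceleration, which is 1 */
        c.value = (int)clamp_u(encoder_param, 1u, SKETCH_ACCEL_MAX);
        break;
    case 1: /* LZ4HC: 0 means default level, otherwise clamp the level */
        c.use_hc = 1;
        c.value = (encoder_param == 0u)
                      ? (int)SKETCH_HC_LEVEL_DEFAULT
                      : (int)clamp_u(encoder_param, SKETCH_HC_LEVEL_MIN,
                                     SKETCH_HC_LEVEL_MAX);
        break;
    default: /* 2+ is reserved; this sketch chooses to reject it */
        c.ok = 0;
        break;
    }
    return c;
}
```

With this shape, the compress path would call LZ4_compress_HC with c.value as the level when c.use_hc is set, and LZ4_compress_fast with c.value as the acceleration otherwise.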

Backward compatibility rules

The plugin interprets cd_values[] based on cd_nelmts:

  • cd_nelmts == 0: blockSize default, fast LZ4, default acceleration.
  • cd_nelmts == 1: blockSize from cd_values[0], otherwise as above.
  • cd_nelmts == 2: encoderMode from cd_values[1] is also honored;
    encoderParam defaults to 0.
  • cd_nelmts >= 3: all three slots are honored; any slots beyond
    cd_values[2] are reserved and ignored.
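The rules above amount to reading each slot only if it is present and falling back to 0 otherwise. A minimal sketch follows, assuming a hypothetical lz4_filter_params struct and parse_cd_values helper; neither name comes from the plugin source.

```c
#include <stddef.h>

/* Hypothetical container for the three slots; 0 always means "default". */
typedef struct {
    unsigned block_size;    /* cd_values[0], 0 = plugin default */
    unsigned encoder_mode;  /* cd_values[1], 0 = fast LZ4 */
    unsigned encoder_param; /* cd_values[2], 0 = encoder default */
} lz4_filter_params;

/* Read only the slots that are present. Absent slots keep their defaults,
 * and slots beyond index 2 are never touched. */
static lz4_filter_params parse_cd_values(size_t cd_nelmts,
                                         const unsigned cd_values[])
{
    lz4_filter_params p = {0u, 0u, 0u};

    if (cd_nelmts >= 1) p.block_size    = cd_values[0];
    if (cd_nelmts >= 2) p.encoder_mode  = cd_values[1];
    if (cd_nelmts >= 3) p.encoder_param = cd_values[2];
    return p;
}
```

Because every default is 0, a filter entry with cd_nelmts <= 1 parses to exactly the settings the current plugin uses.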

Compatibility properties

  • Files written by the old plugin (typically cd_nelmts <= 1) are read
    by the new plugin identically: slots 1..2 are absent, defaults apply.
  • Chunks written by the new plugin in fast mode with default
    acceleration (encoderMode == 0, encoderParam 0 or 1) are
    byte-identical to those written by the old plugin at the same
    blockSize. A larger acceleration factor changes the encoder's output,
    but the result is still a standard LZ4 Block payload.
  • Files written by the new plugin in HC mode (encoderMode == 1)
    contain standard LZ4 Block payloads that the old plugin's decoder
    (LZ4_decompress_safe) reads correctly without modification.
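On the writer side, one way to stay close to what existing files look like is to pass the shortest cd_values[] that still expresses the request, so that default settings yield the same cd_nelmts as before. The helper below is a hypothetical sketch of that idea (build_cd_values is not part of the plugin or of HDF5), assuming the three-slot layout from this brief.

```c
#include <stddef.h>

/* Hypothetical helper: fill out[0..2] and return the smallest cd_nelmts
 * that still conveys the request. Trailing slots equal to 0 (the default)
 * are dropped, so a default request yields cd_nelmts == 0. */
static size_t build_cd_values(unsigned block_size, unsigned encoder_mode,
                              unsigned encoder_param, unsigned out[3])
{
    out[0] = block_size;
    out[1] = encoder_mode;
    out[2] = encoder_param;

    if (encoder_param != 0u) return 3;
    if (encoder_mode  != 0u) return 2;
    if (block_size    != 0u) return 1;
    return 0;
}
```

A caller might then pass the result to the dataset creation property list, for example H5Pset_filter(dcpl_id, 32004, H5Z_FLAG_MANDATORY, n, cd), where dcpl_id, n, and cd are the caller's own variables.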

cc: @michaelrissi @captainkirk99 @nhz2
