
Feature Request: Parametrization with compression_opts #365

@kmuehlbauer

I'm trying to make hdf5plugin usable within h5netcdf.

It already works nicely with the advertised approach, using either `compression=hdf5plugin.LZ4()` or `**hdf5plugin.Blosc()`.
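For reference, a minimal sketch of that advertised approach, matching the cd_values used below (clevel=4, byte-shuffle, LZ4 inside Blosc):

import numpy
import h5py
import hdf5plugin

with h5py.File('test.h5', 'w') as f:
    f.create_dataset(
        'data',
        data=numpy.arange(100),
        chunks=(50,),
        # Blosc is a mapping, so ** unpacks into compression/compression_opts
        **hdf5plugin.Blosc(cname='lz4', clevel=4, shuffle=hdf5plugin.Blosc.SHUFFLE))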

Now, for conciseness, I want to be able to use it directly like this:

compression = 32001  # blosc
compression_opts = (0, 0, 0, 0, 4, 1, 1)  # should work, since this is what dict-unpacking would provide

This runs without flaws:

import numpy
import h5py
import hdf5plugin

# Compression
with h5py.File('test.h5', 'w') as f:
    f.create_dataset(
        'data',
        data=numpy.arange(100),
        chunks=(50,),
        compression=32001,
        compression_opts=(0, 0, 0, 0, 4, 1, 1))
# Decompression
with h5py.File('test.h5', 'r') as f:
    data = f['data']
    print(data[()])
    print(data._filters)  # filters as recorded by h5py
    print(data.id.get_create_plist().get_nfilters())  # number of filters
    print(data.id.get_create_plist().get_filter(0))  # (filter_id, flags, cd_values, name)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'32001': (2, 2, 8, 400, 4, 1, 1)}
1
(32001, 1, (2, 2, 8, 400, 4, 1, 1), b'blosc')
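Note that the leading zeros I passed come back as (2, 2, 8, 400). As far as I understand, the Blosc filter fills in the first four cd_values itself (filter revision, Blosc format version, typesize and chunk size in bytes), so only clevel, shuffle and the compressor code are really user-facing.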

h5dump shows the filter, with actual compression:

HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 50 )
         SIZE 184 (4.348:1 COMPRESSION)
      }
      FILTERS {
         USER_DEFINED_FILTER {
            FILTER_ID 32001
            COMMENT blosc
            PARAMS { 2 2 8 400 4 1 1 }
         }
      }
   }
}
}

But the following (silently) does nothing, although the filter is reported everywhere (I think this is normal h5py/HDF5 behaviour when a filter is not applicable for some reason):

# change this line in the example above to an erroneous value
compression_opts=(0, 0, 0, 0, 10, 1, 1))

Also note the `clevel`-related output:

`clevel` parameter must be between 0 and 9!
`clevel` parameter must be between 0 and 9!
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'32001': (2, 2, 8, 400, 10, 1, 1)}
1
(32001, 1, (2, 2, 8, 400, 10, 1, 1), b'blosc')

In the h5dump we can see that the filter was not applied (no compression), although it is recorded. Bug or feature?

HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 50 )
         SIZE 800 (1.000:1 COMPRESSION)
      }
      FILTERS {
         USER_DEFINED_FILTER {
            FILTER_ID 32001
            COMMENT blosc
            PARAMS { 2 2 8 400 10 1 1 }
         }
      }
   }
}
}
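One way to detect this programmatically: the filter metadata alone does not tell whether the filter actually ran, but the storage size does. A minimal sketch, using only public h5py calls:

import h5py

with h5py.File('test.h5', 'r') as f:
    dset = f['data']
    stored = dset.id.get_storage_size()  # bytes actually written to disk
    raw = dset.nbytes                    # uncompressed size of the data
    # stored < raw means the filter really compressed the chunks
    print(stored, raw, stored < raw)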

The dataset is reported to have the Blosc filter applied via h5py and also via netCDF4 (which also reports the out-of-range clevel of 10).

import netCDF4 as nc

with nc.Dataset("test.h5") as ds:
    print(ds["data"][:])
    print(ds["data"].filters())
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_lz4', 'shuffle': 1}, 'shuffle': False, 'complevel': 10, 'fletcher32': False}

We get this kind of warning (see above) for a wrong clevel or shuffle, but we do not get any warning if the compressor code is out of range. The filter is just silently not applied.

Any thoughts on that? How can we make sure not to use any problematic settings?
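One option would be client-side validation before create_dataset. A hypothetical guard (the function name and the value ranges for the last three cd_values are my reading of the Blosc filter, not an hdf5plugin API):

def check_blosc_opts(opts):
    # Blosc expects 7 cd_values; only the last three are user-facing
    if len(opts) != 7:
        raise ValueError("Blosc expects 7 cd_values")
    clevel, shuffle, compressor = opts[4], opts[5], opts[6]
    if not 0 <= clevel <= 9:
        raise ValueError("`clevel` must be between 0 and 9")
    if shuffle not in (0, 1, 2):
        raise ValueError("`shuffle` must be 0 (none), 1 (byte) or 2 (bit)")
    if not 0 <= compressor <= 5:
        raise ValueError("compressor code must be in 0..5 (blosclz..zstd)")
    return opts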

Would it be possible to do something like this:

import hdf5plugin

blosc = hdf5plugin.from_id(32001, opts=(0, 0, 0, 0, 4, 1, 1))
f.create_dataset(
    'data',
    data=numpy.arange(100),
    chunks=(50,),
    **blosc)
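A rough sketch of how such a helper could work (`from_id` is the proposed name, not something hdf5plugin provides today):

def from_id(filter_id, opts=None):
    # Hypothetical helper: wrap a raw filter ID and its cd_values into
    # the mapping that create_dataset expects via **-unpacking.
    # A real implementation would look the ID up in a registry of known
    # filters and validate opts against that filter's spec.
    return {'compression': filter_id,
            'compression_opts': tuple(opts) if opts is not None else None}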
