Description
I'm trying to make hdf5plugin usable within h5netcdf.
It already works nicely with the advertised approach, i.e. either `compression=hdf5plugin.LZ4()` or `**hdf5plugin.Blosc()`.
Now, for conciseness, I want to be able to use it directly like this:
```python
compression = 32001  # blosc
compression_opts = (0, 0, 0, 0, 4, 1, 1)  # should work, since this is what dict-unpacking would provide
```

This does return without flaws:
```python
import numpy
import h5py
import hdf5plugin

# Compression
with h5py.File('test.h5', 'w') as f:
    f.create_dataset(
        'data',
        data=numpy.arange(100),
        chunks=(50,),
        compression=32001,
        compression_opts=(0, 0, 0, 0, 4, 1, 1))

# Decompression
with h5py.File('test.h5', 'r') as f:
    data = f['data']
    print(data[()])
    print(data._filters)
    print(data.id.get_create_plist().get_nfilters())
    print(data.id.get_create_plist().get_filter(0))
```
```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'32001': (2, 2, 8, 400, 4, 1, 1)}
1
(32001, 1, (2, 2, 8, 400, 4, 1, 1), b'blosc')
```
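As an aside on reading that last tuple: per the c-blosc HDF5 filter reference, the `cd_values` slots are two reserved entries (filter revision and Blosc version), then the typesize, the uncompressed chunk size in bytes, the clevel, the shuffle flag, and the compressor code (1 = lz4). A tiny hypothetical helper to label them:

```python
def describe_blosc_opts(opts):
    """Label Blosc filter cd_values (hypothetical helper, not in hdf5plugin)."""
    names = ('reserved0', 'reserved1', 'typesize', 'nbytes',
             'clevel', 'shuffle', 'compressor')
    return dict(zip(names, opts))

# the params h5py reported back for our dataset:
print(describe_blosc_opts((2, 2, 8, 400, 4, 1, 1)))
# → typesize 8, chunk of 400 bytes, clevel 4, shuffle on, compressor 1 (lz4)
```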
h5dump shows the filter with compression:
```
HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 50 )
         SIZE 184 (4.348:1 COMPRESSION)
      }
      FILTERS {
         USER_DEFINED_FILTER {
            FILTER_ID 32001
            COMMENT blosc
            PARAMS { 2 2 8 400 4 1 1 }
         }
      }
   }
}
}
```
But this (silently) does nothing (I think this is normal h5py/HDF5 behaviour if a filter is not applicable for some reason), although the filter is reported everywhere:

```python
# change this to some erroneous value
compression_opts=(0, 0, 0, 0, 10, 1, 1))
```
Also note the clevel-related output:

```
`clevel` parameter must be between 0 and 9!
`clevel` parameter must be between 0 and 9!
```
```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'32001': (2, 2, 8, 400, 10, 1, 1)}
1
(32001, 1, (2, 2, 8, 400, 10, 1, 1), b'blosc')
```
In the h5dump, we can see that the filter was not applied (no compression), although it is listed. Bug or feature?
```
HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 50 )
         SIZE 800 (1.000:1 COMPRESSION)
      }
      FILTERS {
         USER_DEFINED_FILTER {
            FILTER_ID 32001
            COMMENT blosc
            PARAMS { 2 2 8 400 10 1 1 }
         }
      }
   }
}
}
```
The dataset is reported to have the Blosc filter applied via h5py and also via netCDF4 (which also reports the wrong clevel of 10).
```python
import netCDF4 as nc

with nc.Dataset("test.h5") as ds:
    print(ds["data"][:])
    print(ds["data"].filters())
```
```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_lz4', 'shuffle': 1}, 'shuffle': False, 'complevel': 10, 'fletcher32': False}
```
We get these kinds of warnings (see above) for a wrong clevel and shuffle, but we do not get any warning if the compressor is out of range; the filter is just silently not applied.
Any thoughts on that? How can we make sure not to use any problematic settings?
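Until there is something built in, a client-side sanity check is one option. This is a hypothetical helper, not part of hdf5plugin; the accepted ranges are taken from the Blosc documentation:

```python
def validate_blosc_opts(opts):
    """Reject cd_values that the blosc filter would silently refuse."""
    if len(opts) != 7:
        raise ValueError('expected 7 cd_values, got %d' % len(opts))
    clevel, shuffle, compressor = opts[4], opts[5], opts[6]
    if not 0 <= clevel <= 9:
        raise ValueError('clevel must be between 0 and 9, got %d' % clevel)
    if shuffle not in (0, 1, 2):  # no shuffle, byte shuffle, bit shuffle
        raise ValueError('shuffle must be 0, 1 or 2, got %d' % shuffle)
    if not 0 <= compressor <= 5:  # blosclz, lz4, lz4hc, snappy, zlib, zstd
        raise ValueError('unknown compressor id %d' % compressor)
    return opts
```

With a check like this, the failing example above would raise before any data is written, instead of producing an uncompressed dataset that still lists the filter.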
Would it be possible to do something like this:
```python
import hdf5plugin

filter = hdf5plugin.from_id(32001, opts=(0, 0, 0, 0, 4, 1, 1))
f.create_dataset(
    'data',
    data=numpy.arange(100),
    chunks=(50,),
    **filter)
```
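For what it's worth, a minimal sketch of what such a `from_id` could look like (hypothetical; this function does not exist in hdf5plugin today, and a real version would presumably also validate the options per filter):

```python
def from_id(filter_id, opts=()):
    """Hypothetical: build the kwargs create_dataset expects from raw values."""
    return {'compression': filter_id, 'compression_opts': tuple(opts)}

kwargs = from_id(32001, opts=(0, 0, 0, 0, 4, 1, 1))
# kwargs == {'compression': 32001, 'compression_opts': (0, 0, 0, 0, 4, 1, 1)}
```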