
Blosc/Blosc2 segfault with variable-width strings #364

@crusaderky

Description


Blosc and Blosc2 crash when faced with variable-width strings, both legacy object strings and the new NpyStrings, a.k.a. StringDType.

This is caused by an upstream bug; PyTables is also affected.
#363 introduces unit tests for string dtypes, which have been temporarily skipped for blosc and blosc2.
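
For reference, the two variable-width string flavours involved can be constructed as follows (a minimal sketch; the "T" / StringDType case requires NumPy >= 2.0):

import numpy as np

# Legacy variable-width strings: arbitrary Python objects (here bytes) in an object array
legacy = np.asarray([b"foo", b"a much longer string"], dtype="O")

# Native variable-width strings: NpyStrings a.k.a. StringDType, dtype kind "T" (NumPy >= 2.0)
native = np.asarray(["foo", "a much longer string"], dtype="T")

print(legacy.dtype)  # object
print(native.dtype)  # StringDType()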

Reproducer

compression          | i8 | S3 | object   | T
---------------------|----|----|----------|---------
"gzip"               | ✔️ | ✔️ | ✔️       | ✔️
"lzf"                | ✔️ | ✔️ | ✔️       | ✔️
hdf5plugin.BZip2()   | ✔️ | ✔️ | ✔️       | ✔️
hdf5plugin.LZ4()     | ✔️ | ✔️ | ✔️       | ✔️
hdf5plugin.Blosc()   | ✔️ | ✔️ | segfault | segfault
hdf5plugin.Blosc2()  | ✔️ | ✔️ | segfault | segfault

Full reproducer:

import os

import h5py
import hdf5plugin
import numpy as np

fname = "/tmp/ds.h5"

for compression in (
    None,
    "gzip",
    "lzf",
    hdf5plugin.BZip2(),
    hdf5plugin.LZ4(),
    hdf5plugin.Blosc(),
    hdf5plugin.Blosc2(),
):
    for data in (
        np.asarray([1]),
        np.asarray(["foo"], dtype="S"),
        np.asarray([b"foo"], dtype="O"),
        np.asarray(["foo"], dtype="T"),
    ):
        print("desired compression =", compression)
        print("dtype =", data.dtype)

        # Optional: produce meaningful differences in file size
        data = np.tile(data, 1_000_000)

        with h5py.File(fname, "w") as f:
            f.create_dataset("mydataset", data=data, compression=compression)

        print("file size =", os.path.getsize(fname))
        with h5py.File(fname, "r+") as f:
            ds = f["mydataset"]
            print("actual compression =", ds.compression)
            print("compression_opts =", ds.compression_opts)

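            # Read back through StringDType only for the NpyStrings case; by default
            # h5py 3.x returns variable-length strings as object arrays of bytes.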
            actual = (ds.astype("T") if data.dtype.kind == "T" else ds)[:]
        np.testing.assert_array_equal(actual, data)

        print("=" * 80, flush=True)
