
Not closing external HDF5 client on LindiH5pyDataset causes segmentation fault in unusual typing scenario #113

@rly


LindiH5pyDataset maintains a dictionary that maps URLs to the LindiRemfile objects used to open external array datasets. These files are never closed. For reasons that are not yet clear, a segmentation fault occurs when Python cleans up at exit if a simple script defines a function whose return type is a Tuple of lindi.LindiH5pyFile, lindi.LindiH5pyGroup, or lindi.LindiH5pyDataset. This does not happen if the return type is simply lindi.LindiH5pyFile. It also does not happen if no data is sliced from the lindi.LindiH5pyDataset, because in that case no LindiRemfile object is opened.

My guess:

  • Wrapping lindi.LindiH5pyFile in Tuple defers resolution of lindi.LindiH5pyFile until later.
  • lindi.LindiH5pyFile imports LindiH5pyDataset.
  • Importing LindiH5pyDataset initializes the module-level variable _external_hdf5_clients.
  • During cleanup at exit, the LindiRemfile stored in LindiH5pyDataset is closed and deleted through one imported LindiH5pyDataset but not the other? I'm not sure.
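The first guess can be checked without lindi. On Python versions before 3.14, and without from __future__ import annotations, a return annotation such as Tuple[lindi.LindiH5pyFile] is evaluated as soon as the def statement runs, and the resulting typing object keeps a direct reference to the class. The sketch below uses a hypothetical FakeFile class in place of lindi.LindiH5pyFile; it does not by itself explain the crash.

```python
from typing import Tuple, get_args


class FakeFile:
    """Hypothetical stand-in for lindi.LindiH5pyFile."""


def do_nothing() -> Tuple[FakeFile]:
    pass


# The stored return annotation is a typing alias object, not a string...
ret = do_nothing.__annotations__["return"]
# ...and that alias holds a reference to the class itself.
args = get_args(ret)
```

Here ret is equal to Tuple[FakeFile] and args is (FakeFile,), so the function object keeps the class reachable for as long as the function exists.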

MWE:

from typing import Tuple
import lindi

def do_nothing() -> Tuple[lindi.LindiH5pyFile]:
    pass

rfs = "https://dandi-api-staging-dandisets.s3.amazonaws.com/blobs/7f0/aa4/7f0aa474-4169-42f8-a895-ada0af4072c7"
client = lindi.LindiH5pyFile.from_lindi_file(url_or_path=rfs)
print(client["acquisition"]["ElectricalSeries"]["data"][0,0])

# explicitly closing the external HDF5 client prevents the segmentation fault during cleanup at exit
# ext_array_link is the URL of the external file that holds the array for client["acquisition"]["ElectricalSeries"]["data"]
ext_array_link = "https://api.dandiarchive.org/api/assets/df0e074e-3509-4b03-908e-2a1303072707/download/"
client["acquisition"]["ElectricalSeries"]["data"]._get_external_hdf5_client(ext_array_link).close()

Related to NeurodataWithoutBorders/nwb_benchmarks#136
