While testing #281 I found that test_subset_write.py is much faster when working on remote files accessed via the hdfstream web service. We seem to incur a lot of overhead in h5py's Dataset.read_direct() method. Using local files on Cosma, the test on EagleDistributed.hdf5 takes 70 seconds and spends 90% of that time in Dataset.read_direct(). Using remote files it takes 25 seconds, of which 70% is spent in SSLSocket.read().
Here's a profile of the case where we read local HDF5 files:

read_direct() is called about 5000 times. I was able to reduce the runtime from 70 seconds to 20 seconds by using the h5py low-level API to select all of the slices and then read them with a single H5Dread(). I'm not sure how this will affect more realistic cases, though. It also returns the results in a different order if the slices are not sorted, so one of the SOAP tests fails.
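For reference, here is a minimal sketch of the two approaches, not the actual change I tested. It assumes the slices are non-overlapping (start, stop) ranges along the first axis; the function names `read_slices_one_by_one` and `read_slices_single_call` are made up for illustration.

```python
import h5py
import numpy as np
from h5py import h5s


def read_slices_one_by_one(dataset, slices):
    """Baseline: one Dataset.read_direct() call per (start, stop) slice."""
    nr_rows = sum(stop - start for start, stop in slices)
    out = np.empty((nr_rows,) + dataset.shape[1:], dtype=dataset.dtype)
    offset = 0
    for start, stop in slices:
        count = stop - start
        if count <= 0:
            continue
        # Results are stored in the order the slices were given
        dataset.read_direct(out, np.s_[start:stop], np.s_[offset:offset + count])
        offset += count
    return out


def read_slices_single_call(dataset, slices):
    """Sketch: OR all slices into one file selection, then read once.

    Assumes the slices do not overlap. Overlapping slices would be merged
    by the union, so the element count would no longer match the output.
    """
    trailing = dataset.shape[1:]
    file_space = dataset.id.get_space()
    file_space.select_none()
    nr_rows = 0
    for start, stop in slices:
        count = stop - start
        if count <= 0:
            continue
        # Add this range of rows to the selection in the file
        file_space.select_hyperslab(
            (start,) + (0,) * len(trailing),
            (count,) + trailing,
            op=h5s.SELECT_OR,
        )
        nr_rows += count
    out = np.empty((nr_rows,) + trailing, dtype=dataset.dtype)
    if nr_rows == 0:
        return out
    # Memory dataspace covering the whole output array
    mem_space = h5s.create_simple(out.shape)
    # Single low-level read (this wraps H5Dread)
    dataset.id.read(mem_space, file_space, out)
    return out
```

With sorted slices both functions should return the same array. With unsorted slices the single-call version returns the elements in file order rather than in the order the slices were given, which is the ordering difference mentioned above.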
The tests in test_mask.py show similar behaviour: running on remote files takes less than half as long, although the individual tests only take ~5 seconds or less in that case.