|
92 | 92 | # |
93 | 93 | # Streaming Method 2: ROS3 |
94 | 94 | # ------------------------ |
95 | | -# ROS3 is one of the supported methods for reading data from a remote store. ROS3 stands for "read only S3" and is a |
96 | | -# driver created by the HDF5 Group that allows HDF5 to read HDF5 files stored remotely in s3 buckets. Using this method |
97 | | -# requires that your HDF5 library is installed with the ROS3 driver enabled. This is not the default configuration, |
98 | | -# so you will need to make sure you install the right version of ``h5py`` that has this advanced configuration enabled. |
99 | | -# You can install HDF5 with the ROS3 driver from `conda-forge <https://conda-forge.org/>`_ using ``conda``. You may |
100 | | -# first need to uninstall a currently installed version of ``h5py``. |
101 | | -# |
102 | | -# .. code-block:: bash |
103 | | -# |
104 | | -# pip uninstall h5py |
105 | | -# conda install -c conda-forge "h5py>=3.2" |
106 | | -# |
107 | | -# Now instantiate a :py:class:`~pynwb.NWBHDF5IO` object with the S3 URL and specify the driver as "ros3". This |
108 | | -# will download metadata about the file from the S3 bucket to memory. The values of datasets are accessed lazily, |
109 | | -# just like when reading an NWB file stored locally. So, slicing into a dataset will require additional time to |
110 | | -# download the sliced data (and only the sliced data) to memory. |
| 95 | +# ROS3 stands for "read only S3" and is a driver created by the HDF5 Group that allows HDF5 to read HDF5 files stored |
| 96 | +# remotely in S3 buckets. Using this method requires that your HDF5 library is installed with the ROS3 driver enabled. |
| 97 | +# With ROS3 support enabled in ``h5py``, we can instantiate a :py:class:`~pynwb.NWBHDF5IO` object with the S3 URL and |
| 98 | +# specify the driver as "ros3". |
111 | 99 |
|
112 | 100 | from pynwb import NWBHDF5IO |
113 | 101 |
|
|
116 | 104 | print(nwbfile) |
117 | 105 | print(nwbfile.acquisition['lick_times'].time_series['lick_left_times'].data[:]) |
118 | 106 |
|
| 107 | +################################## |
| 108 | +# This will download metadata about the file from the S3 bucket to memory. The values of datasets are accessed lazily, |
| 109 | +# just like when reading an NWB file stored locally. So, slicing into a dataset will require additional time to |
| 110 | +# download the sliced data (and only the sliced data) to memory. |
| 111 | +# |
| 112 | +# .. note:: |
| 113 | +# |
| 114 | +# Pre-built h5py packages on PyPI do not include this S3 support. If you want this feature, you could use packages |
| 115 | +# from conda-forge, or build h5py from source against an HDF5 build with S3 support. You can install HDF5 with |
| 116 | +# the ROS3 driver from `conda-forge <https://conda-forge.org/>`_ using ``conda``. You may |
| 117 | +# first need to uninstall a currently installed version of ``h5py``. |
| 118 | +# |
| 119 | +# .. code-block:: bash |
| 120 | +# |
| 121 | +# pip uninstall h5py |
| 122 | +# conda install -c conda-forge "h5py>=3.2" |
| 123 | + |
| 124 | + |
119 | 125 | ################################################## |
120 | 126 | # Which streaming method to choose? |
121 | 127 | # --------------------------------- |
122 | 128 | # |
123 | | -# fsspec has many advantages over ros3: |
124 | | -# |
125 | | -# 1. fsspec is easier to install |
126 | | -# 2. fsspec supports caching, which will dramatically speed up repeated requests for the |
127 | | -# same region of data |
128 | | -# 3. fsspec automatically retries when s3 fails to return. |
129 | | -# 4. fsspec works with other storage backends and |
130 | | -# 5. fsspec works with other types of files. |
131 | | -# 6. In our hands, fsspec is faster out-of-the-box. |
| 129 | +# From a user perspective, once opened, the :py:class:`~pynwb.file.NWBFile` works the same with |
| 130 | +# both fsspec and ros3. However, in general, we currently recommend using fsspec for streaming |
| 131 | +# NWB files because it is more performant and reliable than ros3. In particular fsspec: |
132 | 132 | # |
133 | | -# For these reasons, we would recommend use fsspec for most Python users. |
| 133 | +# 1. supports caching, which will dramatically speed up repeated requests for the |
| 134 | +# same region of data, |
| 135 | +# 2. automatically retries when S3 fails to return, which helps avoid failures caused by |
| 136 | +# intermittent connection errors with S3, |
| 137 | +# 3. also works with other storage backends (e.g., Google Drive or Dropbox, not just S3) and file formats, and |
| 138 | +# 4. in our experience appears to provide faster out-of-the-box performance than the ros3 driver. |
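
The note added in the ROS3 section says that pre-built h5py packages on PyPI do not include S3 support. A minimal sketch of how a reader might check their own installation before passing ``driver="ros3"`` follows. It assumes that ``h5py.registered_drivers()`` lists ``"ros3"`` only when h5py was built against an HDF5 library with the ROS3 driver enabled. The helper name ``ros3_available`` is hypothetical and is not part of PyNWB or h5py.

```python
def ros3_available():
    """Return True if the installed h5py reports a 'ros3' driver.

    Hypothetical helper for illustration only; not part of PyNWB or h5py.
    """
    try:
        import h5py
    except ImportError:
        # h5py is not installed at all, so no driver is available.
        return False
    # Assumption: registered_drivers() includes "ros3" only for ROS3-enabled builds.
    list_drivers = getattr(h5py, "registered_drivers", None)
    if list_drivers is None:
        return False
    return "ros3" in list_drivers()


if __name__ == "__main__":
    if ros3_available():
        print("ros3 driver found; driver='ros3' should be usable.")
    else:
        print("ros3 driver not found; consider the conda-forge h5py build or fsspec.")
```

If the check returns ``False``, the fsspec method recommended above remains an option, since it does not depend on how HDF5 was built.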