
Commit 14dea0d

Merge branch 'main' into add_max_channel_for_phy
2 parents: 379449f + e930ece

File tree

10 files changed: 150 additions, 65 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions

@@ -11,6 +11,7 @@
 * Add Plexon2 support [PR #918](https://github.com/catalystneuro/neuroconv/pull/918)
 * Converter working with multiple VideoInterface instances [PR #914](https://github.com/catalystneuro/neuroconv/pull/914)
 * Added helper function `neuroconv.tools.data_transfers.submit_aws_batch_job` for basic automated submission of AWS batch jobs. [PR #384](https://github.com/catalystneuro/neuroconv/pull/384)
+* Data interfaces `run_conversion` method now performs metadata validation before running the conversion. [PR #949](https://github.com/catalystneuro/neuroconv/pull/949)
 * Introduced `null_values_for_properties` to `add_units_table` to give user control over null values behavior [PR #989](https://github.com/catalystneuro/neuroconv/pull/989)

@@ -26,6 +27,7 @@
 * The `DeeplabcutInterface` now skips inferring timestamps from movie when timestamps are specified, running faster. [PR #967](https://github.com/catalystneuro/neuroconv/pull/967)
 * Improve metadata writing for SpikeGLX data interface. Added contact ids, shank ids and, remove references to shanks for neuropixels 1.0. Also deprecated the previous neuroconv exclusive property "electrode_shank_number` [PR #986](https://github.com/catalystneuro/neuroconv/pull/986)
 * Add tqdm with warning to DeepLabCut interface [PR #1006](https://github.com/catalystneuro/neuroconv/pull/1006)
+* `BaseRecordingInterface` now calls default metadata when metadata is not passing mimicking `run_conversion` behavior. [PR #1012](https://github.com/catalystneuro/neuroconv/pull/1012)

 ## v0.5.0 (July 17, 2024)

docs/user_guide/adding_trials.rst

Lines changed: 10 additions & 7 deletions

@@ -3,11 +3,9 @@
 Adding Trials to NWB Files
 ==========================

-NWB allows you to store information about time intervals in a structured way. These structure are often used to store
-information about trials, epochs, or other time intervals in the data.
-You can add time intervals to an NWBFile object before writing it using PyNWB.
-Here is an example of how to add trials to an NWBFile object.
-Here is how you would add trials to an NWB file:
+NWB allows you to store information about timing information in a structured way.
+These structures are often used to store information about trials, epochs, or other time intervals in the data.
+Here is how to add trials to an NWBFile object:

 .. code-block:: python

@@ -21,10 +19,15 @@ You can also add epochs or other types of time intervals to an NWB File. See
 `PyNWB Annotating Time Intervals <https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_timeintervals.html>`_
 for more information.

-Once this information is added, you can write the NWB file to disk.
+Once this information is added, you can write the NWB file to disk:

 .. code-block:: python

     from neuroconv.tools.nwb_helpers import configure_and_write_nwbfile

-    configure_and_write_nwbfile(nwbfile, save_path="path/to/destination.nwb", backend="hdf5")
+    configure_and_write_nwbfile(
+        nwbfile, save_path="path/to/destination.nwb", backend="hdf5"
+    )
+
+This will write the NWB file to disk with the added trials information, and optimize the storage settings of large
+datasets for cloud compute.
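The trials structure this doc page describes can be sketched without PyNWB. The stand-in below is hypothetical (the real documentation uses PyNWB's trials API on an `NWBFile` object); it only illustrates the idea of a table of aligned interval rows with required start/stop times plus custom columns:

```python
# Hypothetical stand-in for an NWB trials table; not the PyNWB API.
trials = []

def add_trial(start_time, stop_time, **columns):
    """Append one trial row, enforcing the basic interval invariant."""
    if stop_time <= start_time:
        raise ValueError("stop_time must be after start_time")
    trials.append({"start_time": start_time, "stop_time": stop_time, **columns})

# Two trials with an extra custom column, as a trials table would hold.
add_trial(start_time=0.0, stop_time=1.0, correct=True)
add_trial(start_time=1.5, stop_time=2.5, correct=False)

print(len(trials))           # 2
print(trials[0]["correct"])  # True
```

In PyNWB the same shape is produced by declaring custom columns once and then adding rows, with `start_time` and `stop_time` always required.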

docs/user_guide/datainterfaces.rst

Lines changed: 35 additions & 22 deletions

@@ -28,33 +28,33 @@ For instance, to install the dependencies for SpikeGLX, run:

 2. Construction
 ~~~~~~~~~~~~~~~
-Initialize a class and direct it to the appropriate source data. This will open
-the files and read header information, setting up the system for conversion,
-but generally will not read the underlying data.
+Initialize a class and direct it to the appropriate source data:

 .. code-block:: python

     from neuroconv.datainterfaces import SpikeGLXRecordingInterface

     interface = SpikeGLXRecordingInterface(file_path="path/to/towersTask_g0_t0.imec0.ap.bin")

+This will open the files and read header information, setting up the system for conversion,
+but generally will not read the underlying data.
+
 .. note::

     To get the form of source_data, run :meth:`.BaseDataInterface.get_source_schema`,
     which returns the :ref:`source schema <source_schema>` as a JSON-schema-like dictionary informing
     the user of the required and optional input arguments to the downstream readers.

-
 3. Get and adjust metadata
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 Each ``DataInterface`` can extract relevant metadata from the source files and
-organize it in a ``metadata`` hierarchical dictionary. This dictionary
-can be edited to include data not available in the source files.
+organize it in a ``metadata`` hierarchical dictionary:

 .. code-block:: python

     metadata = interface.get_metadata()

+This dictionary can be edited to include data not available in the source files.
 The DANDI Archive requires subject ID, sex, age, and species, which are rarely present in the source data. Here is how you would add them.

 .. code-block:: python

@@ -75,14 +75,17 @@ The DANDI Archive requires subject ID, sex, age, and species, which are rarely p
 - ``U`` for Unknown
 - ``O`` for Other

-``age`` follows the `ISO 8601 duration format <https://en.wikipedia.org/wiki/ISO_8601#Durations>`_. For example, ``P30D`` is 30 days old, and ``P1Y`` would be 1 year old. To express a range of ages, you can use a slash, for example ``P30D/P35D`` for 30 to 35 days old.
+``age`` follows the `ISO 8601 duration format <https://en.wikipedia.org/wiki/ISO_8601#Durations>`_.
+For example, ``P30D`` is 30 days old, and ``P1Y`` would be 1 year old.
+To express a range of ages, you can use a slash, for example ``P30D/P35D`` for 30 to 35 days old.

 ``species`` is the scientific Latin binomial name of the species. For example, ``Mus musculus``
 for a mouse.

-See :ref:`Subject Best Practices <best_practice_subject_exists>` for details
+See :ref:`Subject Best Practices <best_practice_subject_exists>` for details.

-The ``session_start_time`` is also required. This is sometimes found in the source data. If it is not found, you must add it.
+The ``session_start_time`` is also required. This is sometimes found in the source data.
+If it is not found, you must add it:

 .. code-block:: python

@@ -93,13 +96,15 @@ The ``session_start_time`` is also required. This is sometimes found in the sour

 You can use ``tz.tzlocal()`` to get the local timezone.

-If the ``session_start_time`` is extracted from the source data, it is often missing a timezone. This is not required but is a recommended best practice. Here is how you would add it.
+If the ``session_start_time`` is extracted from the source data, it is often missing a timezone.
+This is not required but is a recommended best practice. Here is how you would add it:

 .. code-block:: python

     metadata["NWBFile"]["session_start_time"] = metadata["NWBFile"]["session_start_time"].replace(tzinfo=ZoneInfo("US/Pacific"))

-NWB Best Practices also recommends several other fields that are rarely present in the extracted metadata. The metadata dictionary is the place to add this information.
+NWB Best Practices also recommends several other fields that are rarely present in the extracted metadata.
+The metadata dictionary is the place to add this information:

 .. code-block:: python

@@ -113,7 +118,9 @@ NWB Best Practices also recommends several other fields that are rarely present
         keywords=["finches", "evolution", "Galapagos"],
     )

-The ``metadata`` dictionary also contains metadata that pertain to the specific data being converted. In this example, the ``Ecephys`` key contains metadata that pertains to the electrophysiology data being converted. This metadata can be edited in the same way.
+The ``metadata`` dictionary also contains metadata that pertain to the specific data being converted.
+In this example, the ``Ecephys`` key contains metadata that pertains to the electrophysiology data being converted.
+This metadata can be edited in the same way:

 .. code-block:: python

@@ -134,19 +141,22 @@ The ``metadata`` dictionary also contains metadata that pertain to the specific
       'description': 'Name of the ElectrodeGroup this electrode is a part of.'},
      {'name': 'contact_shapes', 'description': 'The shape of the electrode'}]}

-Here we can see that ``metadata["Ecephys"]["ElectrodeGroup"][0]["location"]`` is ``unknown``. We can add this information as follows:
+Here we can see that ``metadata["Ecephys"]["ElectrodeGroup"][0]["location"]`` is ``unknown``.
+We can add this information as follows:

 .. code-block:: python

     metadata["Ecephys"]["ElectrodeGroup"]["location"] = "V1"


-Use ``.get_metadata_schema()`` to get the schema of the metadata dictionary. This schema is a JSON-schema-like dictionary that specifies required and optional fields in the metadata dictionary. See :ref:`metadata schema <metadata_schema>` for more information.
+Use ``.get_metadata_schema()`` to get the schema of the metadata dictionary.
+This schema is a JSON-schema-like dictionary that specifies required and optional fields in the metadata dictionary.
+See :ref:`metadata schema <metadata_schema>` for more information.

 4a. Run conversion
 ~~~~~~~~~~~~~~~~~~
 The ``.run_conversion`` method takes the (edited) metadata dictionary and
-the path of an NWB file, and launches the actual data conversion into NWB.
+the path of an NWB file, and launches the actual data conversion into NWB:

 .. code-block:: python

@@ -162,21 +172,24 @@ the file size of the output NWB file and optimizing the file for cloud compute.

 4b. Create an in-memory NWB file
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If you want to create an in-memory NWB file, you can use the ``.create_nwbfile`` method.
+You can also create an in-memory NWB file:

 .. code-block:: python

     nwbfile = spikeglx_interface.create_nwbfile(metadata=metadata)

-This is useful for add data such as trials, epochs, or other time intervals to the NWB file. See
-:ref:`Adding Time Intervals to NWB Files <adding_trials>` for more information.
+This is useful for adding extra data such as trials, epochs, or other time intervals to the NWB file.
+See :ref:`Adding Time Intervals to NWB Files <adding_trials>` for more information.

-This does not load large datasets into memory. Those remain in the source files and are read piece-by-piece during the
-write process. Once you make all the modifications you want to the NWBfile, you can save it to disk. The following code
-automatically optimizes datasets for cloud compute and writes the file to disk.
+This does not load large datasets into memory.
+Those remain in the source files and are read piece-by-piece during the write process.
+Once you make all the modifications you want to the NWBfile, you can save it to disk.
+The following code automatically optimizes datasets for cloud compute and writes the file to disk:

 .. code-block:: python

     from neuroconv.tools.nwb_helpers import configure_and_write_nwbfile

-    configure_and_write_nwbfile(nwbfile, save_path="path/to/destination.nwb", backend="hdf5")
+    configure_and_write_nwbfile(
+        nwbfile, save_path="path/to/destination.nwb", backend="hdf5"
+    )
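The metadata edits this doc page walks through (DANDI-required Subject fields, ISO 8601 age, timezone-aware `session_start_time`) can be exercised with the standard library alone. The plain dictionary below is a stand-in for the `DeepDict` returned by `interface.get_metadata()`, and all values are invented placeholders:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Stand-in for the dictionary returned by interface.get_metadata().
metadata = {"NWBFile": {}, "Subject": {}}

# DANDI-required subject fields, per the docs in this diff.
metadata["Subject"]["subject_id"] = "subject-001"  # placeholder ID
metadata["Subject"]["sex"] = "M"                   # M, F, U, or O
metadata["Subject"]["age"] = "P30D"                # ISO 8601 duration: 30 days old
metadata["Subject"]["species"] = "Mus musculus"    # Latin binomial name

# session_start_time is required; attaching a timezone is recommended best practice.
start_time = datetime(2024, 7, 17, 9, 30, 0)
metadata["NWBFile"]["session_start_time"] = start_time.replace(tzinfo=ZoneInfo("US/Pacific"))

print(metadata["Subject"]["age"])                        # P30D
print(metadata["NWBFile"]["session_start_time"].tzinfo)  # US/Pacific
```

The `.replace(tzinfo=...)` call is the same pattern the docs show for a `session_start_time` extracted without a timezone.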

docs/user_guide/nwbconverter.rst

Lines changed: 10 additions & 9 deletions

@@ -6,8 +6,8 @@ preprocessing systems with different proprietary formats in the same session.
 For instance, in a given extracellular electrophysiology experiment, you might
 have raw and processed data. The :py:class:`.NWBConverter` class streamlines this
 conversion process. This single :py:class:`.NWBConverter` object is responsible for
-combining those multiple read/write operations. An example of how to define
-a :py:class:`.NWBConverter` would be
+combining those multiple read/write operations. Here is an example definition of a
+:py:class:`.NWBConverter`:

 .. code-block:: python

@@ -42,15 +42,14 @@ keys of ``data_interface_classes``.

     example_nwb_converter = ExampleNWBConverter(source_data)

-This creates an :py:class:`.NWBConverter` object that can aggregate and distribute across
-the data interfaces. To fetch metadata across all of the interfaces and merge
-them together, call.
+This creates an :py:class:`.NWBConverter`. To fetch metadata across all of the interfaces and merge
+them together, call:

 .. code-block:: python

     metadata = converter.get_metadata()

-The metadata can then be manually modified with any additional user-input, just like ``DataInterface`` objects.
+The metadata can then be manually modified with any additional user-input, just like ``DataInterface`` objects:

 .. code-block:: python

@@ -59,14 +58,16 @@ The metadata can then be manually modified with any additional user-input, just

     metadata["Subject"]["subject_id"] = "ID of experimental subject"

 The final metadata dictionary should follow the form defined by :meth:`.NWBConverter.get_metadata_schema`.
-Now run the entire conversion with.
+
+Now run the entire conversion with:

 .. code-block:: python

     converter.run_conversion(metadata=metadata, nwbfile_path="my_nwbfile.nwb")

-Like ``DataInterface`` objects, :py:class:`.NWBConverter` objects can output an in-memory NWBFile object by
-calling :meth:`.NWBConverter.create_nwbfile`. This can be useful for debugging or for further processing.
+Like ``DataInterface`` objects, :py:class:`.NWBConverter` objects can output an in-memory :py:class:`.NWBFile` object by
+calling :meth:`.NWBConverter.create_nwbfile`. This can be useful for debugging, for adding metadata to the file, or for
+further processing.

 Though this example was only for two data streams (recording and spike-sorted
 data), it can easily extend to any number of sources, including video of a
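The metadata aggregation that `converter.get_metadata()` performs can be sketched with plain classes. Everything below is hypothetical (real interfaces subclass `BaseDataInterface`, and the real `NWBConverter` merges with `dict_deep_update` rather than the shallow merge used here):

```python
# Hypothetical minimal interfaces; real ones subclass BaseDataInterface.
class RecordingStub:
    def get_metadata(self):
        return {"NWBFile": {"session_description": "raw recording"},
                "Ecephys": {"Device": [{"name": "probe"}]}}

class SortingStub:
    def get_metadata(self):
        return {"NWBFile": {"experimenter": ["Doe, Jane"]}}

class MiniConverter:
    """Sketch of NWBConverter.get_metadata: merge metadata across interfaces."""
    def __init__(self, interfaces):
        self.data_interface_objects = interfaces

    def get_metadata(self):
        metadata = {}
        for interface in self.data_interface_objects.values():
            for top_key, sub_dict in interface.get_metadata().items():
                # Shallow stand-in for neuroconv's dict_deep_update.
                metadata.setdefault(top_key, {}).update(sub_dict)
        return metadata

converter = MiniConverter({"Recording": RecordingStub(), "Sorting": SortingStub()})
metadata = converter.get_metadata()
print(sorted(metadata["NWBFile"]))  # ['experimenter', 'session_description']
```

Each interface contributes its own sub-dictionary, and the merged result is what the user then edits before `run_conversion`.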

src/neuroconv/basedatainterface.py

Lines changed: 13 additions & 2 deletions

@@ -64,14 +64,20 @@ def get_metadata(self) -> DeepDict:

         return metadata

-    def validate_metadata(self, metadata: dict) -> None:
+    def validate_metadata(self, metadata: dict, append_mode: bool = False) -> None:
         """Validate the metadata against the schema."""
         encoder = NWBMetaDataEncoder()
         # The encoder produces a serialized object, so we deserialized it for comparison

         serialized_metadata = encoder.encode(metadata)
         decoded_metadata = json.loads(serialized_metadata)
-        validate(instance=decoded_metadata, schema=self.get_metadata_schema())
+        metdata_schema = self.get_metadata_schema()
+        if append_mode:
+            # Eliminate required from NWBFile
+            nwbfile_schema = metdata_schema["properties"]["NWBFile"]
+            nwbfile_schema.pop("required", None)
+
+        validate(instance=decoded_metadata, schema=metdata_schema)

     def create_nwbfile(self, metadata: Optional[dict] = None, **conversion_options) -> NWBFile:
         """

@@ -157,6 +163,11 @@ def run_conversion(
         if metadata is None:
             metadata = self.get_metadata()

+        file_initially_exists = Path(nwbfile_path).exists() if nwbfile_path is not None else False
+        append_mode = file_initially_exists and not overwrite
+
+        self.validate_metadata(metadata=metadata, append_mode=append_mode)
+
         with make_or_load_nwbfile(
             nwbfile_path=nwbfile_path,
             nwbfile=nwbfile,
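The `append_mode` branch added here drops the `required` list from the `NWBFile` sub-schema, so appending to an existing file does not re-demand fields (such as `session_start_time`) that the file already contains. The sketch below uses a toy schema shaped like the one `get_metadata_schema()` returns; the real code then passes the schema to `jsonschema.validate`:

```python
# Toy metadata schema shaped like the one get_metadata_schema() returns.
metadata_schema = {
    "properties": {
        "NWBFile": {
            "properties": {"session_start_time": {"type": "string"}},
            "required": ["session_start_time"],
        }
    }
}

def relax_for_append(schema):
    """Mimic the append_mode branch: NWBFile fields stop being required."""
    nwbfile_schema = schema["properties"]["NWBFile"]
    nwbfile_schema.pop("required", None)  # same call as in the diff
    return schema

relax_for_append(metadata_schema)
print("required" in metadata_schema["properties"]["NWBFile"])  # False
```

Note that `pop("required", None)` mutates the schema in place, which is why the diff fetches a fresh schema via `get_metadata_schema()` before relaxing it.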

src/neuroconv/datainterfaces/ecephys/baserecordingextractorinterface.py

Lines changed: 6 additions & 4 deletions

@@ -299,7 +299,7 @@ def add_to_nwbfile(
         write_electrical_series: bool = True,
         compression: Optional[str] = None,  # TODO: remove completely after 10/1/2024
         compression_opts: Optional[int] = None,
-        iterator_type: str = "v2",
+        iterator_type: Optional[str] = "v2",
         iterator_opts: Optional[dict] = None,
     ):
         """

@@ -324,10 +324,9 @@ def add_to_nwbfile(
         write_electrical_series : bool, default: True
             Electrical series are written in acquisition. If False, only device, electrode_groups,
             and electrodes are written to NWB.
-        iterator_type : {'v2', 'v1'}
+        iterator_type : {'v2'}
             The type of DataChunkIterator to use.
-            'v1' is the original DataChunkIterator of the hdmf data_utils.
-            'v2' is the locally developed RecordingExtractorDataChunkIterator, which offers full control over chunking.
+            'v2' is the locally developed RecordingExtractorDataChunkIterator, which offers full control over chunking
         iterator_opts : dict, optional
             Dictionary of options for the RecordingExtractorDataChunkIterator (iterator_type='v2').
             Valid options are:

@@ -357,6 +356,9 @@ def add_to_nwbfile(
         else:
             recording = self.recording_extractor

+        if metadata is None:
+            metadata = self.get_metadata()
+
         add_recording(
             recording=recording,
             nwbfile=nwbfile,
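The three added lines give `add_to_nwbfile` the same fallback `run_conversion` already performs: when the caller passes no `metadata`, the interface's own defaults are used. A minimal sketch with a hypothetical stub class (the real method builds devices, electrode groups, and electrical series via `add_recording`):

```python
class StubRecordingInterface:
    """Sketch of the metadata fallback added to add_to_nwbfile."""

    def get_metadata(self):
        # Default metadata, as the real get_metadata() would produce.
        return {"Ecephys": {"Device": [{"name": "DeviceEcephys"}]}}

    def add_to_nwbfile(self, nwbfile, metadata=None):
        if metadata is None:  # the fallback added in this commit
            metadata = self.get_metadata()
        # Stand-in for add_recording(): record the device names.
        nwbfile["devices"] = [d["name"] for d in metadata["Ecephys"]["Device"]]
        return nwbfile

interface = StubRecordingInterface()
nwbfile = interface.add_to_nwbfile({})  # no metadata passed
print(nwbfile["devices"])  # ['DeviceEcephys']
```

Without the fallback, calling `add_to_nwbfile` directly with `metadata=None` would fail inside the downstream add step; with it, the behavior matches `run_conversion`.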

src/neuroconv/nwbconverter.py

Lines changed: 14 additions & 4 deletions

@@ -101,13 +101,20 @@ def get_metadata(self) -> DeepDict:
             metadata = dict_deep_update(metadata, interface_metadata)
         return metadata

-    def validate_metadata(self, metadata: Dict[str, dict]):
+    def validate_metadata(self, metadata: Dict[str, dict], append_mode: bool = False):
         """Validate metadata against Converter metadata_schema."""
         encoder = NWBMetaDataEncoder()
         # The encoder produces a serialized object, so we deserialized it for comparison
         serialized_metadata = encoder.encode(metadata)
         decoded_metadata = json.loads(serialized_metadata)
-        validate(instance=decoded_metadata, schema=self.get_metadata_schema())
+
+        metadata_schema = self.get_metadata_schema()
+        if append_mode:
+            # Eliminate required from NWBFile
+            nwbfile_schema = metadata_schema["properties"]["NWBFile"]
+            nwbfile_schema.pop("required", None)
+
+        validate(instance=decoded_metadata, schema=metadata_schema)
         if self.verbose:
             print("Metadata is valid!")

@@ -206,7 +213,7 @@ def run_conversion(
         """

         if nwbfile_path is None:
-            warnings.warn(  # TODO: remove on or after 12/26/2024
+            warnings.warn(  # TODO: remove on or after 2024/12/26
                 "Using Converter.run_conversion without specifying nwbfile_path is deprecated. To create an "
                 "NWBFile object in memory, use Converter.create_nwbfile. To append to an existing NWBFile object,"
                 " use Converter.add_to_nwbfile."

@@ -215,10 +222,13 @@ def run_conversion(
         backend = _resolve_backend(backend, backend_configuration)
         no_nwbfile_provided = nwbfile is None  # Otherwise, variable reference may mutate later on inside the context

+        file_initially_exists = Path(nwbfile_path).exists() if nwbfile_path is not None else False
+        append_mode = file_initially_exists and not overwrite
+
         if metadata is None:
             metadata = self.get_metadata()

-        self.validate_metadata(metadata=metadata)
+        self.validate_metadata(metadata=metadata, append_mode=append_mode)
         self.validate_conversion_options(conversion_options=conversion_options)

         self.temporally_align_data_interfaces()
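The append-mode detection added to `run_conversion` can be exercised in isolation with the standard library. The helper below mirrors the two added lines: append mode is active only when the target file already exists and `overwrite` is False.

```python
import tempfile
from pathlib import Path

def detect_append_mode(nwbfile_path, overwrite):
    """Mirror of the two lines added to NWBConverter.run_conversion."""
    file_initially_exists = Path(nwbfile_path).exists() if nwbfile_path is not None else False
    return file_initially_exists and not overwrite

with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "session.nwb"
    existing.touch()
    missing = Path(tmp) / "new.nwb"

    results = [
        detect_append_mode(existing, overwrite=False),  # existing file, no overwrite -> append
        detect_append_mode(existing, overwrite=True),   # overwrite requested -> not append
        detect_append_mode(missing, overwrite=False),   # file absent -> not append
    ]

print(results)  # [True, False, False]
```

The resulting flag is then forwarded to `validate_metadata(append_mode=...)`, which is what lets appends skip re-validating NWBFile fields the file already has.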
