Commit 25ca46b

oruebel and stephprince authored
Refactor how BaseRecordingData objects are managed for acquisition (#190)

* Refactor generate_spec_file script to make it easier to use for others as a command-line tool and for extensions
* Add namespace_name to the generated output file
* Make the NWBFile::cacheSpecifications function public so that it can be called for extensions
* Created NamespaceRegistry for managing namespaces to ease integration of extensions
* Split NamespaceRegistry into hpp/cpp files, move NamespaceInfo to Types.hpp, and use NamespaceInfo in NWBFile directly
* Added developer docs on how to create new namespaces
* Minor cleanup of function order for consistency
* Minor update to ensure use of NWBFile::m_specificationsPath
* Check for JSON and YAML extension in case schema sources were converted for some reason
* Ensure variable names replace - with _ symbol
* Update registered type docs and fix bug in generate_spec_files
* Updated the REGISTER_TYPE macro to work with variables for the namespace
* Fix linting
* Initial refactor to make BaseRecordingData for datasets accessible via autogenerated record methods
* Added unit tests for the new record methods
* Fix bug in HDF5IO::getAttribute when retrieving attributes for the root
* Fix HDF5RecordingData constructor when used with unchunked data
* Fix data type for DEFINE_DATASET_FIELD in NWBFile
* Add read unit tests for NWBFile DEFINE_ATTRIBUTE_FIELDS and DEFINE_DATASET_FIELDS
* Fix linter errors
* Add cache for BaseRecordingData objects on RegisteredType
* Added unit test for the cache for BaseRecordingData objects
* Fix linting and docstring
* Replace custom BaseRecordingData variables on TimeSeries with the use of RecordinContainer.m_datasetCache
* Removed unused method NWBFile.createRecordingData
* Replace use of custom BaseRecordingData variables on ElectricalSeries
* Fix linter
* Add option to reset and clear the cache of recording data objects
* Add unit tests for caching BaseRecordingData
* Update docs of the recording workflow
* Added new docs page for describing the design of data recording
* Updated Data and VectorData to initialize their own dataset
* Fix bug in VectorData::createReferenceVectorData where common attributes were not being created
* Remove support for JSON schema from generate_spec_files.py

Apply suggestions from code review

Co-authored-by: Steph Prince <[email protected]>

* Fix namespace formatting in resources/generate_spec_files.py
* Fix logic in generate_header_file for splitting large json strings
* Avoid repeat definition of YAML file extensions in the generate_spec_files.py script
* fix formatting
* Fix documentation errors due to merge
* Fix build for extension demo
* Update workflow to clone the full schema repo and not use GitHub API
* Update docs/pages/devdocs/registered_types.dox
* Update src/nwb/file/ElectrodeGroup.hpp
* Update docs/pages/userdocs/workflow.dox

---------

Co-authored-by: Steph Prince <[email protected]>
1 parent 5387f18 commit 25ca46b

44 files changed (+1229 / -510 lines changed)

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -24,9 +24,9 @@ libs/
 
 # demo build
 /build/
+demo/*/build
 demo/*/CMakeFiles/
 demo/cmake-build-*/
 demo/inspect_electrical_series/*.nwb
 demo/*/Makefile
 demo/*/cmake_install.cmake
-demo/*/build

demo/labmetadata_extension_demo/src/LabMetaDataExtensionExample.hpp

Lines changed: 2 additions & 2 deletions
@@ -15,9 +15,9 @@ class LabMetaDataExtensionExample : public AQNWB::NWB::Container
 Status initialize(const std::string& tissuePreparation);
 
 // Define methods for reading custom extension fields
-DEFINE_FIELD(
+DEFINE_DATASET_FIELD(
 readTissuePreparation,
-AQNWB::NWB::DatasetField,
+recordTissuePreparation,
 std::string,
 "tissue_preparation",
 Lab-specific description of the preparation of the tissue)

docs/Doxyfile.in

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@ PROJECT_NUMBER = "@PROJECT_VERSION@"
 # a simplified version of the macro as part of the PREDEFINED key to create a
 # simplified expansion of the macro for documentation purposes
 MACRO_EXPANSION = YES
-PREDEFINED += "DEFINE_FIELD(name, storageObjectType, default_type, fieldPath, description)=/** description */ template<typename VTYPE = default_type> inline std::unique_ptr<IO::ReadDataWrapper<storageObjectType, VTYPE>> name() const;"
+PREDEFINED += "DEFINE_ATTRIBUTE_FIELD(name, default_type, fieldPath, description)=/** description */ template<typename VTYPE = default_type> inline std::unique_ptr<IO::ReadDataWrapper<AttributeField, VTYPE>> name() const;"
+PREDEFINED += "DEFINE_DATASET_FIELD(readName, writeName, default_type, fieldPath, description)=/** description */ template<typename VTYPE = default_type> inline std::unique_ptr<IO::ReadDataWrapper<DatasetField, VTYPE>> readName() const; inline std::shared_ptr<IO::BaseRecordingData> writeName(bool reset = false); "
 PREDEFINED += "DEFINE_REGISTERED_FIELD(name, registeredType, fieldPath, description)=/** description */ template<typename RTYPE = registeredType> inline std::shared_ptr<RTYPE> name() const;"
 PREDEFINED += "DEFINE_REFERENCED_REGISTERED_FIELD(name, registeredType, fieldPath, description)=/** description */ template<typename RTYPE = registeredType> inline std::shared_ptr<RTYPE> name() const;"
 PREDEFINED += "REGISTER_SUBCLASS_WITH_TYPENAME(className, namespaceName, typeName)=REGISTER_SUBCLASS_WITH_TYPENAME(className, namespaceName, typeName);"

docs/pages/2_devdocs.dox

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 * - \subpage registered_type_page
 * - \subpage integrating_extensions_page
 * - \subpage read_design_page
+* - \subpage record_design_page
 * - \subpage legal_page
 *
 */

docs/pages/devdocs/integrating_extensions.dox

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@
 * with the \ref AQNWB::NWB::RegisteredType "RegisteredType" type registry.
 * - \ref REGISTER_SUBCLASS must appear in the header file
 * - \ref REGISTER_SUBCLASS_IMPL must appear in the cpp file
-* 3. **Field Definitions**: The \ref DEFINE_FIELD and \ref DEFINE_REGISTERED_FIELD macros simplify
+* 3. **Field Definitions**: The \ref DEFINE_DATASET_FIELD, \ref DEFINE_ATTRIBUTE_FIELD, and \ref DEFINE_REGISTERED_FIELD macros simplify
 * reading known fields from the file.
 * 4. **Schema Registration**: The generated header file automatically registers the namespace with
 * the \ref AQNWB::SPEC::NamespaceRegistry "NamespaceRegistry" using the \ref REGISTER_NAMESPACE macro.

docs/pages/devdocs/read_design.dox

Lines changed: 3 additions & 3 deletions
@@ -29,9 +29,9 @@
 * The \ref AQNWB::IO::ReadDataWrapper "ReadDataWrapper" then calls the I/O backend to retrieve data lazily
 * when the user requests access.
 * - To create a \ref AQNWB::IO::ReadDataWrapper "ReadDataWrapper" object a user will typically
-* use either pre-definied read methods created via the \ref DEFINE_FIELD macro
-* (see also \ref use_the_define_field_macro) or the \ref AQNWB::NWB::RegisteredType::readField "RegisteredType::readField"
-* method.
+* use either pre-definied read methods created via the \ref DEFINE_ATTRIBUTE_FIELD or \ref DEFINE_DATASET_FIELD macros
+* (see also \ref use_the_define_attribute_field_macro and \ref use_the_define_dataset_field_macro) or
+* the \ref AQNWB::NWB::RegisteredType::readField "RegisteredType::readField" method.
 * 3. \ref AQNWB::IO::BaseIO "BaseIO"
 * - \ref AQNWB::IO::BaseIO "BaseIO", \ref AQNWB::IO::HDF5::HDF5IO "HDF5IO" is then responsible for
 * reading data from disk and allocating memory for data on read. Read methods,
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
/**
* \page record_design_page Implementation of Data Recording
*
* \tableofcontents
*
* This page focuses on the software architecture of AqNWB for implementing data recording
* and is mainly aimed at software developers. The recording system in AqNWB is built around
* several key concepts:
*
* 1. **Efficient data recording for individual datasets** via \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects,
* discussed in \ref record_design_sec_recording_data
* 2. **Consistent multi-dataset recording through convenience methods** defined on individual \ref AQNWB::NWB::RegisteredType "RegisteredType"
* objects (e.g., \ref AQNWB::NWB::TimeSeries::writeData "TimeSeries::writeData"), discussed in \ref record_design_sec_timeseries
* 3. **Managing collections of recording objects through RecordingContainers**, discussed in \ref record_design_sec_recording_containers
*
* \section record_design_sec_recording_data Recording datasets with BaseRecordingData
*
* AqNWB records datasets efficiently via \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects. The main components involved in
* writing data to an NWB file via AqNWB are:
*
* 1. \ref DEFINE_DATASET_FIELD Macro
* - The \ref DEFINE_DATASET_FIELD macro defines methods not only for reading datasets of a particular neurodata_type (as described in \ref read_design_page),
* but also for retrieving the \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects used to record to individual datasets.
* For each dataset field defined with this macro, a corresponding method is generated that returns a \ref AQNWB::IO::BaseRecordingData "BaseRecordingData"
* object configured for that specific dataset. The \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects are then cached by
* \ref AQNWB::NWB::RegisteredType "RegisteredType" as described below.
*
* 2. \ref AQNWB::IO::BaseRecordingData "BaseRecordingData"
* - \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" is a class that manages the recording process for a dataset.
* - It tracks, via the `m_position` member, the position in the dataset where the next data block will be written.
* - It provides methods for writing data blocks to the dataset, such as \ref AQNWB::IO::BaseRecordingData::writeDataBlock "writeDataBlock",
* which can handle different data types and dimensions.
*
* 3. \ref AQNWB::NWB::RegisteredType "RegisteredType"
* - \ref AQNWB::NWB::RegisteredType "RegisteredType" maintains a cache of \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects via the `m_recordingDataCache` member.
* This cache allows the same \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" object to be reused when it is requested multiple times,
* which improves performance and retains the recording position. The cache is essential for writing data to a dataset in a streaming fashion,
* because it ensures that each write continues from where the previous write left off. The cache also removes the need to maintain
* these objects manually and can hold an arbitrary number of \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects, so
* individual neurodata_type classes do not need to manage their own recording state.
*
* 4. \ref AQNWB::IO::BaseIO "BaseIO"
* - \ref AQNWB::IO::BaseIO "BaseIO" and its implementations (e.g., \ref AQNWB::IO::HDF5::HDF5IO "HDF5IO") are responsible for
* the actual writing of data to disk. They provide methods for creating datasets (e.g., \ref AQNWB::IO::BaseIO::createArrayDataSet "createArrayDataSet")
* and getting existing datasets (\ref AQNWB::IO::BaseIO::getDataSet "getDataSet"), both of which return
* \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" objects.
*
* @dot
* digraph G {
* node [shape=none];
*
* HDF5IO [
* label=<
* <table border="0" cellborder="1" cellspacing="0">
* <tr><td colspan="2" bgcolor="lightgray"><b>HDF5IO</b></td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Functions</b></td></tr>
* <tr><td align="left">+ createArrayDataSet(): BaseRecordingData</td></tr>
* <tr><td align="left">+ getDataSet(): BaseRecordingData</td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Attributes</b></td></tr>
* </table>
* >
* ];
*
* NWBFile [
* shape=note,
* label="NWB file (HDF5)"
* ];
*
* BaseRecordingData [
* label=<
* <table border="0" cellborder="1" cellspacing="0">
* <tr><td colspan="2" bgcolor="lightgray"><b>BaseRecordingData</b></td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Functions</b></td></tr>
* <tr><td align="left">+ writeDataBlock(): Status</td></tr>
* <tr><td align="left">+ getPosition(): std::vector&lt;SizeType&gt;</td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Attributes</b></td></tr>
* <tr><td align="left">+ m_position: std::vector&lt;SizeType&gt;</td></tr>
* <tr><td align="left">+ m_shape: std::vector&lt;SizeType&gt;</td></tr>
* </table>
* >
* ];
*
* RegisteredType [
* label=<
* <table border="0" cellborder="1" cellspacing="0">
* <tr><td colspan="2" bgcolor="lightgray"><b>RegisteredType</b></td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Functions</b></td></tr>
* <tr><td align="left">+ clearRecordingDataCache(): void</td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Attributes</b></td></tr>
* <tr><td align="left">+ m_recordingDataCache: std::unordered_map</td></tr>
* <tr><td align="left">+ m_io: std::shared_ptr&lt;BaseIO&gt;</td></tr>
* <tr><td align="left">+ m_path: std::string</td></tr>
* </table>
* >
* ];
*
* Container [
* label=<
* <table border="0" cellborder="1" cellspacing="0">
* <tr><td colspan="2" bgcolor="lightgray"><b>Container</b></td></tr>
* <tr><td colspan="2" bgcolor="lightgray"><b>Functions</b></td></tr>
* <tr><td align="left">+ writeDataField(): BaseRecordingData</td></tr>
* </table>
* >
* ];
*
* { rank=same; RegisteredType; }
* { rank=same; Container; }
* { rank=same; BaseRecordingData; }
* { rank=same; HDF5IO; }
* { rank=same; NWBFile; }
*
* RegisteredType -> Container [arrowhead=empty, style=dashed];
* Container -> BaseRecordingData [label="created by functions defined via \nDEFINE_DATASET_FIELD"];
* RegisteredType -> BaseRecordingData [label="caches"];
* BaseRecordingData -> HDF5IO [label="uses for\nwriting data"];
* HDF5IO -> NWBFile [label="write data"];
* }
* @enddot
*
* \subsection record_design_sec_define_dataset_field The DEFINE_DATASET_FIELD Macro for Recording
*
* The \ref DEFINE_DATASET_FIELD macro defines methods not only for reading datasets but also for recording to them.
* For each dataset field defined with this macro, a corresponding method is generated that returns a \ref AQNWB::IO::BaseRecordingData "BaseRecordingData"
* object configured for that specific dataset.
*
* For example, if we have a \ref AQNWB::NWB::TimeSeries "TimeSeries" class with a 'data' field defined using the \ref DEFINE_DATASET_FIELD macro:
*
* \code{.cpp}
* DEFINE_DATASET_FIELD(readData, recordData, std::any, "data", The main data)
* \endcode
*
* This generates not only a `readData()` method for reading the dataset but also a `recordData()` method
* that returns a \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" object configured for writing to the 'data' dataset.
*
* The generated `recordData()` method:
* 1. Checks if a \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" object for the dataset already exists in the cache
* 2. If it exists and `reset` is false, returns the cached object
* 3. If it doesn't exist or `reset` is true, gets a new \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" object from the IO backend
* 4. Caches the new object and returns it
*
* This caching mechanism is crucial for maintaining the recording state across multiple writes to the same dataset.
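The four steps above can be illustrated with a small, self-contained C++ sketch. It is not AqNWB code: `MockRecordingData`, `MockRegisteredType`, `MockTimeSeries`, `recordingDataFor`, and the `SKETCH_RECORD_FIELD` macro are hypothetical names. Keying the cache by dataset path is an assumption, and a freshly constructed object stands in for the one AqNWB would obtain from the IO backend. The real `DEFINE_DATASET_FIELD` also generates the read method, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for BaseRecordingData: only tracks the next write position.
struct MockRecordingData {
  std::size_t position = 0;
};

// Hypothetical stand-in for RegisteredType with a per-object recording-data cache.
class MockRegisteredType {
public:
  explicit MockRegisteredType(std::string path) : m_path(std::move(path)) {}

  // Mirrors steps 1-4: look up the cache, reuse the entry unless reset is
  // requested, otherwise obtain a fresh object and cache it.
  std::shared_ptr<MockRecordingData> recordingDataFor(const std::string& field, bool reset) {
    const std::string key = m_path + "/" + field;  // assumed cache key: full dataset path
    auto it = m_cache.find(key);
    if (it != m_cache.end() && !reset) {
      return it->second;  // step 2: the cached object keeps its position
    }
    // Step 3: AqNWB would ask the IO backend here; the sketch just creates one.
    auto fresh = std::make_shared<MockRecordingData>();
    m_cache[key] = fresh;  // step 4: cache (or replace) and return
    return fresh;
  }

  // Drops every cached object, so the next request starts from a new one.
  void clearRecordingDataCache() { m_cache.clear(); }

protected:
  std::string m_path;
  std::unordered_map<std::string, std::shared_ptr<MockRecordingData>> m_cache;
};

// Hypothetical, heavily simplified macro: generates one record method per dataset field.
#define SKETCH_RECORD_FIELD(writeName, fieldPath)                   \
  std::shared_ptr<MockRecordingData> writeName(bool reset = false) { \
    return recordingDataFor(fieldPath, reset);                      \
  }

// Hypothetical TimeSeries-like type using the macro for two dataset fields.
class MockTimeSeries : public MockRegisteredType {
public:
  using MockRegisteredType::MockRegisteredType;
  SKETCH_RECORD_FIELD(recordData, "data")
  SKETCH_RECORD_FIELD(recordTimestamps, "timestamps")
};
```

In this sketch, a second call to `recordData()` returns the same object, so a position advanced by one write is visible to the next; `recordData(true)` replaces the cached entry with a fresh object, and `clearRecordingDataCache()` drops all entries.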
*
* \subsection record_design_sec_baserecordingdata BaseRecordingData for Managing Recording
*
* The \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" class is responsible for managing the recording process
* for a dataset. It keeps track of the current position in the dataset where data should be written next, ensuring
* that data is written efficiently and in the correct place, especially for streaming data where many writes occur over time.
*
* Key features of \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" include:
*
* - **Position Tracking**: \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" keeps track of the current position in the dataset via the `m_position` member.
* This is particularly important for streaming data, where data is written in chunks over time.
*
* - **Data Type Handling**: \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" can handle different data types and dimensions through its `writeDataBlock` methods,
* making it flexible for various types of data.
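A minimal sketch of position tracking, assuming a one-dimensional dataset held in a `std::vector<double>` instead of an HDF5 dataset. `MockRecordingData1D` and its `storage()` accessor are hypothetical; the real class tracks a multi-dimensional `m_position` (a `std::vector<SizeType>`) and delegates the actual write to the IO backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for BaseRecordingData, backed by an in-memory vector
// instead of an HDF5 dataset. Only the position bookkeeping is modeled.
class MockRecordingData1D {
public:
  // Writes a block at the current position, then advances the position by the
  // block size so the next call continues where this one left off.
  void writeDataBlock(const std::vector<double>& block) {
    const std::size_t end = m_position + block.size();
    if (m_storage.size() < end) {
      m_storage.resize(end);  // analogous to extending an extendable dataset
    }
    for (std::size_t i = 0; i < block.size(); ++i) {
      m_storage[m_position + i] = block[i];
    }
    m_position = end;
  }

  std::size_t getPosition() const { return m_position; }
  const std::vector<double>& storage() const { return m_storage; }

private:
  std::size_t m_position = 0;       // index of the next element to write
  std::vector<double> m_storage;    // stands in for the on-disk dataset
};
```

Calling `writeDataBlock({1.0, 2.0})` followed by `writeDataBlock({3.0})` leaves the position at 3, with the second block written directly after the first. This is the state that would be lost if a new recording object were created for every write.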
*
* \section record_design_sec_timeseries TimeSeries Convenience Methods for Consistent Recording
*
* Specific types like \ref AQNWB::NWB::TimeSeries "TimeSeries" provide convenience methods for writing multiple datasets
* in a consistent manner. This ensures that related datasets (e.g., 'data' and 'timestamps' in a \ref AQNWB::NWB::TimeSeries "TimeSeries") are
* written consistently and simplifies the recording process.
*
* The \ref AQNWB::NWB::TimeSeries "TimeSeries" class provides:
*
* - An \ref AQNWB::NWB::TimeSeries::initialize "initialize" method that sets up all the necessary datasets and attributes
* for a time series, including `data`, `timestamps`, `control`, and all their attributes (e.g., `unit`).
* - A \ref AQNWB::NWB::TimeSeries::writeData "writeData" method that writes `data`, `timestamps`, and `control`
* information in a single call, ensuring consistency between these related datasets.
*
* These convenience methods handle the details of:
*
* - **Dataset Creation**: Creating the necessary datasets if they don't exist.
* - **Data Alignment**: Ensuring that related datasets (e.g., data and timestamps) are properly aligned.
* - **Position Management**: Managing the current position in each dataset to ensure consistent writing.
* - **Error Handling**: Handling errors that might occur during the writing process.
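The consistency guarantee of a combined write can be sketched as follows. Everything here is hypothetical and simplified to one-dimensional data: `SketchStatus`, `SketchColumn`, and `SketchTimeSeries` are not AqNWB types, and the length check is an assumed validation rule used to illustrate the idea, not a description of what `TimeSeries::writeData` actually checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status type for this sketch (not AqNWB's own Status enum).
enum class SketchStatus { Success, Failure };

// Hypothetical append-only column: its position is the number of values written.
struct SketchColumn {
  std::vector<double> values;
  std::size_t position() const { return values.size(); }
  void append(const std::vector<double>& block) {
    values.insert(values.end(), block.begin(), block.end());
  }
};

// Hypothetical TimeSeries-like type whose writeData() keeps data and timestamps aligned.
class SketchTimeSeries {
public:
  SketchStatus writeData(const std::vector<double>& data,
                         const std::vector<double>& timestamps) {
    // Validate before touching either dataset, so a failed call never
    // leaves the two datasets out of step.
    if (data.size() != timestamps.size()) {
      return SketchStatus::Failure;
    }
    m_data.append(data);
    m_timestamps.append(timestamps);
    return SketchStatus::Success;
  }

  std::size_t dataPosition() const { return m_data.position(); }
  std::size_t timestampsPosition() const { return m_timestamps.position(); }

private:
  SketchColumn m_data;
  SketchColumn m_timestamps;
};
```

Validating the inputs before writing either dataset is the design choice that matters here: a rejected call leaves `data` and `timestamps` at the same position, so later writes remain aligned.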
*
* \section record_design_sec_recording_containers RecordingContainers for Managing Collections
*
* \ref AQNWB::NWB::RecordingContainers "RecordingContainers" provides an additional convenience layer for managing
* collections of \ref AQNWB::NWB::RegisteredType "RegisteredType" containers used for recording. This is particularly
* useful when recording data to multiple related containers, such as multiple \ref AQNWB::NWB::TimeSeries "TimeSeries" objects.
*
* \ref AQNWB::NWB::RecordingContainers "RecordingContainers" simplifies the process of:
*
* - **Container Management**: Adding and retrieving containers from the collection via the `addContainer` and `getContainer` methods.
* - **Coordinated Recording**: Coordinating the recording process across multiple containers through specialized methods like:
*   - `writeTimeseriesData`: For writing data to a \ref AQNWB::NWB::TimeSeries "TimeSeries" container
*   - `writeElectricalSeriesData`: For writing data to an ElectricalSeries container
*   - `writeSpikeEventData`: For writing data to a SpikeEventSeries container
*   - `writeAnnotationSeriesData`: For writing data to an AnnotationSeries container
* - **Error Handling**: Handling errors that might occur during the recording process across multiple containers.
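A sketch of index-based container management with centralized error handling. `SketchSeries`, `SketchRecordingContainers`, and `SketchStatus` are hypothetical; only the method names `addContainer`, `getContainer`, and `writeTimeseriesData` are borrowed from the text above, and their signatures here are assumptions rather than the AqNWB API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical status type for this sketch (not AqNWB's own Status enum).
enum class SketchStatus { Success, Failure };

// Hypothetical series: only counts how many samples have been written.
struct SketchSeries {
  std::size_t samplesWritten = 0;
  SketchStatus writeData(const std::vector<double>& block) {
    samplesWritten += block.size();
    return SketchStatus::Success;
  }
};

// Hypothetical collection that owns the series and routes writes by index,
// in the spirit of RecordingContainers::addContainer/getContainer.
class SketchRecordingContainers {
public:
  // Takes ownership and returns the index to use for later writes.
  std::size_t addContainer(std::unique_ptr<SketchSeries> container) {
    m_containers.push_back(std::move(container));
    return m_containers.size() - 1;
  }

  // Returns nullptr for an out-of-range index instead of throwing.
  SketchSeries* getContainer(std::size_t index) {
    return index < m_containers.size() ? m_containers[index].get() : nullptr;
  }

  // Centralized error handling: an invalid index is reported, not dereferenced.
  SketchStatus writeTimeseriesData(std::size_t index, const std::vector<double>& block) {
    SketchSeries* series = getContainer(index);
    if (series == nullptr) {
      return SketchStatus::Failure;
    }
    return series->writeData(block);
  }

private:
  std::vector<std::unique_ptr<SketchSeries>> m_containers;
};
```

Because writes are routed through the collection, an invalid container index yields `SketchStatus::Failure` instead of a null-pointer dereference in the calling acquisition code.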
*
* \section recording_design_further_reading Further Reading
*
* - \ref workflow provides a step-by-step overview of the typical recording process.
* - \ref read_design_page provides a complementary overview of how data reading is implemented, which involves
* many of the same classes, but uses \ref AQNWB::IO::ReadDataWrapper "ReadDataWrapper" instead of
* \ref AQNWB::IO::BaseRecordingData "BaseRecordingData" for accessing data.
* - \ref registered_type_page discusses the use of \ref AQNWB::NWB::RegisteredType "RegisteredType" to
* implement writing and reading of neurodata_types for NWB.
*
*/
