From 9ba9323d3da17c238665a9b165dc8c5b43bd0b90 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 30 Jun 2025 21:29:25 +0200 Subject: [PATCH 01/67] Update file_read.rst --- .../pages/getting_started/file_read.rst | 29 ++++++++++++++----- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/docs/source/pages/getting_started/file_read.rst b/docs/source/pages/getting_started/file_read.rst index a3aa4616d..cc690a2f1 100644 --- a/docs/source/pages/getting_started/file_read.rst +++ b/docs/source/pages/getting_started/file_read.rst @@ -1,21 +1,34 @@ -Reading with MatNWB -=================== +Reading NWB Files +================= -For most files, MatNWB only requires the :func:`nwbRead` call: +This section provides an overview of reading and exploring NWB (Neurodata Without Borders) files with MatNWB. It serves as a reference guide to the data objects you’ll encounter when working with NWB files. For detailed code examples and usage demonstrations, please refer to the :doc:`tutorials <../tutorials/index>`. + +To read an NWB file, use the :func:`nwbRead` function: .. code-block:: MATLAB - nwb = nwbRead('path/to/filename.nwb'); + nwb = nwbRead('path/to/file.nwb'); + +This command performs several important tasks behind the scenes: + +1. **Opens the file** and reads its structure +2. **Automatically generates MATLAB classes** needed to work with the data +3. **Returns an NwbFile object** representing the entire file + +The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types. -This call will read the file, create the necessary NWB schema class files, as well as any extension schemata that is needed for the file itself. 
This is because both PyNWB and MatNWB embed a copy of the schema environment into the NWB file when it is written. +.. note:: + The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format. +**Next steps** -The returned object above is an :class:`NwbFile` object which serves as the root object with which you can use to browse the contents of the file. More detail about the NwbFile class can be found here: :ref:`matnwb-read-nwbfile-intro`. +The following pages provide detailed information on specific aspects of reading NWB files: .. toctree:: - :maxdepth: 2 + :maxdepth: 1 file_read/nwbfile file_read/dynamictable file_read/untyped - file_read/troubleshooting \ No newline at end of file + file_read/schemas_and_generation + file_read/troubleshooting From 211f73760650de531f330e42b06e6cf59d877466 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 30 Jun 2025 21:29:48 +0200 Subject: [PATCH 02/67] Update nwbfile.rst --- .../getting_started/file_read/nwbfile.rst | 270 +++++++++++++++++- 1 file changed, 255 insertions(+), 15 deletions(-) diff --git a/docs/source/pages/getting_started/file_read/nwbfile.rst b/docs/source/pages/getting_started/file_read/nwbfile.rst index 106abca01..239e97af0 100644 --- a/docs/source/pages/getting_started/file_read/nwbfile.rst +++ b/docs/source/pages/getting_started/file_read/nwbfile.rst @@ -1,31 +1,271 @@ .. _matnwb-read-nwbfile-intro: -Using the NwbFile Class ------------------------ +Working with the NwbFile Object +=============================== + +When you read an NWB file with ``nwbRead``, you get back an :class:`NwbFile` object that serves as the main interface to all the data in the file. + +NwbFile Example +--------------- + +For illustration, we'll run the ecephys tutorial and read the resulting NWB file: + +.. code-block:: MATLAB + + evalc("run('tutorials/ecephys.mlx')"); % Run tutorial with suppressed output + nwb = nwbRead('tutorials/ecephys_tutorial.nwb'); + disp(nwb) + +.. 
code-block:: text + + + NwbFile with properties: + + nwb_version: '2.8.0' + file_create_date: [1×1 types.untyped.DataStub] + identifier: 'Mouse5_Day3' + session_description: 'mouse in open exploration' + session_start_time: [1×1 types.untyped.DataStub] + timestamps_reference_time: [1×1 types.untyped.DataStub] + acquisition: [2×1 types.untyped.Set] + analysis: [0×1 types.untyped.Set] + general: [0×1 types.untyped.Set] + general_data_collection: '' + general_devices: [1×1 types.untyped.Set] + general_experiment_description: '' + general_experimenter: [1×1 types.untyped.DataStub] + general_extracellular_ephys: [4×1 types.untyped.Set] + general_extracellular_ephys_electrodes: [1×1 types.hdmf_common.DynamicTable] + general_institution: 'University of My Institution' + general_intracellular_ephys: [0×1 types.untyped.Set] + general_intracellular_ephys_experimental_conditions: [] + general_intracellular_ephys_filtering: '' + general_intracellular_ephys_intracellular_recordings: [] + general_intracellular_ephys_repetitions: [] + general_intracellular_ephys_sequential_recordings: [] + general_intracellular_ephys_simultaneous_recordings: [] + general_intracellular_ephys_sweep_table: [] + general_keywords: '' + general_lab: '' + general_notes: '' + general_optogenetics: [0×1 types.untyped.Set] + general_optophysiology: [0×1 types.untyped.Set] + general_pharmacology: '' + general_protocol: '' + general_related_publications: [1×1 types.untyped.DataStub] + general_session_id: 'session_1234' + general_slices: '' + general_source_script: '' + general_source_script_file_name: '' + general_stimulus: '' + general_subject: [] + general_surgery: '' + general_virus: '' + general_was_generated_by: [1×1 types.untyped.DataStub] + intervals: [0×1 types.untyped.Set] + intervals_epochs: [] + intervals_invalid_times: [] + intervals_trials: [] + processing: [1×1 types.untyped.Set] + scratch: [0×1 types.untyped.Set] + stimulus_presentation: [0×1 types.untyped.Set] + stimulus_templates: [0×1 
types.untyped.Set] + units: [1×1 types.core.Units] + >> + +This object contains properties that represent the contents of the NWB file, including metadata about the experiment and data containers for raw and processed data. The object is hierarchical, meaning you can access nested data using dot notation. + +For an overview of the NWB file structure, see the `NWB File Structure `_ section of the central +`NWB Documentation `_, or for technical details, refer to the `NWB Format Specification `_. + +One key difference between the :class:`NwbFile` object and the formal NWB structure is that some top-level groups, like ``general``, ``intervals`` and ``stimulus`` are flattened into top level properties of the :class:`NwbFile` object. This is only a convenience for easier access, and does not change the underlying structure of the NWB file. + +Basic Navigation +---------------- + +We can explore an :class:`NwbFile` object just like any MATLAB structure. For example, to see the session description: + +.. code-block:: MATLAB + + disp(nwb.session_description); + +.. code-block:: text -The :class:`NwbFile` class represents the root object for the NWB file and consists of properties and values which map indirectly to the internal HDF5 dataset. + mouse in open exploration + >> + +Display the raw data of the file: + +.. code-block:: MATLAB + + >> disp(nwb.acquisition); + +.. code-block:: text + + 2×1 Set array with properties: + + ElectricalSeries: [types.core.ElectricalSeries] + SpikeEvents_Shank0: [types.core.SpikeEventSeries] + >> + +The acquistion property contains a :class:`types.untyped.Set` object, which is a dynamic collection of NWB objects. In this case, it contains two datasets: ``ElectricalSeries`` and ``SpikeEvents_Shank0``. + +To access a specific dataset, we can use the :meth:`Set.get` method: + +.. code-block:: MATLAB + + >> disp(nwb.acquisition.get('ElectricalSeries')); + +.. 
code-block:: text + + ElectricalSeries with properties: + + channel_conversion_axis: 1 + electrodes: [1×1 types.hdmf_common.DynamicTableRegion] + channel_conversion: [] + filtering: '' + starting_time_unit: 'seconds' + timestamps_interval: 1 + timestamps_unit: 'seconds' + data: [1×1 types.untyped.DataStub] + data_unit: 'volts' + comments: 'no comments' + control: [] + control_description: '' + data_continuity: '' + data_conversion: 1 + data_offset: 0 + data_resolution: -1 + description: 'no description' + starting_time: 0 + starting_time_rate: 30000 + timestamps: [] + >> + + +Data Types in NWB Files +----------------------- -.. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_NwbFile.png?raw=true +There are 3 primary data types you will encounter when working with NWB files: -In most cases, the types contained in these files were generated by the embedded NWB schema in the file (or separately if you opted to generate them separately). These types can be traversed using regular MATLAB syntax for accessing properties and their values. +- MATLAB fundamental classes (e.g., ``char``, ``numeric``, ``cell``) +- NWB schema-defined types (e.g., :class:`types.core.TimeSeries`, :class:`types.core.ElectricalSeries`, :class:`types.hdmf_common.DynamicTable`) +- :ref:`Utility types` (e.g., ``types.untyped.Set``, ``types.untyped.DataStub``) -Aside from the generated Core and Extension types, there are "Untyped" utility Types which are covered in greater detail in :ref:`matnwb-read-untyped-intro`. +TODO: Briefly discuss schema and utility types. .. _matnwb-read-nwbfile-searchfor: -Searching by Type -~~~~~~~~~~~~~~~~~ +Finding Data: The searchFor Method +---------------------------------- -The NwbFile also allows for searching the entire NWB file by type using it's :meth:`NwbFile.searchFor` method. +When working with complex NWB files, manually exploring every property can be time-consuming. 
The :meth:`NwbFile.searchFor` method lets you search for specific types of data across the entire file: -You can search for only the class name: +.. code-block:: MATLAB -.. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_searchForExample.png?raw=true + electricalseries_map = nwb.searchFor('ElectricalSeries') -Or use the ``'includeSubClasses'`` optional argument to search all subclasses: +.. code-block:: output -.. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_searchForExample-withSubclassing.png?raw=true + electricalseries_map = + + Map with properties: + + Count: 3 + KeyType: char + ValueType: any + >> -.. note:: +The ``searchFor`` method returns a MATLAB ``containers.Map`` object where: + +- **Keys** are the paths (within the file) to each found object +- **Values** are the actual data objects + +.. code-block:: MATLAB + + % See what was found + paths = electricalseries_map.keys(); % Cell array of paths + objects = electricalseries_map.values(); % Cell array of objects + + % Display the paths + for i = 1:length(paths) + fprintf('Found %s at: %s\n', class(objects{i}), paths{i}); + end + +.. code-block:: text + + Found types.core.ElectricalSeries at: /acquisition/ElectricalSeries + Found types.core.ElectricalSeries at: /processing/ecephys/nwbdatainterface/FilteredEphys/electricalseries/FilteredElectricalSeries + Found types.core.ElectricalSeries at: /processing/ecephys/nwbdatainterface/LFP/electricalseries/ElectricalSeries + >> + +**Including Subclasses:** + +Some searches benefit from including related data types. Use the ``'includeSubClasses'`` option: + +.. code-block:: MATLAB + + % Find all types of time series (including specialized ones) + all_timeseries = nwb.searchFor('TimeSeries', 'includeSubClasses'); + disp(all_timeseries.values') + +.. 
code-block:: text + + {1×1 types.core.ElectricalSeries } + {1×1 types.core.SpikeEventSeries } + {1×1 types.core.ElectricalSeries } + {1×1 types.core.ElectricalSeries } + {1×1 types.core.DecompositionSeries} + + >> + + +This is useful because many NWB data types are specialized versions of more general types. + +Retrieving Found Objects: The resolve Method +--------------------------------------------- + +Once you've found data using ``searchFor``, you can retrieve specific objects either directly from the values of the ``containers.Map`` object or using their paths with the :meth:`NwbFile.resolve` method: + +.. code-block:: MATLAB + + all_electricalseries_paths = electricalseries_map.keys(); % Cell array of paths + first_path = all_electricalseries_paths{1}; + + % Retrieve the object using its path + electricalseries_obj = nwb.resolve(first_path); + +The ``resolve`` method is particularly useful when you: + +- Want to access objects found through ``searchFor`` +- Have a specific path and want to retrieve the object + +Working with the Data +--------------------- + +Once you have a data object (whether found through navigation, search, or resolve), you can access its contents: + +.. code-block:: MATLAB + + % Most data objects have a .data property + raw_data = electricalseries_obj.data.load(); + size(raw_data) + + % Check for additional metadata + fprintf('Description: %s\n', electricalseries_obj.description); + +.. code-block:: text + + ans = + + 12 3000 + + Description: no description + >> + +Remember that data is not loaded into memory until you call ``.load()``. This allows you to work with very large files without overwhelming system memory. See the section on :ref:`matnwb-read-untyped-datastub-datapipe` for more information. 
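A minimal sketch of the lazy-loading behavior described above. It assumes the ``ecephys_tutorial.nwb`` file from the earlier example, and that :class:`types.untyped.DataStub` exposes a ``dims`` property and supports array-style indexing; check both against your installed MatNWB version. The variable names are illustrative only.

.. code-block:: MATLAB

    % Read the tutorial file; large datasets stay on disk as DataStub objects
    nwb = nwbRead('tutorials/ecephys_tutorial.nwb');
    es = nwb.acquisition.get('ElectricalSeries');

    % es.data is a DataStub: a handle to the dataset, not the values themselves
    stub = es.data;
    disp(stub.dims);   % dataset size (assumed property), no values loaded

    % Indexing the stub (assumed behavior) reads only the requested slice
    first_samples = stub(1:4, 1:100);

    % load() with no arguments reads the whole dataset into memory
    all_samples = stub.load();

Reading a slice this way keeps memory use proportional to the slice, not to the full dataset.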
+ +The Connection to HDF5 +----------------------- - As seen above, the keys of the Map returned by the :meth:`NwbFile.searchFor` method can be paired with the :meth:`NwbFile.resolve` method to effectively retrieve an object from any NwbFile. This is also true for internal HDF5 paths. +Under the hood, NWB files are stored in HDF5 format, which is why you see path-like structures (e.g., ``/acquisition/ElectricalSeries``). However, the NwbFile object abstracts away most of the HDF5 complexity, allowing you to work with the data using familiar MATLAB syntax. From 299ee8e2a188b0add578c5813d2def6fc43af3b5 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 30 Jun 2025 21:30:39 +0200 Subject: [PATCH 03/67] Create schemas_and_generation.rst --- .../file_read/schemas_and_generation.rst | 248 ++++++++++++++++++ 1 file changed, 248 insertions(+) create mode 100644 docs/source/pages/getting_started/file_read/schemas_and_generation.rst diff --git a/docs/source/pages/getting_started/file_read/schemas_and_generation.rst b/docs/source/pages/getting_started/file_read/schemas_and_generation.rst new file mode 100644 index 000000000..d255abba9 --- /dev/null +++ b/docs/source/pages/getting_started/file_read/schemas_and_generation.rst @@ -0,0 +1,248 @@ +.. _matnwb-read-schemas-generation: + +Schemas and Class Generation +============================ + +This page covers the advanced concepts behind how MatNWB works with NWB schemas and generates MATLAB classes. Understanding these concepts can help you troubleshoot issues and work with custom extensions. + +What are NWB Schemas? +--------------------- + +NWB schemas are formal specifications that define: + +- **Data types** and their properties +- **Relationships** between different data types +- **Validation rules** for data integrity +- **File organization** standards + +Think of schemas as blueprints that ensure all NWB files follow the same organizational principles, regardless of who created them or what software was used. 
+ +Schema Versions +~~~~~~~~~~~~~~~ + +NWB schemas evolve over time to add new features and fix issues. Each version is identified by a number (e.g., "2.6.0", "2.7.0"). When you read an NWB file, MatNWB automatically detects which schema version was used to create it. + +You can check a file's schema version: + +.. code-block:: MATLAB + + version = util.getSchemaVersion('path/to/file.nwb'); + fprintf('File uses NWB schema version: %s\n', version); + +How MatNWB Generates Classes +---------------------------- + +When you call ``nwbRead``, MatNWB performs several steps behind the scenes: + +1. **Reads the file's embedded schema** information +2. **Generates MATLAB classes** that correspond to the data types in the file +3. **Creates an object hierarchy** that matches the file's structure + +This process ensures that the MATLAB objects you work with accurately represent the standardized NWB data types. + +Embedded vs. External Schemas +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Embedded Schemas** (most common): +Modern NWB files contain their schema information embedded within the file itself. This makes the files self-contained and ensures compatibility. + +**External Schemas** (older files): +Some older NWB files don't contain embedded schemas. For these files, you need to generate the appropriate classes manually before reading. + +Automatic Class Generation +--------------------------- + +For files with embedded schemas, MatNWB handles class generation automatically: + +.. code-block:: MATLAB + + % This automatically generates classes as needed + nwb = nwbRead('modern_file.nwb'); + +The generated classes are saved in the MatNWB installation directory and reused for subsequent reads of files with the same schema. + +Manual Class Generation +----------------------- + +For older files or when working with specific schema versions, you may need to generate classes manually. 
+ +Generating Core Classes +~~~~~~~~~~~~~~~~~~~~~~~ + +Use :func:`generateCore` to create classes for the core NWB schema: + +.. code-block:: MATLAB + + % Generate classes for the latest NWB version + generateCore(); + + % Generate classes for a specific version + generateCore('2.6.0'); + + % Generate classes in a custom directory + generateCore('savedir', '/path/to/custom/directory'); + +Generating Extension Classes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If a file uses custom extensions, use :func:`generateExtension`: + +.. code-block:: MATLAB + + % Generate classes for a custom extension + generateExtension('/path/to/extension.namespace.yaml'); + + % Generate multiple extensions + generateExtension('ext1.namespace.yaml', 'ext2.namespace.yaml'); + +Reading Files Without Regeneration +----------------------------------- + +If you're reading multiple files with the same schema, you can skip class regeneration for faster loading: + +.. code-block:: MATLAB + + % Skip automatic class generation + nwb = nwbRead('file.nwb', 'ignorecache'); + +This is useful when: + +- Reading many files from the same experiment +- You know the classes are already generated and current +- You want faster file loading + +.. warning:: + Using 'ignorecache' with files that have different schemas than your generated classes can cause errors or incorrect data interpretation. + +Custom Save Directories +------------------------ + +By default, MatNWB saves generated classes in its installation directory. You can specify a custom location: + +.. 
code-block:: MATLAB + + % Generate classes in current working directory + nwb = nwbRead('file.nwb', 'savedir', '.'); + + % Generate classes in a specific directory + nwb = nwbRead('file.nwb', 'savedir', '/path/to/classes'); + +This is useful when: + +- Working with multiple schema versions simultaneously +- You don't have write permissions to the MatNWB installation directory +- You want to keep different projects' classes separate + +Understanding Class Files +-------------------------- + +Generated classes are saved as MATLAB .m files in a ``+types`` package directory structure: + +.. code-block:: text + + +types/ + ├── +core/ % Core NWB types + │ ├── TimeSeries.m + │ ├── ElectricalSeries.m + │ └── ... + ├── +hdmf_common/ % Common HDMF types + │ ├── DynamicTable.m + │ └── ... + └── +extension_name/ % Custom extension types + └── CustomType.m + +These classes define the properties and methods for each NWB data type, enabling the object-oriented interface you use when working with NWB data. + +Schema Validation +----------------- + +MatNWB validates that the embedded schemas in a file match the generated classes. If there's a mismatch, you may see warnings or errors suggesting: + +- Regenerating classes for the file's schema version +- Using ``generateCore`` with the correct version +- Checking for schema version conflicts + +Working with Multiple Schema Versions +-------------------------------------- + +When working with files from different NWB versions or with different extensions, consider these strategies: + +**Separate Directories:** + +.. code-block:: MATLAB + + % Generate classes for different versions in separate directories + generateCore('2.6.0', 'savedir', 'nwb_2_6_0_classes'); + generateCore('2.7.0', 'savedir', 'nwb_2_7_0_classes'); + + % Add the appropriate directory to your path before reading + addpath('nwb_2_6_0_classes'); + nwb_old = nwbRead('old_file.nwb', 'ignorecache'); + +**Project-Specific Classes:** + +.. 
code-block:: MATLAB + + % Generate classes in your project directory + project_dir = '/path/to/my/project'; + generateCore('savedir', project_dir); + generateExtension('my_extension.yaml', 'savedir', project_dir); + + % Read files using project-specific classes + nwb = nwbRead('project_file.nwb', 'savedir', project_dir); + +Troubleshooting Schema Issues +----------------------------- + +**Version Conflicts:** + +If you see errors about incompatible classes or missing properties: + +.. code-block:: MATLAB + + % Check the file's schema version + file_version = util.getSchemaVersion('problematic_file.nwb'); + + % Generate classes for that specific version + generateCore(file_version); + + % Try reading again + nwb = nwbRead('problematic_file.nwb'); + +**Missing Extensions:** + +If a file uses custom extensions you don't have: + +.. code-block:: MATLAB + + % Let MatNWB generate from embedded schemas + nwb = nwbRead('file_with_extensions.nwb'); + + % Or generate the extension manually if you have the schema file + generateExtension('/path/to/extension.namespace.yaml'); + +**Class Path Issues:** + +If MATLAB can't find the generated classes: + +.. code-block:: MATLAB + + % Check if the types directory is on your path + which types.core.TimeSeries + + % Add the directory containing +types to your path + addpath('/path/to/directory/containing/types'); + + % Refresh MATLAB's function cache + rehash; + +Best Practices +-------------- + +1. **Let MatNWB handle schema generation automatically** when possible +2. **Use 'ignorecache' only when you're sure about schema compatibility** +3. **Keep different schema versions in separate directories** if working with multiple versions +4. **Check schema versions** when troubleshooting read errors +5. **Use custom save directories** for project-specific work + +Understanding these schema concepts will help you work more confidently with NWB files and troubleshoot issues when they arise. 
For most users, the automatic schema handling in ``nwbRead`` will be sufficient, but these advanced features provide flexibility for complex workflows. From aba487431758894ddd8059c6173f9eba40e757dd Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 30 Jun 2025 22:11:27 +0200 Subject: [PATCH 04/67] Update schemas_and_generation.rst Fixed subsections about working with multiple schema versions --- .../file_read/schemas_and_generation.rst | 100 ++++++++++++------ 1 file changed, 69 insertions(+), 31 deletions(-) diff --git a/docs/source/pages/getting_started/file_read/schemas_and_generation.rst b/docs/source/pages/getting_started/file_read/schemas_and_generation.rst index d255abba9..327371ed3 100644 --- a/docs/source/pages/getting_started/file_read/schemas_and_generation.rst +++ b/docs/source/pages/getting_started/file_read/schemas_and_generation.rst @@ -3,7 +3,7 @@ Schemas and Class Generation ============================ -This page covers the advanced concepts behind how MatNWB works with NWB schemas and generates MATLAB classes. Understanding these concepts can help you troubleshoot issues and work with custom extensions. +This page covers the advanced concepts behind how MatNWB works with NWB schemas and generates MATLAB classes representing neurodata types. Understanding these concepts can help you troubleshoot issues and work with custom extensions. What are NWB Schemas? --------------------- @@ -35,16 +35,16 @@ How MatNWB Generates Classes When you call ``nwbRead``, MatNWB performs several steps behind the scenes: 1. **Reads the file's embedded schema** information -2. **Generates MATLAB classes** that correspond to the data types in the file +2. **Generates MATLAB classes** for neurodata types defined by the schema version used to create the file 3. **Creates an object hierarchy** that matches the file's structure -This process ensures that the MATLAB objects you work with accurately represent the standardized NWB data types. 
+This process ensures that the MATLAB objects you work with accurately reflect the exact schemas used to generate the file. Embedded vs. External Schemas ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Embedded Schemas** (most common): -Modern NWB files contain their schema information embedded within the file itself. This makes the files self-contained and ensures compatibility. +Newer (v2.x.x) NWB files contain their schema information embedded within the file itself. This makes the files self-contained and ensures compatibility. **External Schemas** (older files): Some older NWB files don't contain embedded schemas. For these files, you need to generate the appropriate classes manually before reading. @@ -57,9 +57,9 @@ For files with embedded schemas, MatNWB handles class generation automatically: .. code-block:: MATLAB % This automatically generates classes as needed - nwb = nwbRead('modern_file.nwb'); + nwb = nwbRead('newer_file.nwb'); -The generated classes are saved in the MatNWB installation directory and reused for subsequent reads of files with the same schema. +The generated classes are saved in the MatNWB installation directory and can be reused for subsequent reads of files with the same schema. Manual Class Generation ----------------------- @@ -127,11 +127,40 @@ By default, MatNWB saves generated classes in its installation directory. You ca % Generate classes in a specific directory nwb = nwbRead('file.nwb', 'savedir', '/path/to/classes'); -This is useful when: +This is useful for several advanced use cases: + +**Isolated Test Environment:** +Generate classes in a separate directory to test new schema versions or extensions without affecting your main MatNWB installation: + +.. 
code-block:: MATLAB + + % Create isolated test environment + test_dir = '/path/to/test_environment'; + generateCore('2.8.0-dev', 'savedir', test_dir); + + % Test with experimental schema + addpath(test_dir); + nwb = nwbRead('experimental_file.nwb', 'ignorecache'); + +**Parallel MATLAB Sessions:** +When running multiple MATLAB sessions on the same machine for testing or processing, each session can use its own class directory to avoid conflicts: -- Working with multiple schema versions simultaneously +.. code-block:: MATLAB + + nwbClearGenerated(); % Clear previously generated classes + + % Session 1: Generate classes in a temporary directory + session1_dir = '/tmp/matlab_session_1_classes'; + generateCore('savedir', session1_dir); + + % Session 2: Use different temporary directory in parallel session 2 + session2_dir = '/tmp/matlab_session_2_classes'; + generateCore('savedir', session2_dir); + +**Other Use Cases:** - You don't have write permissions to the MatNWB installation directory - You want to keep different projects' classes separate +- Working with different schema versions (though not simultaneously) Understanding Class Files -------------------------- @@ -153,21 +182,34 @@ Generated classes are saved as MATLAB .m files in a ``+types`` package directory These classes define the properties and methods for each NWB data type, enabling the object-oriented interface you use when working with NWB data. -Schema Validation ------------------ - -MatNWB validates that the embedded schemas in a file match the generated classes. If there's a mismatch, you may see warnings or errors suggesting: - -- Regenerating classes for the file's schema version -- Using ``generateCore`` with the correct version -- Checking for schema version conflicts Working with Multiple Schema Versions -------------------------------------- -When working with files from different NWB versions or with different extensions, consider these strategies: +.. 
important:: + MatNWB currently **cannot work with files of different schema versions simultaneously** in the same MATLAB session. Only one set of schema classes can be active at a time. -**Separate Directories:** +When you need to work with files from different NWB versions or with different extensions, you must work with them sequentially, not simultaneously: + +**Sequential Processing:** + +.. code-block:: MATLAB + + % Process files with schema version 2.6.0 + generateCore('2.6.0'); + nwb_old = nwbRead('old_file_v2_6.nwb'); + % ... work with old file ... + clear nwb_old; + + % Clear classes and switch to version 2.7.0 + nwbClearGenerated(); + generateCore('2.7.0'); + nwb_new = nwbRead('new_file_v2_7.nwb'); + % ... work with new file ... + +**Using Custom Save Directories:** + +For better organization and to avoid conflicts, generate classes in separate directories: .. code-block:: MATLAB @@ -175,21 +217,15 @@ When working with files from different NWB versions or with different extensions generateCore('2.6.0', 'savedir', 'nwb_2_6_0_classes'); generateCore('2.7.0', 'savedir', 'nwb_2_7_0_classes'); - % Add the appropriate directory to your path before reading + % Work with one version at a time addpath('nwb_2_6_0_classes'); nwb_old = nwbRead('old_file.nwb', 'ignorecache'); - -**Project-Specific Classes:** - -.. code-block:: MATLAB - - % Generate classes in your project directory - project_dir = '/path/to/my/project'; - generateCore('savedir', project_dir); - generateExtension('my_extension.yaml', 'savedir', project_dir); + % ... process old files ... + rmpath('nwb_2_6_0_classes'); - % Read files using project-specific classes - nwb = nwbRead('project_file.nwb', 'savedir', project_dir); + % Switch to newer version + addpath('nwb_2_7_0_classes'); + nwb_new = nwbRead('new_file.nwb', 'ignorecache'); Troubleshooting Schema Issues ----------------------------- @@ -200,6 +236,9 @@ If you see errors about incompatible classes or missing properties: .. 
code-block:: MATLAB + clear all; % Clear workspace to avoid conflicts + nwbClearGenerated(); % Clear previously generated classes + % Check the file's schema version file_version = util.getSchemaVersion('problematic_file.nwb'); @@ -243,6 +282,5 @@ Best Practices 2. **Use 'ignorecache' only when you're sure about schema compatibility** 3. **Keep different schema versions in separate directories** if working with multiple versions 4. **Check schema versions** when troubleshooting read errors -5. **Use custom save directories** for project-specific work Understanding these schema concepts will help you work more confidently with NWB files and troubleshoot issues when they arise. For most users, the automatic schema handling in ``nwbRead`` will be sufficient, but these advanced features provide flexibility for complex workflows. From 3e26eb0351e4b39ea560daad6461be204e4b8886 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 30 Jun 2025 22:24:23 +0200 Subject: [PATCH 05/67] Fix typos --- docs/source/pages/getting_started/file_read.rst | 2 +- docs/source/pages/getting_started/file_read/nwbfile.rst | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/pages/getting_started/file_read.rst b/docs/source/pages/getting_started/file_read.rst index cc690a2f1..b0d2ffbc6 100644 --- a/docs/source/pages/getting_started/file_read.rst +++ b/docs/source/pages/getting_started/file_read.rst @@ -1,7 +1,7 @@ Reading NWB Files ================= -This section provides an overview of reading and exploring NWB (Neurodata Without Borders) files with MatNWB. It serves as a reference guide to the data objects you’ll encounter when working with NWB files. For detailed code examples and usage demonstrations, please refer to the :doc:`tutorials <../tutorials/index>`. +This section provides an overview of reading and exploring NWB (Neurodata Without Borders) files with MatNWB. 
It serves as a reference guide to the functions and data objects you’ll interact with when working with NWB files. For detailed code examples and usage demonstrations, please refer to the :doc:`tutorials <../tutorials/index>`. To read an NWB file, use the :func:`nwbRead` function: diff --git a/docs/source/pages/getting_started/file_read/nwbfile.rst b/docs/source/pages/getting_started/file_read/nwbfile.rst index 239e97af0..671410ec5 100644 --- a/docs/source/pages/getting_started/file_read/nwbfile.rst +++ b/docs/source/pages/getting_started/file_read/nwbfile.rst @@ -75,7 +75,7 @@ For illustration, we'll run the ecephys tutorial and read the resulting NWB file This object contains properties that represent the contents of the NWB file, including metadata about the experiment and data containers for raw and processed data. The object is hierarchical, meaning you can access nested data using dot notation. -For an overview of the NWB file structure, see the `NWB File Structure `_ section of the central +For an overview of the NWB file structure, see the `NWB File Structure `_ section of the `NWB Documentation `_, or for technical details, refer to the `NWB Format Specification `_. One key difference between the :class:`NwbFile` object and the formal NWB structure is that some top-level groups, like ``general``, ``intervals`` and ``stimulus`` are flattened into top level properties of the :class:`NwbFile` object. This is only a convenience for easier access, and does not change the underlying structure of the NWB file. @@ -108,7 +108,7 @@ Display the raw data of the file: SpikeEvents_Shank0: [types.core.SpikeEventSeries] >> -The acquistion property contains a :class:`types.untyped.Set` object, which is a dynamic collection of NWB objects. In this case, it contains two datasets: ``ElectricalSeries`` and ``SpikeEvents_Shank0``. +The acquisition property contains a :class:`types.untyped.Set` object, which is a dynamic collection of NWB objects. 
In this case, it contains two datasets: ``ElectricalSeries`` and ``SpikeEvents_Shank0``. To access a specific dataset, we can use the :meth:`Set.get` method: @@ -249,14 +249,14 @@ Once you have a data object (whether found through navigation, search, or resolv % Most data objects have a .data property raw_data = electricalseries_obj.data.load(); - size(raw_data) + raw_data_size = size(raw_data) % Check for additional metadata fprintf('Description: %s\n', electricalseries_obj.description); .. code-block:: text - ans = + raw_data_size = 12 3000 From 4d999f80dc98d323677c2856cf9ac141d762f057 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Wed, 3 Sep 2025 14:42:12 +0200 Subject: [PATCH 06/67] Update conf.py - Fill in current year for copyright - Detect version from Contents.m file - Change settings for navigation buttons --- docs/source/conf.py | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/docs/source/conf.py b/docs/source/conf.py index 2251383b8..1a246bd3c 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -8,6 +8,8 @@ import os import sys +import re +from datetime import datetime sys.path.append('sphinx_extensions') from docstring_processors import process_matlab_docstring @@ -24,10 +26,30 @@ def setup(app): app.add_role('matclass', MatClassRole()) project = 'MatNWB' -copyright = '2024, Neurodata Without Borders' # Todo: compute year +copyright = f'{datetime.now().year}, Neurodata Without Borders' author = 'Neurodata Without Borders' -release = '2.7.0' # Todo: read from Contents.m +# Read version from Contents.m +def get_version_from_contents(): + """Extract version number from Contents.m file.""" + script_dir = os.path.dirname(os.path.abspath(__file__)) + contents_path = os.path.abspath(os.path.join(script_dir, '..', '..', 'Contents.m')) + + try: + with open(contents_path, 'r', encoding='utf-8') as f: + for line in f: + # Look for line with "% Version X.Y.Z" + match = 
re.search(r'%\s*Version\s+(\d+\.\d+\.\d+)', line) + if match: + return match.group(1) + except FileNotFoundError: + print(f"Warning: Contents.m not found at {contents_path}") + return 'unknown' # fallback when file is missing + + print("Warning: Version not found in Contents.m") + return 'unknown' # fallback when version cannot be parsed + +release = get_version_from_contents() # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration @@ -82,7 +104,9 @@ def linkcode_resolve(domain, info): html_favicon = os.path.join(matlab_src_dir, 'logo', 'logo_favicon_32.png') html_theme_options = { - "style_nav_header_background": "#000000" + "style_nav_header_background": "#000000", + "collapse_navigation": False, + "sticky_navigation": True } html_context = { From 3e1d8439136e2acc08402ebf4e3d42088a46507e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 15:08:17 +0200 Subject: [PATCH 07/67] Reorganize docs based on diataxis framework --- docs/README.md | 63 +++ docs/source/_links.rst | 5 + docs/source/conf.py | 3 +- docs/source/index.rst | 75 ++- .../considerations.rst} | 4 +- docs/source/pages/concepts/file_create.rst | 61 +++ .../file_create/data_organization.rst | 289 +++++++++++ .../file_create/hdf5_considerations.rst | 234 +++++++++ .../pages/concepts/file_create/nwbfile.rst | 147 ++++++ .../file_create/performance_optimization.rst | 399 +++++++++++++++ .../concepts/file_create/troubleshooting.rst | 475 ++++++++++++++++++ .../file_read.rst | 0 .../file_read/dynamictable.rst | 0 .../file_read/nwbfile.rst | 0 .../file_read/schemas_and_generation.rst | 37 +- .../file_read/troubleshooting.rst | 4 +- .../file_read/untyped.rst | 0 .../using_extensions.rst} | 10 +- .../{overview_citing.rst => how_to_cite.rst} | 0 .../pages/getting_started/installation.rst | 161 ++++++ .../getting_started/installation_users.rst | 25 - 
.../source/pages/getting_started/overview.rst | 114 +++++ .../pages/getting_started/quickstart.rst | 96 ++++ docs/source/pages/how_to/index.rst | 7 + .../generating_extension_api.rst | 0 .../installing_extensions.rst | 0 docs/source/pages/tutorials/basicUsage.rst | 2 + docs/source/pages/tutorials/behavior.rst | 2 + docs/source/pages/tutorials/convertTrials.rst | 2 + docs/source/pages/tutorials/dataPipe.rst | 2 + .../tutorials/dimensionMapNoDataPipes.rst | 2 + .../tutorials/dimensionMapWithDataPipes.rst | 2 + .../source/pages/tutorials/dynamic_tables.rst | 2 + .../tutorials/dynamically_loaded_filters.rst | 2 + docs/source/pages/tutorials/ecephys.rst | 2 + docs/source/pages/tutorials/icephys.rst | 2 + docs/source/pages/tutorials/images.rst | 2 + docs/source/pages/tutorials/index.rst | 9 +- docs/source/pages/tutorials/intro.rst | 2 + docs/source/pages/tutorials/ogen.rst | 2 + docs/source/pages/tutorials/ophys.rst | 2 + docs/source/pages/tutorials/read_demo.rst | 2 + .../pages/tutorials/read_demo_dandihub.rst | 2 + docs/source/pages/tutorials/remote_read.rst | 2 + docs/source/pages/tutorials/scratch.rst | 2 + .../_rst_templates/tutorial.rst.template | 2 + 46 files changed, 2163 insertions(+), 93 deletions(-) create mode 100644 docs/README.md create mode 100644 docs/source/_links.rst rename docs/source/pages/{getting_started/important.rst => concepts/considerations.rst} (97%) create mode 100644 docs/source/pages/concepts/file_create.rst create mode 100644 docs/source/pages/concepts/file_create/data_organization.rst create mode 100644 docs/source/pages/concepts/file_create/hdf5_considerations.rst create mode 100644 docs/source/pages/concepts/file_create/nwbfile.rst create mode 100644 docs/source/pages/concepts/file_create/performance_optimization.rst create mode 100644 docs/source/pages/concepts/file_create/troubleshooting.rst rename docs/source/pages/{getting_started => concepts}/file_read.rst (100%) rename docs/source/pages/{getting_started => 
concepts}/file_read/dynamictable.rst (100%) rename docs/source/pages/{getting_started => concepts}/file_read/nwbfile.rst (100%) rename docs/source/pages/{getting_started => concepts}/file_read/schemas_and_generation.rst (89%) rename docs/source/pages/{getting_started => concepts}/file_read/troubleshooting.rst (98%) rename docs/source/pages/{getting_started => concepts}/file_read/untyped.rst (100%) rename docs/source/pages/{getting_started/using_extenstions.rst => concepts/using_extensions.rst} (74%) rename docs/source/pages/getting_started/{overview_citing.rst => how_to_cite.rst} (100%) create mode 100644 docs/source/pages/getting_started/installation.rst delete mode 100644 docs/source/pages/getting_started/installation_users.rst create mode 100644 docs/source/pages/getting_started/overview.rst create mode 100644 docs/source/pages/getting_started/quickstart.rst create mode 100644 docs/source/pages/how_to/index.rst rename docs/source/pages/{getting_started => how_to}/using_extensions/generating_extension_api.rst (100%) rename docs/source/pages/{getting_started => how_to}/using_extensions/installing_extensions.rst (100%) diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 000000000..597ff4f33 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,63 @@ +# MatNWB Documentation + +This directory contains the documentation for MatNWB, built using Sphinx. + +## Building the Documentation Locally + +### Prerequisites + +1. **Install Python dependencies:** + ```bash + cd docs + pip install -r requirements.txt + ``` + + This installs the required packages: + - sphinx + - sphinx-rtd-theme + - sphinx-copybutton + - sphinxcontrib-matlabdomain + +### Build the Documentation + +**On macOS/Linux:** +```bash +cd docs +make html +``` + +**On Windows:** +```bash +cd docs +make.bat html +``` + +### View the Documentation + +After building, open `docs/build/html/index.html` in your web browser to view the generated documentation. 
+ +### Other Build Options + +- `make clean` - Remove build files +- `make help` - See all available build targets +- `make linkcheck` - Check for broken links + +## Documentation Structure + +- `source/` - Source files for the documentation + - `pages/` - Main documentation pages + - `conf.py` - Sphinx configuration +- `build/` - Generated documentation (created after building) +- `requirements.txt` - Python dependencies for building docs +- `Makefile` - Build commands for Unix systems +- `make.bat` - Build commands for Windows + +## Contributing to Documentation + +When editing documentation: + +1. Make changes to files in the `source/` directory +2. Build locally to test your changes +3. Ensure the documentation builds without warnings + +The documentation uses reStructuredText (`.rst`) format. See the [Sphinx documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) for syntax reference. diff --git a/docs/source/_links.rst b/docs/source/_links.rst new file mode 100644 index 000000000..484610b12 --- /dev/null +++ b/docs/source/_links.rst @@ -0,0 +1,5 @@ +.. _MatNWB: https://github.com/NeurodataWithoutBorders/matnwb +.. _PyNWB: https://github.com/NeurodataWithoutBorders/pynwb +.. _NWB: https://nwb.org + +.. |NWB| replace:: Neurodata Without Borders diff --git a/docs/source/conf.py b/docs/source/conf.py index 1a246bd3c..6ff9a1df9 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -104,7 +104,8 @@ def linkcode_resolve(domain, info): html_favicon = os.path.join(matlab_src_dir, 'logo', 'logo_favicon_32.png') html_theme_options = { - "style_nav_header_background": "#000000", + "style_nav_header_background": "#000000", + "navigation_depth": 2, "collapse_navigation": False, "sticky_navigation": True } diff --git a/docs/source/index.rst b/docs/source/index.rst index 3458b3aa5..683858ae6 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -1,32 +1,75 @@ +.. 
include:: _links.rst + ############## NWB for MATLAB ############## -MatNWB is a MATLAB package for working with NWB files. It provides a high-level -API for efficiently working with neurodata stored in the NWB format. If you are -new to NWB and would like to learn more, then please also visit the -:nwb_overview:`NWB Overview <>` website, which provides an entry point for -researchers and developers interested in using NWB. +MatNWB_ is a MATLAB package for working with |NWB|_ (NWB) files. +It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that guide you through converting and organizing your own data. + +This documentation focuses on MatNWB. If you are new to NWB or want to learn more about the format itself, these resources are a great starting point: + +.. + - :nwb_overview:`NWB Overview` | Placeholder + +- `NWB Overview Introduction `_: Entry point providing a high-level and general overview of the NWB format + +- `NWB Format Specification `_: Detailed overview of the NWB Format and the neurodata type specifications that make up the format. +For a quick introduction to MatNWB, go to the :ref:`Overview ` +page. If you immediately want to see how to read or write files, take a look at the +:ref:`Quickstart ` tutorial. -******** -Contents -******** +For more in-depth examples of how to create NWB files, we recommend that you start +with the :ref:`Introduction` tutorial and then move on to one or +more of the domain-focused tutorials: + +- :ref:`behavior-tutorial` +- :ref:`ecephys-tutorial` +- :ref:`icephys-tutorial` +- :ref:`images-tutorial` +- :ref:`ogen-tutorial` +- :ref:`ophys-tutorial` + +To explore the growing world of open-source neuroscience data stored in the +NWB format, check out the :ref:`Read from Dandihub` how-to guide. + +This documentation is based on the `diataxis `_ framework. 
+When you browse the table of contents below, look for tutorials, how-to-guides, +concepts (explanation) and reference sections to help orient yourself. .. toctree:: - :maxdepth: 2 - :caption: Getting Started + :maxdepth: 1 + :caption: Get Started - pages/getting_started/installation_users - pages/getting_started/important - pages/getting_started/file_read - pages/getting_started/using_extenstions.rst + pages/getting_started/overview + pages/getting_started/installation + pages/getting_started/quickstart + +.. toctree:: + :maxdepth: 2 + :caption: Tutorials + pages/tutorials/index - pages/getting_started/overview_citing .. toctree:: :maxdepth: 2 - :caption: MatNWB Documentation + :caption: How-tos + + pages/how_to/index + +.. toctree:: + :maxdepth: 2 + :caption: Concepts + + pages/concepts/considerations + pages/concepts/file_read + pages/concepts/file_create + pages/concepts/using_extensions + +.. toctree:: + :maxdepth: 1 + :caption: MatNWB Reference pages/functions/index pages/neurodata_types/core/index diff --git a/docs/source/pages/getting_started/important.rst b/docs/source/pages/concepts/considerations.rst similarity index 97% rename from docs/source/pages/getting_started/important.rst rename to docs/source/pages/concepts/considerations.rst index b458c5f8a..efead93d0 100644 --- a/docs/source/pages/getting_started/important.rst +++ b/docs/source/pages/concepts/considerations.rst @@ -1,5 +1,5 @@ -Important -========= +Important considerations (MatNWB) +================================= When using MatNWB, it is important to understand the differences in how array dimensions are ordered in MATLAB versus HDF5. 
While the NWB documentation and diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst new file mode 100644 index 000000000..f99830289 --- /dev/null +++ b/docs/source/pages/concepts/file_create.rst @@ -0,0 +1,61 @@ +Creating NWB Files +================== + +This section provides a guide to creating NWB (Neurodata Without Borders) files with MatNWB. It covers the fundamental concepts, step-by-step workflow, and important considerations when building NWB files from scratch. For detailed code examples and usage demonstrations, please refer to the :doc:`tutorials <../tutorials/index>`. + +Creating an NWB file involves three main steps: + +1. **Create an NwbFile object** with required metadata +2. **Add neurodata types** (time series, processed data, etc.) +3. **Export the file** using the :func:`nwbExport` function + +**Example:** + +.. code-block:: MATLAB + + % Step 1: Create NwbFile object + nwb = NwbFile( ... + 'session_start_time', datetime('now', 'TimeZone', 'local'), ... + 'identifier', 'unique_session_id', ... + 'session_description', 'Description of your experiment'); + + % Step 2: Add data (example: time series data) + data = randn(1000, 10); % Example neural data + timeseries = types.core.TimeSeries( ... + 'data', data, ... + 'data_unit', 'volts', ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30000.0); + nwb.acquisition.set('neural_data', timeseries); + + % Step 3: Export to file + nwbExport(nwb, 'my_experiment.nwb'); + +.. note:: + After exporting the file, it is recommended to use the NWBInspector for comprehensive validation of both structural compliance with the NWB schema and compliance of data with NWB best practices. See :func:`inspectNwbFile`. + +When creating an NWB file, it is useful to understand both its structure and the underlying HDF5 format. 
The :ref:`next section` covers the NwbFile object and its configuration; later sections address data organization, performance, and important caveats about the HDF5 format. + +.. warning:: + **Important HDF5 Limitations** + + NWB files are stored in HDF5 format, which has important limitations: + + - **Datasets can be modified after creation** only if a DataPipe was configured for them when they were created. + - **Datasets should not be deleted** once created - the space will not be reclaimed. + - **Schema consistency** must be maintained throughout the file creation process. + + See :doc:`file_create/hdf5_considerations` for detailed information on working within these constraints. + +**Next steps** + +The following pages provide detailed information on specific aspects of creating NWB files: + +.. toctree:: + :maxdepth: 1 + + file_create/nwbfile + file_create/data_organization + file_create/hdf5_considerations + file_create/performance_optimization + file_create/troubleshooting diff --git a/docs/source/pages/concepts/file_create/data_organization.rst b/docs/source/pages/concepts/file_create/data_organization.rst new file mode 100644 index 000000000..dfdb852ee --- /dev/null +++ b/docs/source/pages/concepts/file_create/data_organization.rst @@ -0,0 +1,289 @@ +Data Organization in NWB Files +============================== + +Once you have created an :class:`NwbFile` object, the next step is adding your experimental data using appropriate NWB data types. The NWB format provides a standardized structure for different types of neuroscience data. + +Data Organization Hierarchy +--------------------------- + +NWB files organize data into several main categories: + +- **acquisition** - Raw, unprocessed data from the experiment +- **processing** - Processed/analyzed data, organized by processing modules +- **stimulus** - Information about experimental stimuli +- **analysis** - Custom analysis results +- **scratch** - Temporary storage during analysis + +.. 
code-block:: MATLAB + + % Example of the basic structure + nwb.acquisition.set('RawEphys', electrical_series); + nwb.processing.set('EphysModule', processing_module); + nwb.stimulus_presentation.set('VisualStimulus', image_series); + +Adding Data with the .set Method +--------------------------------- + +NWB data containers (like ``acquisition``, ``processing``, etc.) use the ``.set`` method to add data objects. This method requires two arguments: + +1. **Name** (string) - A unique identifier for the data object within that container +2. **Data Object** - The NWB data type being added (e.g., TimeSeries, ProcessingModule) + +.. code-block:: MATLAB + + % The .set method syntax: + nwb.acquisition.set('DataName', data_object); + + % Why .set is used instead of direct assignment: + % This allows NWB to maintain internal structure and validate data types + +**Naming Conventions:** + +Use valid MATLAB identifiers with PascalCase for consistency: + +.. code-block:: MATLAB + + % Good naming examples (PascalCase, descriptive): + nwb.acquisition.set('RawElectricalSeries', electrical_series); + nwb.acquisition.set('CalciumImagingData', two_photon_series); + nwb.acquisition.set('BehaviorVideo', image_series); + + % Avoid these naming patterns: + nwb.acquisition.set('data1', electrical_series); % Not descriptive + nwb.acquisition.set('raw-ephys', electrical_series); % Invalid MATLAB identifier + nwb.acquisition.set('raw_ephys_data', electrical_series); % Use PascalCase instead + +- **Use PascalCase** - capitalize the first letter of each word +- **Be descriptive** - names should indicate the data content and type +- **Avoid special characters** - stick to letters, numbers, and underscores if needed +- **Use valid MATLAB identifiers** - names that could be valid variable names +- **Be consistent** - establish and follow naming patterns within your lab/project + +Refer to the :nwbinspector:`Naming Conventions ` section of the NWB Inspector docs for more details. 
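+
+A minimal sketch of how the conventions above could be checked before calling ``.set``. It relies on MATLAB's built-in ``isvarname`` for the identifier rule; the regular expression is only a rough stand-in for PascalCase (leading capital, letters and digits only), and the candidate names are illustrative assumptions, not part of MatNWB:
+
+.. code-block:: MATLAB
+
+    % Illustrative names only (assumption: not taken from a real file)
+    candidate_names = {'RawElectricalSeries', 'raw-ephys', 'raw_ephys_data'};
+
+    for i = 1:numel(candidate_names)
+        name = candidate_names{i};
+        is_identifier = isvarname(name);  % valid MATLAB identifier?
+        % Rough PascalCase heuristic: leading capital, letters/digits only
+        is_pascal = ~isempty(regexp(name, '^[A-Z][A-Za-z0-9]*$', 'once'));
+        fprintf('%s: identifier=%d, PascalCase-like=%d\n', ...
+            name, is_identifier, is_pascal);
+    end
+
+A check like this only flags naming style; it does not validate the object against the NWB schema.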
+ + +Time Series Data +---------------- + +Most neural data is time-varying and should use :class:`TimeSeries` objects or their specialized subclasses: + +**Basic TimeSeries:** + +.. code-block:: MATLAB + + % Generic time series data + data = randn(5, 1000); % 5 channels, 1000 time points + + ts = types.core.TimeSeries( ... + 'data', data, ... + 'data_unit', 'arbitrary_units', ... + 'starting_time', 0.0, ... + 'starting_time_rate', 1000.0, ... % 1kHz sampling rate + 'description', 'Raw neural signal'); + + nwb.acquisition.set('RawSignal', ts); + +**Electrophysiology Data:** + +For extracellular recordings, use :class:`ElectricalSeries`: + +.. code-block:: MATLAB + + % Create electrode table (describes recording channels) + electrode_table = util.createElectrodeTable(nwb, electrode_info); + + % Create reference to specific electrodes + electrode_region = types.hdmf_common.DynamicTableRegion( ... + 'table', types.untyped.ObjectView(electrode_table), ... + 'description', 'recording electrodes', ... + 'data', [0, 1, 2, 3]); % Which electrodes were used + + % Raw extracellular data + raw_data = int16(randn(30000, 4) * 1000); % 1 second at 30kHz, 4 channels + + electrical_series = types.core.ElectricalSeries( ... + 'data', raw_data, ... + 'data_unit', 'microvolts', ... + 'electrodes', electrode_region, ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30000.0); + + nwb.acquisition.set('RawEphys', electrical_series); + +**Calcium Imaging Data:** + +For optical data, use :class:`TwoPhotonSeries` or :class:`OnePhotonSeries`: + +.. code-block:: MATLAB + + % First define imaging plane + imaging_plane = types.core.ImagingPlane( ... + 'description', 'Primary visual cortex, layer 2/3', ... + 'excitation_lambda', 925.0, ... % Two-photon excitation wavelength + 'imaging_rate', 30.0, ... + 'indicator', 'GCaMP6f', ... 
+ 'location', 'V1'); + + nwb.general_optophysiology.set('ImagingPlane1', imaging_plane); + + % Calcium imaging time series + imaging_data = uint16(randn(50, 50, 1000) * 1000 + 2000); % 50x50 pixels, 1000 frames + + two_photon_series = types.core.TwoPhotonSeries( ... + 'data', imaging_data, ... + 'imaging_plane', types.untyped.SoftLink(imaging_plane), ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30.0, ... + 'data_unit', 'fluorescence'); + + nwb.acquisition.set('CalciumImaging', two_photon_series); + +Processing Modules +------------------ + +Processed data should be organized into processing modules, which group related analyses together: + +.. code-block:: MATLAB + + % Create a processing module for extracellular ephys + ephys_module = types.core.ProcessingModule( ... + 'description', 'Processed extracellular electrophysiology data'); + + % Add LFP data to the module + lfp_data = randn(1000, 4); % Downsampled/filtered data + + lfp_electrical_series = types.core.ElectricalSeries( ... + 'data', lfp_data, ... + 'data_unit', 'microvolts', ... + 'electrodes', electrode_region, ... + 'starting_time', 0.0, ... + 'starting_time_rate', 1000.0); % 1kHz for LFP + + lfp = types.core.LFP(); + lfp.electricalseries.set('LFP', lfp_electrical_series); + + ephys_module.nwbdatainterface.set('LFP', lfp); + nwb.processing.set('Ecephys', ephys_module); + +Spike Data and Units +-------------------- + +Spike times and sorted units use the specialized :class:`Units` table: + +.. code-block:: MATLAB + + % Create a Units table for spike data + units_table = types.core.Units( ... + 'colnames', {'spike_times'}, ... 
+ 'description', 'Sorted single units'); + + % Add spike times for each unit + unit1_spikes = [0.1, 0.5, 1.2, 1.8, 2.3]; % Spike times in seconds + unit2_spikes = [0.3, 0.9, 1.5, 2.1, 2.7]; + + units_table.addRow('spike_times', unit1_spikes); + units_table.addRow('spike_times', unit2_spikes); + + nwb.units = units_table; + +Behavioral Data +--------------- + +Behavioral measurements can be stored as :class:`TimeSeries` or in specialized containers: + +.. code-block:: MATLAB + + % Position tracking + position_data = randn(1000, 2); % X, Y coordinates over time + + spatial_series = types.core.SpatialSeries( ... + 'data', position_data, ... + 'reference_frame', 'Arena coordinates (cm)', ... + 'data_unit', 'cm', ... + 'starting_time', 0.0, ... + 'starting_time_rate', 60.0); % 60 Hz tracking + + position = types.core.Position(); + position.spatialseries.set('Position', spatial_series); + + % Add to a behavior processing module + behavior_module = types.core.ProcessingModule( ... + 'description', 'Behavioral data processing'); + behavior_module.nwbdatainterface.set('Position', position); + nwb.processing.set('Behavior', behavior_module); + +Trial Structure +--------------- + +Experimental trials are stored in the intervals table: + +.. code-block:: MATLAB + + % Create trials table + trials = types.core.TimeIntervals( ... + 'colnames', {'start_time', 'stop_time', 'stimulus_type', 'response'}, ... + 'description', 'Experimental trials'); + + % Add individual trials + trials.addRow( ... + 'start_time', 0.0, ... + 'stop_time', 2.0, ... + 'stimulus_type', 'left_grating', ... + 'response', 'correct'); + + trials.addRow( ... + 'start_time', 5.0, ... + 'stop_time', 7.0, ... + 'stimulus_type', 'right_grating', ... + 'response', 'incorrect'); + + nwb.intervals_trials = trials; + +Large Dataset Considerations +---------------------------- + +For large datasets, consider using :class:`types.untyped.DataPipe` for compression and chunking: + +.. 
code-block:: MATLAB + + % Large imaging dataset with compression + large_imaging_data = uint16(randn(512, 512, 10000) * 1000); + + compressed_data = types.untyped.DataPipe( ... + 'data', large_imaging_data, ... + 'compressionLevel', 6, ... + 'chunkSize', [512, 512, 1]); % Chunk by frame + + two_photon_series = types.core.TwoPhotonSeries( ... + 'data', compressed_data, ... + 'imaging_plane', types.untyped.SoftLink(imaging_plane), ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30.0, ... + 'data_unit', 'fluorescence'); + +See :doc:`performance_optimization` for detailed information on handling large datasets efficiently. + + +Validation and Consistency +-------------------------- + +Key principles for data organization: + +1. **Use appropriate data types** - don't store imaging data as generic TimeSeries +2. **Maintain consistent units** - ensure all related data uses the same time base +3. **Document your choices** - use descriptive names and fill in description fields + +.. code-block:: MATLAB + + % Good practice: descriptive names and consistent units + nwb.acquisition.set('RawExtracellularV1', electrical_series); + nwb.acquisition.set('CalciumImagingV1L23', two_photon_series); + + % Bad practice: generic names, unclear relationships + nwb.acquisition.set('Data1', electrical_series); + nwb.acquisition.set('Data2', two_photon_series); + +Next Steps +---------- + +With your data properly organized, the next considerations are performance optimization and understanding HDF5 constraints that affect how you structure your file creation workflow. diff --git a/docs/source/pages/concepts/file_create/hdf5_considerations.rst b/docs/source/pages/concepts/file_create/hdf5_considerations.rst new file mode 100644 index 000000000..b30d65b72 --- /dev/null +++ b/docs/source/pages/concepts/file_create/hdf5_considerations.rst @@ -0,0 +1,234 @@ +.. 
_hdf5-considerations: + +HDF5 Considerations and Limitations +=================================== + +NWB files are stored in HDF5 format, which provides excellent performance and portability but comes with important limitations that affect how you create and modify files. Understanding these constraints is essential for effective NWB file management. + +.. warning:: + **Critical HDF5 Limitations** + + - Files cannot be easily modified after creation + - Adding new datasets requires specialized approaches + - Concurrent access by multiple processes is not supported + - Schema changes require recreating the entire file + - Large datasets need careful memory management + +File Modification Challenges +---------------------------- + +**The Core Problem:** + +Unlike simple text files, HDF5 files have a complex internal structure that makes modifications difficult: + +.. code-block:: MATLAB + + % This workflow is PROBLEMATIC: + + % Day 1: Create initial file + nwb = create_basic_nwb_file(); + nwbExport(nwb, 'experiment.nwb'); + + % Day 2: Try to add more data (DIFFICULT!) + nwb = nwbRead('experiment.nwb'); + % Adding new acquisition data here is complex and error-prone + new_data = record_more_data(); + % nwb.acquisition.set('day2_data', new_data); % Not straightforward! + % nwbExport(nwb, 'experiment.nwb'); % May corrupt the file + +**Why Modification is Difficult:** + +1. **Fixed internal structure** - HDF5 pre-allocates space for datasets +2. **Metadata dependencies** - Changes can break internal links and references +3. **Compression conflicts** - Compressed data cannot be easily extended +4. **Schema validation** - New data must maintain consistency with existing structure + +Strategies for File Modification +--------------------------------- + +**Strategy 1: Plan for Incremental Data (Recommended)** + +Design your workflow to accommodate all expected data from the start: + +.. 
code-block:: MATLAB + + % Create file structure for ALL expected data upfront + nwb = NwbFile( ... + 'session_start_time', datetime('now', 'TimeZone', 'local'), ... + 'identifier', 'session_001', ... + 'session_description', 'Multi-day recording session'); + + % Pre-allocate space for time series that will grow + initial_data = zeros(0, 32); % Start with 0 timepoints, 32 channels + max_timepoints = 1000000; % But plan for up to 1M timepoints + + data_pipe = types.untyped.DataPipe( ... + 'data', initial_data, ... + 'maxSize', [max_timepoints, 32], ... % Reserve space + 'axis', 1); % Allow growth along time axis + + electrical_series = types.core.ElectricalSeries( ... + 'data', data_pipe, ... + 'electrodes', electrode_region, ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30000.0); + + nwb.acquisition.set('extracellular', electrical_series); + nwbExport(nwb, 'experiment.nwb'); + + % Later: Append new data incrementally + nwb = nwbRead('experiment.nwb', 'ignorecache'); + new_chunk = record_next_data_chunk(); + nwb.acquisition.get('extracellular').data.append(new_chunk); + +**Strategy 2: Separate Files for Each Session** + +Keep each recording session in its own file: + +.. code-block:: MATLAB + + % Better approach: separate files + for session = 1:num_sessions + nwb = create_session_nwb(session); + filename = sprintf('experiment_session_%03d.nwb', session); + nwbExport(nwb, filename); + end + + % Analysis code reads multiple files as needed + all_sessions = {}; + for session = 1:num_sessions + filename = sprintf('experiment_session_%03d.nwb', session); + all_sessions{session} = nwbRead(filename); + end + +**Strategy 3: Recreate Files When Necessary** + +For significant additions, recreate the entire file: + +.. code-block:: MATLAB + + % Read existing data + old_nwb = nwbRead('experiment_v1.nwb'); + + % Create new file with old + new data + new_nwb = NwbFile( ... + 'session_start_time', old_nwb.session_start_time, ... + 'identifier', old_nwb.identifier, ... 
+ 'session_description', old_nwb.session_description); + + % Copy existing data + copy_data_objects(old_nwb, new_nwb); + + % Add new data + new_nwb.acquisition.set('additional_recording', new_electrical_series); + + % Export new version + nwbExport(new_nwb, 'experiment_v2.nwb'); + +Edit Mode vs. Overwrite Mode +---------------------------- + +MatNWB provides two export modes with different behaviors: + +.. code-block:: MATLAB + + % Overwrite mode: Creates new file, replacing any existing file + nwbExport(nwb, 'data.nwb', 'overwrite'); + + % Edit mode (the default when no mode is given): Attempts to modify existing file (LIMITED FUNCTIONALITY) + nwbExport(nwb, 'data.nwb', 'edit'); + +**Edit Mode Limitations:** + +- Can only modify certain metadata fields +- Cannot add new datasets or change data structure +- Cannot resize existing datasets +- Primarily useful for updating file creation timestamps + +.. warning:: + Edit mode is **not** a general solution for file modification. It should only be used for minor metadata updates. + + +Concurrent Access Limitations +----------------------------- + +**Problem: Multiple Processes Cannot Write Simultaneously** + +.. code-block:: MATLAB + + % This will fail if run simultaneously: + + % Process 1: + nwb1 = nwbRead('shared_file.nwb'); + % ... modify nwb1 ... + nwbExport(nwb1, 'shared_file.nwb'); % Will lock file + + % Process 2 (running at same time): + nwb2 = nwbRead('shared_file.nwb'); % May fail or get corrupted data + % ... modify nwb2 ... + nwbExport(nwb2, 'shared_file.nwb'); % Will overwrite Process 1's changes! + +**Solutions for Concurrent Workflows:** + +1. **Use separate files per process:** + +.. code-block:: MATLAB + + % Each process writes to its own file + process_id = get_process_id(); + filename = sprintf('data_process_%d.nwb', process_id); + nwbExport(nwb, filename); + + % Combine files later in post-processing step + +2. **Coordinate access with file locking:** + +.. 
code-block:: MATLAB + + function safe_nwb_append(filename, new_data) + lock_file = [filename '.lock']; + + % Wait for exclusive access + while exist(lock_file, 'file') + pause(0.1); + end + + % Create lock + fclose(fopen(lock_file, 'w')); + + % MATLAB has no "finally" block; onCleanup releases the lock + % when the function exits, whether normally or through an error + release_lock = onCleanup(@() delete(lock_file)); + + % Perform file operation + nwb = nwbRead(filename); + nwb.acquisition.get('data').data.append(new_data); + % Note: this may still fail due to HDF5 limitations + end + +Schema Consistency Requirements +------------------------------- + +**The Problem:** + +HDF5 requires that the data structure remains consistent with the schema. A problematic scenario: + +- Read a previously generated file with ``ignorecache`` in order to make changes +- The generated types currently available are of a different schema version +- New types are created and added to the file + +Working Within HDF5 Constraints +------------------------------- + +**Recommended Workflow:** + +1. **Plan your complete data structure upfront** +2. **Use separate files for truly independent data** +3. **Pre-allocate space for datasets that will grow** + +Understanding these HDF5 limitations will help you design robust workflows that work reliably with NWB files. The next section covers performance optimization strategies that work within these constraints. diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst new file mode 100644 index 000000000..65fb86fca --- /dev/null +++ b/docs/source/pages/concepts/file_create/nwbfile.rst @@ -0,0 +1,147 @@ +.. _matnwb-create-nwbfile-intro: + +Creating the NwbFile Object +=========================== + +The :class:`NwbFile` object is the root container for all data in an NWB file. Before adding any experimental data, you must create this object and add the required metadata properties. + +Required properties +------------------- + +The NWB file must contain three required properties that need to be specified manually: + +1. 
**session_start_time** (:class:`datetime`) - When the experiment began, with timezone information +2. **identifier** (:class:`char`) - A unique identifier for this specific session/file +3. **session_description** (:class:`char`) - Brief description of the experimental session + +**Example:** + +.. code-block:: MATLAB + + nwb = NwbFile( ... + 'session_start_time', datetime('2024-01-15 09:30:00', 'TimeZone', 'local'), ... + 'identifier', 'Mouse001_Session_20240115', ... + 'session_description', 'Two-photon calcium imaging during whisker stimulation'); + +Two additional required properties are set automatically if not provided: + +- **file_create_date** - Automatically set to the current time when the file is exported +- **timestamps_reference_time** - Defaults to match ``session_start_time`` if not explicitly set + +Recommended Metadata Properties +------------------------------- + +While not required, these properties provide important context for your data: + +- **general_experimenter** - Who conducted the experiment +- **general_institution** - Where the experiment was performed +- **general_lab** - Which laboratory/group +- **general_session_id** - Lab-specific session identifier +- **general_experiment_description** - Detailed experimental context + +**Example:** + +.. code-block:: MATLAB + + nwb = NwbFile( ... + 'session_start_time', datetime('2024-01-15 09:30:00', 'TimeZone', 'local'), ... + 'identifier', 'Mouse001_Session_20240115', ... + 'session_description', 'Two-photon calcium imaging during whisker stimulation', ... + 'general_experimenter', 'Dr. Jane Smith', ... + 'general_institution', 'University Research Institute', ... + 'general_lab', 'Neural Circuits Lab', ... + 'general_session_id', 'session_001', ... 
+ 'general_experiment_description', 'Investigation of sensory processing in barrel cortex'); + + +Subject Information +------------------- + +Information about the experimental subject should be added using the :class:`types.core.Subject` class: + +.. code-block:: MATLAB + + % Create subject information + subject = types.core.Subject( ... + 'subject_id', 'Mouse001', ... + 'age', 'P90', ... % Post-natal day 90 + 'description', 'C57BL/6J mouse', ... + 'species', 'Mus musculus', ... + 'sex', 'M'); + + % Add to NWB file + nwb.general_subject = subject; + +Best Practices for Identifiers +------------------------------ + +**Session Identifiers:** + +Choose identifiers that are: + +- **Unique across your entire dataset** - avoid conflicts between labs, experiments, etc. +- **Informative** - include subject, date, session number when helpful +- **Consistent** - use a standardized naming scheme + +.. code-block:: MATLAB + + % Good examples: + identifier = 'SmithLab_Mouse001_20240115_Session01'; + identifier = 'MD5HASH_a1b2c3d4e5f6'; % For anonymization + identifier = sprintf('%s_%s_%s', lab_id, subject_id, datestr(now, 'yyyymmdd')); + +**Session Descriptions:** + +Be specific and include: + +- **Experimental paradigm** - what task or stimulation was used +- **Recording method** - electrophysiology, imaging, behavior only, etc. +- **Key experimental variables** - drug conditions, genotypes, etc. + +.. code-block:: MATLAB + + % Good examples: + session_description = 'Extracellular recordings in primary visual cortex during oriented grating presentation'; + session_description = 'Two-photon calcium imaging of layer 2/3 pyramidal neurons during whisker deflection'; + session_description = 'Behavioral training on auditory discrimination task, no neural recordings'; + +Time Zone Considerations +------------------------ + +NWB files store all timestamps in a standardized format. Always specify the timezone when creating datetime objects: + +.. 
code-block:: MATLAB + + % Specify an explicit timezone + session_start = datetime('2024-01-15 09:30:00', 'TimeZone', 'America/New_York'); + + % Or use UTC if preferred + session_start = datetime('2024-01-15 14:30:00', 'TimeZone', 'UTC'); + + % Current time with local timezone + session_start = datetime('now', 'TimeZone', 'local'); + +The ``timestamps_reference_time`` field defines "time zero" for all timestamps in the file. This is typically set to match ``session_start_time``, but can be different if needed for your experimental design. + +Validation +---------- + +The NwbFile object (and the data types it contains) is validated when you export it to file using the :func:`nwbExport` function. If any required properties are missing, an error will be raised. + +.. code-block:: MATLAB + + % This will fail - missing required properties + nwb = NwbFile(); + nwbExport(nwb, 'test.nwb'); % Error: missing identifier, session_description, etc. + + % This will succeed + nwb = NwbFile( ... + 'session_start_time', datetime('now', 'TimeZone', 'local'), ... + 'identifier', 'test_session', ... + 'session_description', 'Test file'); + nwbExport(nwb, 'test.nwb'); % Success + +Next Steps +---------- + +Once you have created an NwbFile object, you can begin adding experimental data. The next section covers how to organize different types of data within the NWB structure. diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst new file mode 100644 index 000000000..56e940954 --- /dev/null +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -0,0 +1,399 @@ +Performance Optimization +======================== + +Creating efficient NWB files requires careful consideration of data layout, compression, and memory usage. This section provides strategies for optimizing performance when working with large datasets. 
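+
+Before choosing a storage strategy, it can help to estimate how much space a dataset would occupy uncompressed. The sketch below uses only built-in MATLAB functions. The recording dimensions and the ``estimate_raw_size_mb`` helper are illustrative assumptions and are not part of MatNWB:
+
+.. code-block:: MATLAB
+
+    % Hypothetical recording: 10 minutes at 30 kHz on 32 channels
+    num_samples = 10 * 60 * 30000;
+    num_channels = 32;
+
+    % Compare the uncompressed footprint for a few numeric classes
+    type_names = {'double', 'single', 'int16'};
+    for i = 1:numel(type_names)
+        size_mb = estimate_raw_size_mb(num_samples, num_channels, type_names{i});
+        fprintf('%-6s: %.1f MB uncompressed\n', type_names{i}, size_mb);
+    end
+
+    function size_mb = estimate_raw_size_mb(num_samples, num_channels, type_name)
+        % Illustrative helper (not a MatNWB function): create one element
+        % of the requested class and measure its size in bytes
+        sample = zeros(1, 1, type_name);
+        info = whos('sample');
+        size_mb = num_samples * num_channels * info.bytes / 1e6;
+    end
+
+An estimate like this only describes the raw array; the size of the exported file also depends on compression and chunking, which the following sections discuss.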
+ +Understanding DataPipe +----------------------- + +The :class:`types.untyped.DataPipe` class is the key to efficient data handling in MatNWB. It provides: + +- **Lazy loading** - Data isn't loaded into memory until needed +- **Compression** - Reduces file size significantly +- **Chunking** - Optimizes access patterns +- **Iterative writing** - Enables processing datasets larger than RAM + +Basic DataPipe Usage +~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: MATLAB + + % Simple compression + raw_data = randn(10000, 64); % 10k samples, 64 channels + + compressed_data = types.untyped.DataPipe( ... + 'data', raw_data, ... + 'compressionLevel', 6); % Moderate compression + + electrical_series = types.core.ElectricalSeries( ... + 'data', compressed_data, ... + 'electrodes', electrode_region, ... + 'starting_time', 0.0, ... + 'starting_time_rate', 30000.0); + +Compression Strategies +---------------------- + +**Choosing Compression Levels:** + +.. code-block:: MATLAB + + % Compression level 0: No compression (fastest, largest files) + no_compression = types.untyped.DataPipe('data', data, 'compressionLevel', 0); + + % Compression level 3-6: Good balance (recommended for most cases) + balanced = types.untyped.DataPipe('data', data, 'compressionLevel', 4); + + % Compression level 9: Maximum compression (slowest, smallest files) + max_compression = types.untyped.DataPipe('data', data, 'compressionLevel', 9); + +**Performance Comparison:** + +.. code-block:: MATLAB + + % Benchmark different compression levels + test_data = uint16(randn(1000, 1000) * 1000 + 2000); % Typical imaging data + + for comp_level = [0, 3, 6, 9] + tic; + data_pipe = types.untyped.DataPipe( ... + 'data', test_data, ... 
+ 'compressionLevel', comp_level); + + nwb = create_test_nwb(); + nwb.acquisition.set('test_data', create_timeseries(data_pipe)); + filename = sprintf('test_compression_%d.nwb', comp_level); + nwbExport(nwb, filename); + + file_info = dir(filename); + time_taken = toc; + + fprintf('Compression %d: %.2f seconds, %.2f MB\n', ... + comp_level, time_taken, file_info.bytes / 1e6); + delete(filename); + end + +Optimal Chunking +---------------- + +Chunking determines how data is stored internally and dramatically affects access performance: + +**Time-Series Chunking:** + +.. code-block:: MATLAB + + data = randn(100000, 32); % 100k timepoints, 32 channels + + % For temporal analysis (accessing time ranges): + temporal_chunks = types.untyped.DataPipe( ... + 'data', data, ... + 'chunkSize', [1000, 32]); % 1k timepoints, all channels + + % For channel analysis (accessing individual channels): + channel_chunks = types.untyped.DataPipe( ... + 'data', data, ... + 'chunkSize', [100000, 1]); % All timepoints, single channel + + % For block analysis (accessing small time-channel blocks): + block_chunks = types.untyped.DataPipe( ... + 'data', data, ... + 'chunkSize', [1000, 8]); % 1k timepoints, 8 channels + +**Imaging Data Chunking:** + +.. code-block:: MATLAB + + imaging_data = uint16(randn(512, 512, 1000) * 1000); % 512x512 pixels, 1000 frames + + % For frame-by-frame access: + frame_chunks = types.untyped.DataPipe( ... + 'data', imaging_data, ... + 'chunkSize', [512, 512, 1]); % One complete frame per chunk + + % For pixel time-series analysis: + pixel_chunks = types.untyped.DataPipe( ... + 'data', imaging_data, ... + 'chunkSize', [1, 1, 1000]); % All timepoints for single pixel + + % For ROI-based access: + roi_chunks = types.untyped.DataPipe( ... + 'data', imaging_data, ... + 'chunkSize', [64, 64, 100]); % 64x64 spatial blocks, 100 frames + +Automatic Chunk Size Selection +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Let DataPipe choose optimal chunk sizes when you're unsure: + +.. 
code-block:: MATLAB + + % DataPipe will automatically choose reasonable chunk size + auto_chunked = types.untyped.DataPipe( ... + 'data', data, ... + 'compressionLevel', 6); % Only specify compression + + % You can also specify the axis along which the dataset is allowed to grow + time_optimized = types.untyped.DataPipe( ... + 'data', data, ... + 'axis', 1); % New data will be appended along the first dimension (time) + +Memory-Efficient Large Dataset Handling +--------------------------------------- + +**Iterative Writing Workflow:** + +For datasets larger than available RAM: + +.. code-block:: MATLAB + + function create_large_nwb_file(total_duration_sec, sampling_rate, num_channels) + % Calculate dimensions + total_samples = total_duration_sec * sampling_rate; + chunk_duration = 60; % Process 1 minute at a time + chunk_samples = chunk_duration * sampling_rate; + + % Create initial chunk + first_chunk = load_data_chunk(1, chunk_samples, num_channels); + + % Create DataPipe with reserved space + data_pipe = types.untyped.DataPipe( ... + 'data', first_chunk, ... + 'maxSize', [total_samples, num_channels], ... + 'chunkSize', [chunk_samples, num_channels], ... + 'compressionLevel', 6, ... + 'axis', 1); + + % Create NWB file + nwb = create_base_nwb(); + electrical_series = types.core.ElectricalSeries( ... + 'data', data_pipe, ... + 'electrodes', electrode_region, ... + 'starting_time', 0.0, ... 
+ 'starting_time_rate', sampling_rate); + + nwb.acquisition.set('continuous_ephys', electrical_series); + nwbExport(nwb, 'large_dataset.nwb'); + + % Append remaining chunks + nwb = nwbRead('large_dataset.nwb', 'ignorecache'); + num_chunks = ceil(total_samples / chunk_samples); + + for chunk_idx = 2:num_chunks + fprintf('Processing chunk %d of %d\n', chunk_idx, num_chunks); + + % Load next chunk from your data source + chunk_data = load_data_chunk(chunk_idx, chunk_samples, num_channels); + + % Append to file + nwb.acquisition.get('continuous_ephys').data.append(chunk_data); + end + + fprintf('Large dataset creation complete!\n'); + end + +**Streaming from Acquisition Systems:** + +.. code-block:: MATLAB + + function stream_acquisition_to_nwb(acquisition_system, output_file) + % Initialize with small buffer + buffer_size = 30000; % 1 second at 30kHz + initial_data = zeros(buffer_size, 32); + + data_pipe = types.untyped.DataPipe( ... + 'data', initial_data, ... + 'maxSize', [Inf, 32], ... % Unknown final size + 'chunkSize', [buffer_size, 32]); + + % Create and export initial NWB structure + nwb = create_acquisition_nwb(); + nwb.acquisition.set('live_recording', ... + create_electrical_series(data_pipe)); + nwbExport(nwb, output_file); + + % Stream data as it arrives + nwb = nwbRead(output_file, 'ignorecache'); + + while acquisition_system.is_recording() + new_data = acquisition_system.get_next_buffer(); + nwb.acquisition.get('live_recording').data.append(new_data); + end + end + +Optimizing Data Types +--------------------- + +**Choose Appropriate Numeric Types:** + +.. 
code-block:: MATLAB + + % Raw electrophysiology: often int16 is sufficient + raw_ephys = int16(randn(10000, 32) * 1000); % ±32,767 range + + % Calcium imaging: uint16 typical for camera data + calcium_data = uint16(randn(512, 512, 1000) * 1000 + 2000); + + % Processed data: may need double precision + processed_signals = double(compute_filtered_signals(raw_ephys)); + + % Behavioral measurements: single precision often sufficient + position_data = single(randn(10000, 2)); + +**Memory Usage Comparison:** + +.. code-block:: MATLAB + + % Compare memory usage of different data types + n_samples = 1000000; + + double_data = randn(n_samples, 1); % 8 bytes per sample + single_data = single(randn(n_samples, 1)); % 4 bytes per sample + int16_data = int16(randn(n_samples, 1)*1000); % 2 bytes per sample + + fprintf('Double: %.1f MB\n', whos('double_data').bytes / 1e6); + fprintf('Single: %.1f MB\n', whos('single_data').bytes / 1e6); + fprintf('Int16: %.1f MB\n', whos('int16_data').bytes / 1e6); + +Parallel Processing Considerations +---------------------------------- + +**File-Level Parallelization:** + +Process different experimental sessions in parallel: + +.. code-block:: MATLAB + + session_files = {'session1.mat', 'session2.mat', 'session3.mat'}; + + parfor i = 1:length(session_files) + % Each worker creates its own NWB file + session_data = load(session_files{i}); + nwb = convert_session_to_nwb(session_data); + + output_file = sprintf('session_%03d.nwb', i); + nwbExport(nwb, output_file); + end + +**Data-Level Parallelization:** + +Process large datasets in parallel chunks: + +.. 
code-block:: MATLAB + + function process_large_dataset_parallel(input_file, output_file) + % Load metadata to determine processing strategy + data_info = get_dataset_info(input_file); + num_chunks = ceil(data_info.total_samples / data_info.chunk_size); + + % Process chunks in parallel + processed_chunks = cell(num_chunks, 1); + + parfor chunk_idx = 1:num_chunks + raw_chunk = load_data_chunk(input_file, chunk_idx); + processed_chunks{chunk_idx} = process_chunk(raw_chunk); + end + + % Combine results sequentially (HDF5 doesn't support parallel writing) + combine_chunks_to_nwb(processed_chunks, output_file); + end + +Performance Monitoring +---------------------- + +**Benchmark Your Workflow:** + +.. code-block:: MATLAB + + function benchmark_nwb_creation(data_sizes, chunk_sizes, compression_levels) + results = table(); + + for data_size = data_sizes + for chunk_size = chunk_sizes + for comp_level = compression_levels + % Generate test data + test_data = randn(data_size, 32); + + % Time the creation process + tic; + data_pipe = types.untyped.DataPipe( ... + 'data', test_data, ... + 'chunkSize', [chunk_size, 32], ... + 'compressionLevel', comp_level); + + nwb = create_test_nwb(); + nwb.acquisition.set('test', create_timeseries(data_pipe)); + + filename = 'benchmark_temp.nwb'; + nwbExport(nwb, filename); + creation_time = toc; + + % Measure file size + file_info = dir(filename); + file_size_mb = file_info.bytes / 1e6; + + % Test read performance + tic; + test_nwb = nwbRead(filename); + sample_data = test_nwb.acquisition.get('test').data.load(1:1000, :); + read_time = toc; + + % Store results + new_row = table(data_size, chunk_size, comp_level, ... + creation_time, file_size_mb, read_time, ... + 'VariableNames', {'DataSize', 'ChunkSize', 'CompressionLevel', ... 
+ 'CreationTime', 'FileSizeMB', 'ReadTime'}); + results = [results; new_row]; + + delete(filename); + end + end + end + + % Display results + disp(results); + + % Plot performance trends + figure; + scatter3(results.DataSize, results.CompressionLevel, results.CreationTime); + xlabel('Data Size'); ylabel('Compression Level'); zlabel('Creation Time (s)'); + title('NWB Creation Performance'); + end + +Best Practices Summary +---------------------- + +1. **Use DataPipe for all large datasets** (> 100 MB) +2. **Choose compression level 4-6** for most applications +3. **Align chunk sizes with your analysis patterns** +4. **Use appropriate numeric data types** to minimize memory usage +5. **Process in parallel at the file level**, not within files +6. **Benchmark your specific workflow** to identify bottlenecks +7. **Pre-allocate space** for datasets that will grow over time + +.. code-block:: MATLAB + + % Template for high-performance NWB creation + function create_optimized_nwb(raw_data_source, output_file) + % Determine optimal parameters for your data + data_info = analyze_data_characteristics(raw_data_source); + + optimal_chunk_size = calculate_optimal_chunks(data_info); + compression_level = 6; % Good default + + % Create DataPipe with optimized settings + data_pipe = types.untyped.DataPipe( ... + 'compressionLevel', compression_level, ... + 'chunkSize', optimal_chunk_size); + + % Build NWB structure efficiently + nwb = build_nwb_structure_fast(); + + % Add data and export + add_data_efficiently(nwb, data_pipe, raw_data_source); + nwbExport(nwb, output_file); + + % Validate performance + validate_file_performance(output_file); + end + +The next section covers best practices that tie together all these performance considerations with robust file creation workflows. 
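+
+The template above calls ``calculate_optimal_chunks``, which is a placeholder rather than a MatNWB function. One way such a helper might look for a [time x channels] array is sketched below. It assumes a target of roughly 1 MB per chunk, which is an illustrative choice rather than a documented recommendation, and it keeps all channels together in each chunk, matching the temporal-access layout shown earlier. The function name and its arguments are hypothetical:
+
+.. code-block:: MATLAB
+
+    function chunk_size = sketch_time_chunk_size(data_size, bytes_per_element, target_bytes)
+        % data_size:         [num_timepoints, num_channels]
+        % bytes_per_element: e.g. 2 for int16, 8 for double
+        % target_bytes:      desired chunk size in bytes (assumption: about 1e6)
+        num_timepoints = data_size(1);
+        num_channels = data_size(2);
+
+        % Number of timepoints that fit within the target, clamped to
+        % at least 1 and at most the full length of the dataset
+        rows = floor(target_bytes / (num_channels * bytes_per_element));
+        rows = max(1, min(rows, num_timepoints));
+
+        chunk_size = [rows, num_channels];
+    end
+
+For example, ``sketch_time_chunk_size([100000, 32], 8, 1e6)`` returns ``[3906, 32]``. The result can then be passed as the ``chunkSize`` argument of a DataPipe and checked against your own access patterns with a benchmark.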
diff --git a/docs/source/pages/concepts/file_create/troubleshooting.rst b/docs/source/pages/concepts/file_create/troubleshooting.rst new file mode 100644 index 000000000..db67f92cc --- /dev/null +++ b/docs/source/pages/concepts/file_create/troubleshooting.rst @@ -0,0 +1,475 @@ +Troubleshooting NWB File Creation +================================== + +This section addresses common issues encountered when creating NWB files and provides solutions for typical problems. Many issues stem from the underlying HDF5 format constraints and can be avoided with proper planning. + +Common Error Messages +--------------------- + +**"Required property not set" Errors:** + +.. code-block:: text + + Error: The property 'session_start_time' is required but has not been set. + +*Solution:* Ensure all required NwbFile properties are set before export: + +.. code-block:: MATLAB + + % Fix: Set all required properties + nwb = NwbFile( ... + 'session_start_time', datetime('now', 'TimeZone', 'local'), ... + 'identifier', 'unique_session_id', ... + 'session_description', 'Description of experiment'); + +**"Cannot modify existing file" Errors:** + +.. code-block:: text + + Error: Unable to modify existing dataset in HDF5 file + +*Problem:* Attempting to change data structure after file creation. + +*Solution:* Recreate the file with the new structure: + +.. code-block:: MATLAB + + % Don't try to modify existing files like this: + % nwb = nwbRead('existing.nwb'); + % nwb.acquisition.set('new_data', new_dataset); % This may fail + + % Instead, create a new file: + old_nwb = nwbRead('existing.nwb'); + new_nwb = create_updated_nwb(old_nwb, new_data); + nwbExport(new_nwb, 'updated_file.nwb'); + +**Out of Memory Errors:** + +.. code-block:: text + + Error: Out of memory. Type "help memory" for your options. + +*Problem:* Trying to load datasets larger than available RAM. + +*Solution:* Use DataPipe for large datasets: + +.. 
code-block:: MATLAB + + % Don't load huge datasets directly: + % huge_data = load_entire_dataset(); % May exceed memory + % electrical_series = types.core.ElectricalSeries('data', huge_data, ...); + + % Instead, use DataPipe for efficient handling: + data_pipe = types.untyped.DataPipe( ... + 'data', initial_chunk, ... + 'maxSize', [total_samples, num_channels], ... + 'compressionLevel', 6); + + electrical_series = types.core.ElectricalSeries('data', data_pipe, ...); + +File Corruption Issues +---------------------- + +**Symptoms of Corrupted Files:** + +- File cannot be opened by nwbRead +- Incomplete data when reading +- Error messages about invalid HDF5 structure +- File size is much smaller than expected + +**Prevention:** + +.. code-block:: MATLAB + + function safe_nwb_export(nwb, filename) + temp_filename = [filename, '.tmp']; + + try + % Export to temporary file first + nwbExport(nwb, temp_filename); + + % Verify the file can be read + test_nwb = nwbRead(temp_filename); + clear test_nwb; % Release file handle + + % If successful, move to final location + if exist(filename, 'file') + backup_filename = [filename, '.backup']; + movefile(filename, backup_filename); + end + movefile(temp_filename, filename); + + fprintf('File exported successfully: %s\n', filename); + + catch ME + % Clean up on failure + if exist(temp_filename, 'file') + delete(temp_filename); + end + + fprintf('Export failed: %s\n', ME.message); + rethrow(ME); + end + end + +**Recovery from Corruption:** + +.. 
code-block:: MATLAB + + function recovered_data = recover_from_corrupted_nwb(corrupted_file) + try + % Try to read whatever is accessible + nwb = nwbRead(corrupted_file); + + % Extract data that's still readable + recovered_data = struct(); + + % Try to recover metadata + try + recovered_data.session_start_time = nwb.session_start_time; + recovered_data.identifier = nwb.identifier; + recovered_data.session_description = nwb.session_description; + catch + warning('Could not recover basic metadata'); + end + + % Try to recover acquisition data + try + acquisition_keys = nwb.acquisition.keys(); + for key = acquisition_keys + try + data_obj = nwb.acquisition.get(key{1}); + recovered_data.acquisition.(key{1}) = data_obj; + catch + warning('Could not recover acquisition data: %s', key{1}); + end + end + catch + warning('Could not access acquisition data'); + end + + catch ME + error('File is too corrupted to recover: %s', ME.message); + end + end + +Performance Problems +-------------------- + +**File Creation Takes Too Long:** + +*Symptoms:* Export process runs for hours or appears to hang. + +*Causes and Solutions:* + +1. **Large uncompressed datasets:** + +.. code-block:: MATLAB + + % Problem: No compression + data_pipe = types.untyped.DataPipe('data', large_data); + + % Solution: Add compression + data_pipe = types.untyped.DataPipe( ... + 'data', large_data, ... + 'compressionLevel', 6); + +2. **Poor chunking strategy:** + +.. code-block:: MATLAB + + % Problem: Inappropriate chunk size + data_pipe = types.untyped.DataPipe( ... + 'chunkSize', [1, num_channels]); % Too small chunks + + % Solution: Better chunk size + data_pipe = types.untyped.DataPipe( ... + 'chunkSize', [1000, num_channels]); % Larger, more efficient chunks + +3. **Excessive memory allocation:** + +.. 
code-block:: MATLAB + + % Problem: Loading all data at once + all_data = load_entire_experiment(); + + % Solution: Process in chunks + chunk_size = 30000; % 1 second at 30kHz + for chunk_start = 1:chunk_size:total_samples + chunk_end = min(chunk_start + chunk_size - 1, total_samples); + chunk_data = load_data_chunk(chunk_start, chunk_end); + append_to_nwb(nwb, chunk_data); + end + +**Files Are Too Large:** + +*Problem:* NWB files much larger than source data. + +*Solutions:* + +1. **Increase compression:** + +.. code-block:: MATLAB + + % Try higher compression levels + data_pipe = types.untyped.DataPipe( ... + 'compressionLevel', 9); % Maximum compression + +2. **Use appropriate data types:** + +.. code-block:: MATLAB + + % Convert to smaller data types if possible + if max(data(:)) < 32767 && min(data(:)) > -32768 + compressed_data = int16(data); % Use 16-bit instead of 64-bit + end + +3. **Remove unnecessary precision:** + +.. code-block:: MATLAB + + % Round data to remove artificial precision + rounded_data = round(data * 100) / 100; % Keep 2 decimal places + +Schema and Structure Issues +--------------------------- + +**"Invalid schema" Errors:** + +*Problem:* Data doesn't match expected NWB structure. + +*Common causes:* + +1. **Incorrect data dimensions:** + +.. code-block:: MATLAB + + % Problem: Wrong dimension order + electrical_series = types.core.ElectricalSeries( ... + 'data', data); % data should be [time x channels], not [channels x time] + + % Solution: Transpose if necessary + if size(data, 1) < size(data, 2) % More channels than timepoints is suspicious + data = data'; % Transpose to [time x channels] + end + +2. **Missing linked objects:** + +.. code-block:: MATLAB + + % Problem: Reference to non-existent object + electrical_series = types.core.ElectricalSeries( ... + 'electrodes', electrode_region, ... 
% electrode_region not properly created + 'data', data); + + % Solution: Ensure all linked objects exist + electrode_table = create_electrode_table(electrode_info); + electrode_region = types.hdmf_common.DynamicTableRegion( ... + 'table', types.untyped.ObjectView(electrode_table), ... + 'data', electrode_indices); + +**Inconsistent Units or Timestamps:** + +.. code-block:: MATLAB + + function validate_temporal_consistency(nwb) + % Check that all time series use consistent time base + + timeseries_objects = find_all_timeseries(nwb); + reference_time = nwb.timestamps_reference_time; + + for ts = timeseries_objects + if ~isempty(ts.starting_time) + % Check starting time is reasonable + if ts.starting_time < 0 + warning('Negative starting time detected: %.3f', ts.starting_time); + end + end + + if ~isempty(ts.timestamps) + % Check timestamp consistency + timestamps = ts.timestamps.load(); + if any(diff(timestamps) <= 0) + warning('Non-monotonic timestamps detected'); + end + end + end + end + +Data Type and Format Issues +--------------------------- + +**Complex Number Handling:** + +.. code-block:: text + + Error: Complex data types not supported in NWB files + +*Problem:* Trying to store complex-valued data directly. + +*Solution:* Split into real and imaginary parts: + +.. code-block:: MATLAB + + % Problem: Complex data + % complex_data = fft(signal); % Results in complex numbers + + % Solution: Store real and imaginary separately + fft_result = fft(signal); + real_part = real(fft_result); + imag_part = imag(fft_result); + + % Store as separate time series + nwb.processing.get('spectral_analysis').nwbdatainterface.set('fft_real', ... + create_timeseries(real_part, 'Real part of FFT')); + nwb.processing.get('spectral_analysis').nwbdatainterface.set('fft_imag', ... + create_timeseries(imag_part, 'Imaginary part of FFT')); + +**String and Text Data:** + +.. 
code-block:: MATLAB + + % Ensure text data is properly formatted + if iscell(text_data) + % Convert cell array to character array if needed + text_data = char(text_data); + end + + % Handle special characters + text_data = strrep(text_data, char(0), ''); % Remove null characters + +Debugging Workflow +------------------ + +**Step-by-Step Debugging:** + +1. **Test with minimal data:** + +.. code-block:: MATLAB + + function debug_nwb_creation() + % Start with absolute minimum + nwb = NwbFile( ... + 'session_start_time', datetime('now', 'TimeZone', 'local'), ... + 'identifier', 'debug_test', ... + 'session_description', 'Debugging test'); + + % Export and test + nwbExport(nwb, 'debug_minimal.nwb'); + test_nwb = nwbRead('debug_minimal.nwb'); + + % Add components one by one + nwb.acquisition.set('test_data', create_minimal_timeseries()); + nwbExport(nwb, 'debug_with_data.nwb'); + + % Continue adding complexity until error occurs + end + +2. **Use verbose error reporting:** + +.. code-block:: MATLAB + + try + nwbExport(nwb, filename); + catch ME + fprintf('Error during export:\n'); + fprintf('Message: %s\n', ME.message); + fprintf('Stack trace:\n'); + for i = 1:length(ME.stack) + fprintf(' %s (line %d)\n', ME.stack(i).name, ME.stack(i).line); + end + + % Try to get more specific information + if contains(ME.message, 'HDF5') + fprintf('This appears to be an HDF5-related error\n'); + fprintf('Consider checking data types and file permissions\n'); + end + end + +**Diagnostic Tools:** + +.. 
code-block:: MATLAB + + function diagnose_nwb_problems(nwb) + % Comprehensive diagnostic function + + fprintf('=== NWB Diagnostic Report ===\n'); + + % Check required fields + required_fields = {'session_start_time', 'identifier', 'session_description'}; + for field = required_fields + if isempty(nwb.(field{1})) + fprintf('ERROR: Required field %s is empty\n', field{1}); + else + fprintf('OK: %s = %s\n', field{1}, string(nwb.(field{1}))); + end + end + + % Check data sizes + acquisition_keys = nwb.acquisition.keys(); + for key = acquisition_keys + data_obj = nwb.acquisition.get(key{1}); + if isprop(data_obj, 'data') + data_size = size(data_obj.data); + fprintf('Data object %s: size = [%s]\n', key{1}, ... + strjoin(string(data_size), ' x ')); + + % Check for suspicious sizes + if any(data_size == 0) + fprintf('WARNING: Zero-sized dimension in %s\n', key{1}); + end + end + end + + % Memory usage estimate + memory_estimate = estimate_nwb_memory_usage(nwb); + fprintf('Estimated memory usage: %.2f MB\n', memory_estimate / 1e6); + end + +Getting Help +------------ + +**When to Seek Help:** + +- Error messages that aren't covered in this guide +- Performance issues that persist after optimization +- File corruption that can't be recovered +- Schema validation errors with unclear causes + +**Where to Get Help:** + +1. **MatNWB GitHub Issues:** https://github.com/NeurodataWithoutBorders/matnwb/issues +2. **NWB Community Forum:** https://community.nwb.org/ +3. **NWB Documentation:** https://nwb-overview.readthedocs.io/ + +**Information to Include When Reporting Issues:** + +.. 
code-block:: MATLAB + + function create_bug_report() + % Gather diagnostic information for bug reports + + fprintf('=== Bug Report Information ===\n'); + fprintf('MATLAB Version: %s\n', version); + fprintf('Operating System: %s\n', computer); + fprintf('MatNWB Version: %s\n', get_matnwb_version()); + + % Memory information + if ispc + [~, mem_info] = system('wmic computersystem get TotalPhysicalMemory /value'); + else + [~, mem_info] = system('free -h'); + end + fprintf('Memory Info: %s\n', mem_info); + + % Recent errors + fprintf('Recent errors in command window:\n'); + % Include error messages and stack traces + + fprintf('Data characteristics:\n'); + fprintf(' - Dataset sizes: [describe your data dimensions]\n'); + fprintf(' - Data types: [list data types you are using]\n'); + fprintf(' - Processing workflow: [describe your workflow]\n'); + end + +This troubleshooting guide should help you resolve most common issues. Remember that many problems can be prevented by following the best practices outlined in previous sections, particularly around HDF5 limitations and performance optimization. 
diff --git a/docs/source/pages/getting_started/file_read.rst b/docs/source/pages/concepts/file_read.rst similarity index 100% rename from docs/source/pages/getting_started/file_read.rst rename to docs/source/pages/concepts/file_read.rst diff --git a/docs/source/pages/getting_started/file_read/dynamictable.rst b/docs/source/pages/concepts/file_read/dynamictable.rst similarity index 100% rename from docs/source/pages/getting_started/file_read/dynamictable.rst rename to docs/source/pages/concepts/file_read/dynamictable.rst diff --git a/docs/source/pages/getting_started/file_read/nwbfile.rst b/docs/source/pages/concepts/file_read/nwbfile.rst similarity index 100% rename from docs/source/pages/getting_started/file_read/nwbfile.rst rename to docs/source/pages/concepts/file_read/nwbfile.rst diff --git a/docs/source/pages/getting_started/file_read/schemas_and_generation.rst b/docs/source/pages/concepts/file_read/schemas_and_generation.rst similarity index 89% rename from docs/source/pages/getting_started/file_read/schemas_and_generation.rst rename to docs/source/pages/concepts/file_read/schemas_and_generation.rst index 327371ed3..8ae512d40 100644 --- a/docs/source/pages/getting_started/file_read/schemas_and_generation.rst +++ b/docs/source/pages/concepts/file_read/schemas_and_generation.rst @@ -207,32 +207,12 @@ When you need to work with files from different NWB versions or with different e nwb_new = nwbRead('new_file_v2_7.nwb'); % ... work with new file ... -**Using Custom Save Directories:** - -For better organization and to avoid conflicts, generate classes in separate directories: - -.. code-block:: MATLAB - - % Generate classes for different versions in separate directories - generateCore('2.6.0', 'savedir', 'nwb_2_6_0_classes'); - generateCore('2.7.0', 'savedir', 'nwb_2_7_0_classes'); - - % Work with one version at a time - addpath('nwb_2_6_0_classes'); - nwb_old = nwbRead('old_file.nwb', 'ignorecache'); - % ... process old files ... 
- rmpath('nwb_2_6_0_classes'); - - % Switch to newer version - addpath('nwb_2_7_0_classes'); - nwb_new = nwbRead('new_file.nwb', 'ignorecache'); - Troubleshooting Schema Issues ----------------------------- **Version Conflicts:** -If you see errors about incompatible classes or missing properties: +If you see errors about incompatible classes or missing properties, clear the workspace variables and try to read the file again: .. code-block:: MATLAB @@ -260,21 +240,6 @@ If a file uses custom extensions you don't have: % Or generate the extension manually if you have the schema file generateExtension('/path/to/extension.namespace.yaml'); -**Class Path Issues:** - -If MATLAB can't find the generated classes: - -.. code-block:: MATLAB - - % Check if the types directory is on your path - which types.core.TimeSeries - - % Add the directory containing +types to your path - addpath('/path/to/directory/containing/types'); - - % Refresh MATLAB's function cache - rehash; - Best Practices -------------- diff --git a/docs/source/pages/getting_started/file_read/troubleshooting.rst b/docs/source/pages/concepts/file_read/troubleshooting.rst similarity index 98% rename from docs/source/pages/getting_started/file_read/troubleshooting.rst rename to docs/source/pages/concepts/file_read/troubleshooting.rst index cedff55b5..b514e2c7d 100644 --- a/docs/source/pages/getting_started/file_read/troubleshooting.rst +++ b/docs/source/pages/concepts/file_read/troubleshooting.rst @@ -31,8 +31,8 @@ To do this, you can use the optional ``savedir`` keyword argument with ``nwbRead .. _matnwb-read-troubleshooting-missing-schema: -Missing Embedded Schemata -~~~~~~~~~~~~~~~~~~~~~~~~~ +Missing Embedded Schemas +~~~~~~~~~~~~~~~~~~~~~~~~ Some older NWB files do not have an embedded schema. To read from these files you will need the API generation functions ``generateCore`` and ``generateExtension`` to generate the class files before calling ``nwbRead`` on them. 
You can also use the utility function ``util.getSchemaVersion`` to retrieve the correct Core schema for the file you are trying to read: diff --git a/docs/source/pages/getting_started/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst similarity index 100% rename from docs/source/pages/getting_started/file_read/untyped.rst rename to docs/source/pages/concepts/file_read/untyped.rst diff --git a/docs/source/pages/getting_started/using_extenstions.rst b/docs/source/pages/concepts/using_extensions.rst similarity index 74% rename from docs/source/pages/getting_started/using_extenstions.rst rename to docs/source/pages/concepts/using_extensions.rst index c15ce343d..c2a951cb6 100644 --- a/docs/source/pages/getting_started/using_extenstions.rst +++ b/docs/source/pages/concepts/using_extensions.rst @@ -1,5 +1,5 @@ -Using Neurodata Extensions -========================== +Neurodata Extensions +==================== The `NWB Specification Language `_ can be used to create Neurodata Extensions (NDX), which extend the core NWB schemas @@ -8,10 +8,10 @@ that has specific metadata or data requirements not covered by the core NWB sche To learn more about extending NWB, see the :nwb_overview:`NWB Overview Documentation`, and for a list of published extensions, visit the `Neurodata Extension Catalog `_. -The following sections describe how to use extensions in MatNWB: +The following guides describe how to use extensions in MatNWB: .. 
toctree:: :maxdepth: 2 - using_extensions/generating_extension_api - using_extensions/installing_extensions + /pages/how_to/using_extensions/generating_extension_api + /pages/how_to/using_extensions/installing_extensions diff --git a/docs/source/pages/getting_started/overview_citing.rst b/docs/source/pages/getting_started/how_to_cite.rst similarity index 100% rename from docs/source/pages/getting_started/overview_citing.rst rename to docs/source/pages/getting_started/how_to_cite.rst diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst new file mode 100644 index 000000000..44db9d68e --- /dev/null +++ b/docs/source/pages/getting_started/installation.rst @@ -0,0 +1,161 @@ +.. _installation: + +Installation +============ + +Quick install +------------- + +If you want the shortest path and have ``git`` available, run the following snippet in MATLAB. This clones into your current working directory, adds MatNWB to the path, and optionally persists the change: + +.. code-block:: matlab + + !git clone https://github.com/NeurodataWithoutBorders/matnwb.git + addpath("matnwb") + % Optional: persist for future MATLAB sessions + savepath() + +If you do not have git, prefer a stable release, or run into installation issues, please refer to the detailed guide below. + +Prerequisites +------------- + +- MATLAB R2019b or newer (we strive to support MATLAB releases from the past ~5 years). + +.. note:: + Dynamically loaded filters for dataset compression are supported only in MATLAB R2022a or later. + +Choose an installation method +----------------------------- + +Pick the method that best fits your workflow: + +- Method A: Clone from GitHub (development version) — Recommended +- Method B: MATLAB Add-On Manager +- Method C: Download a release ZIP (offline-friendly, stable) + +.. 
_method-a: + +Method A — Install the development version from GitHub (recommended) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note:: + Requires `git `_. + +Use this if you want the latest changes or plan to contribute. This clones MatNWB into your current working directory. + +- From a shell/Terminal: + + .. code-block:: bash + + git clone https://github.com/NeurodataWithoutBorders/matnwb.git + +- Or directly from within MATLAB: + + .. code-block:: matlab + + !git clone https://github.com/NeurodataWithoutBorders/matnwb.git + addpath("matnwb") + % Optional: persist for future MATLAB sessions + savepath() + +.. _method-b: + +Method B — Install via MATLAB Add-On Manager +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1. In MATLAB, go to: Home tab → Add-Ons → Get Add-Ons. +2. In Add-On Explorer, search for ``matnwb``. +3. Select “NeurodataWithoutBorders/matnwb”, then click “Add to MATLAB”. + +.. tip:: + If your organization blocks Add-On Explorer, use Method A or C instead. + +.. _method-c: + +Method C — Install from a release ZIP (offline-friendly, stable) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1. Download the latest release ZIP from the `GitHub Releases `_ page. +2. Unzip into a permanent folder (e.g., ``C:\tools\matnwb`` on Windows or ``~/tools/matnwb`` on macOS/Linux). +3. Add MatNWB to your MATLAB path: + + .. code-block:: matlab + + addpath("path/to/matnwb") + +4. (Optional) Persist the change for future sessions: + + .. code-block:: matlab + + savepath() % May require write permissions to pathdef.m + + +Verify your installation +------------------------ + +Run this quick check in MATLAB to verify that MatNWB is installed: + +.. code-block:: matlab + + versionInfo = ver("matnwb") + +You should see a structure with MatNWB version information. 
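As a complementary check, the sketch below tests whether ``nwbRead`` resolves to a file on the MATLAB path. It uses only the built-in ``exist`` and ``which`` functions. The printed messages are illustrative and are not part of MatNWB.

.. code-block:: matlab

    % Illustrative check (not part of MatNWB): is nwbRead visible on the path?
    if exist('nwbRead', 'file') == 2
        % which() reports the file MATLAB would actually call
        fprintf('nwbRead found at: %s\n', which('nwbRead'));
    else
        warning('nwbRead was not found. Check that the MatNWB folder is on the MATLAB path.');
    end

If the reported location is not the installation you expect, another copy of MatNWB may be shadowing it.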
+ + +Update or uninstall +------------------- + +- Update (Add-On Manager): + + - MATLAB R2025a and later: + + - Home → Add-Ons → Manage Add-Ons → Find “matnwb” → Update (if available). + + - Before MATLAB R2025a: + + - Uninstall your current version and reinstall a newer version. + +- Update (Git): + + .. code-block:: matlab + + cd path/to/matnwb + !git pull + +- Uninstall: + - Remove the MatNWB folder and remove it from the MATLAB path: + + .. code-block:: matlab + + rmpath("path/to/matnwb") + savepath() % optional + + +Troubleshooting +--------------- + +MATLAB cannot find MatNWB functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- Ensure the MatNWB folder is on the path (see “Verify your installation”). +- If needed, restart MATLAB after calling ``savepath()``. +- Use ``which nwbRead -all`` to diagnose duplicate or shadowed installs. + +Add-On Explorer blocked by network policy +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- Use Method A (Git clone) or Method C (release ZIP). + +Path persistence issues +~~~~~~~~~~~~~~~~~~~~~~~ + +- ``savepath`` may require write permissions to ``pathdef.m``. +- Run MATLAB as an administrator (Windows) or adjust permissions/create a user pathdef. + +Next steps +---------- + +- Read data with :func:`nwbRead` (see :doc:`/pages/concepts/file_read`). +- Review important data dimension notes: :doc:`/pages/concepts/considerations`. +- Explore tutorials: :doc:`../tutorials/index`. \ No newline at end of file diff --git a/docs/source/pages/getting_started/installation_users.rst b/docs/source/pages/getting_started/installation_users.rst deleted file mode 100644 index 9c7ad7180..000000000 --- a/docs/source/pages/getting_started/installation_users.rst +++ /dev/null @@ -1,25 +0,0 @@ -Install MatNWB -============== - -Download the current release of MatNWB from the -`MatNWB releases page `_ -or from the `MATLAB's FileExchange `_. 
-You can also check out the latest development version via:: - - git clone https://github.com/NeurodataWithoutBorders/matnwb.git - -After downloading MatNWB, make sure to add it to MATLAB's search path: - -.. code-block:: matlab - - addpath("path/to/matnwb") - savepath() % Permanently add to search path - -Requirements ------------- -MatNWB requires MATLAB R2019b or newer. As a general rule, we strive to maintain -compatibility with MATLAB releases from the past five years. - -**Known exceptions**: - -* Dynamically loaded filters for dataset compression are supported only in MATLAB R2022a or later. \ No newline at end of file diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst new file mode 100644 index 000000000..fc01decad --- /dev/null +++ b/docs/source/pages/getting_started/overview.rst @@ -0,0 +1,114 @@ +.. include:: _links.rst + +.. _matnwb-overview: + +Overview +======== + +What is MatNWB? +--------------- + +MatNWB is a MATLAB package for reading, writing, and validating NWB files. It provides simple functions like :func:`nwbRead` and :func:`nwbExport` for file I/O, as well as a complete set of core neurodata and helper types represented using MATLAB classes. + + +Who is it for? +-------------- + +- MATLAB users working with neurophysiology and related data (extracellular and intracellular electrophysiology, optical physiology, behavior, images, and derived analyses) +- Labs that want a reproducible, self-describing data format that travels well across tools, languages, and archives (e.g., DANDI) + +What you can do with MatNWB +--------------------------- + +- Read NWB files + + - One call to :doc:`nwbRead ` opens a file and presents a hierarchical representation of the complete file and its contents. + - Lazy I/O via DataStub lets you slice large datasets without loading them into RAM. 
+ +- Write NWB files + + - Build an :doc:`NwbFile ` with standard neurodata types (e.g., :doc:`TimeSeries `, :doc:`ElectricalSeries `, :doc:`Units `, :doc:`ImageSeries `). + - Export to disk with :doc:`nwbExport `. + +- Scale to large data + + - Stream/append and compress data with the DataPipe interface [Todo: DataPipe reference]. + - Use predefined or custom configuration profiles to optimize files for local storage, cloud storage or archiving. [Todo: How-to guide] + +- Use and create extensions + + - Install published Neurodata Extensions (NDX) with :doc:`nwbInstallExtension ` + - Generate classes from any namespace specification with :doc:`generateExtension `. + +How it works (the mental model) +------------------------------- + +NWB files are containers for storing data and metadata in a hierarchical manner using groups and datasets. In this sense, an NWB file can be thought of as a tree of folders and files representing all the data associated with neurophysiological recording sessions. The data and metadata is represented through a set of neurodata types defined by the NWB schema. These neurodata types are the building blocks for NWB files and are often used together in specific configurations (see the :doc:`tutorials ` for concrete patterns) + +MatNWB generates MATLAB classes representing these neurodata types from the NWB core schema or any available neurodata extension. These neurodata type classes ensure that data is always conforming to the NWB specification, and provide a structured interface for reading, writing, and validating NWB files. When you read an NWB file, MatNWB maps each group and dataset in the file to the corresponding MATLAB class, so you interact with neurodata types directly in MATLAB code. When you write or export, MatNWB serializes your MATLAB objects back to NWB-compliant HDF5 files, preserving the schema and relationships between types. 
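The mapping from file contents to MATLAB classes can be seen with a small sketch. It assumes a placeholder file path and a file that has at least one object stored under ``acquisition``. The class names you see depend on the file you read.

.. code-block:: matlab

    % Minimal sketch; 'path/to/file.nwb' is a placeholder, not a real file.
    nwb = nwbRead('path/to/file.nwb');
    disp(class(nwb))                     % the root object is an NwbFile

    % Objects under acquisition are held in a set with keys() and get()
    names = nwb.acquisition.keys();
    if ~isempty(names)
        obj = nwb.acquisition.get(names{1});
        % Each stored object is an instance of a generated neurodata type class
        fprintf('%s is a %s\n', names{1}, class(obj));
    end

The same pattern applies to the other containers of the ``NwbFile`` object.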
+ +The main categories of types you will work with: + +- Metadata: subject and session descriptors (e.g., :doc:`Subject `, :doc:`NWBFile `, :doc:`Device `). +- Containers/wrappers: organize related data (e.g., :doc:`ProcessingModule `). +- Time series: sampled data over time (e.g., :doc:`TimeSeries `, :doc:`ElectricalSeries `). +- Tables: columnar metadata or data (e.g., :doc:`DynamicTable `). +- Helpers: Helper types [Todo: expand, and link to helper types concept page]. + +Common questions you may encounter (and where to find answers) +-------------------------------------------------------------- + +- Which data type should I use? + + - Check out the :nwb_overview:`Neurodata Types ` section in the NWB Overview Docs + - Refer to the :doc:`Core neurodata types index ` for the full list of MatNWB types. + +- Where in the file should a type go? + + - Check out the section :nwb_overview:`Anatomy of an NWB file ` in the NWB Overview Docs + - Follow the domain tutorials for canonical placements (e.g., :doc:`Extracellular ephys `, :doc:`Calcium imaging `, :doc:`Intracellular ephys `). + +- How do I name neurodata types when adding to sets? + + - Refer to the :nwbinspector:`Naming Conventions ` section of the NWB Inspector docs. + +- What properties are required and how do I set them? + + - Each class page lists required fields and their types (e.g., :doc:`TimeSeries `). + - Refer to the :nwbinspector:`Best Practices ` for more detailed recommendations. + +- How do I add lab‑specific data? + + - Use :doc:`Neurodata Extensions ` to install published NDX or to generate classes from your own namespace. + + +Important considerations when working with MatNWB: +-------------------------------------------------- + +- **MATLAB vs. NWB dimension order**: The dimensions of datasets (arrays) in MatNWB are represented in the opposite order relative to the NWB specification. 
For example, in NWB the time dimension of a TimeSeries is the first dimension of a dataset, whereas in MatNWB, it will be the last dimension of the dataset. See the mappings and examples in the :doc:`Data dimensions ` section for a detailed explanation. + +- **NWB schema version conflicts**: When reading NWB files, MatNWB will dynamically build classes for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated types will take the place of previously existing types (i.e., from different versions), and therefore it is not recommended to work with NWB files of different versions simultaneously. + +- **Editing NWB files**: NWB files are stored using the HDF5 standard. This presents some difficulties in editing or appending data to files. See the section on :ref:`HDF5 considerations ` for more details. + + +Learn more (no steps here—just pointers) +---------------------------------------- + +- Object‑oriented programming refresher (MATLAB): https://www.mathworks.com/help/matlab/object-oriented-programming.html + +Cite MatNWB +----------- + +If MatNWB contributes to your work, please see :doc:`Citing MatNWB `. + +Related resources +----------------- + +- :nwb_overview:`NWB Overview <>` documentation +- Python API (PyNWB_) +- Share/discover data: :dandi:`DANDI Archive <>` + +.. note:: + + This page is an overview (explanation). A separate quickstart covers first read/write steps; see the :doc:`tutorials ` and Getting Started pages for hands‑on material. \ No newline at end of file diff --git a/docs/source/pages/getting_started/quickstart.rst b/docs/source/pages/getting_started/quickstart.rst new file mode 100644 index 000000000..fc43ae109 --- /dev/null +++ b/docs/source/pages/getting_started/quickstart.rst @@ -0,0 +1,96 @@ +.. 
_quickstart-tutorial: + +Quickstart: Read and write NWB files +==================================== + + +Goal +---- + +This tutorial walks you step-by-step through creating, writing, and reading a minimal NWB file with MatNWB. It is designed to be a short, learning-oriented introduction. + + +Prerequisites +------------- + +- MATLAB R2019b or later +- MatNWB :ref:`installed` and added to your MATLAB path + + +Step 1 — Create a minimal NWB file +---------------------------------- + +An NWB file always needs three required fields: + +- ``identifier`` (unique ID) +- ``session_description`` (short text summary) +- ``session_start_time`` (timestamp of the session start) + +.. code-block:: matlab + + nwb = NwbFile( ... + 'identifier', 'quickstart-demo-20250411T153000Z', ... + 'session_description', 'Quickstart demo session', ... + 'session_start_time', datetime(2025,4,11,15,30,0,'TimeZone','UTC')); + + +Step 2 — Add a TimeSeries +------------------------- + +We’ll add a short synthetic signal sampled at 10 Hz for 1 second. + +.. code-block:: matlab + + t = 0:0.1:0.9; % 10 time points + data = sin(2*pi*1*t); % simple sine wave + + ts = types.core.TimeSeries( ... + 'data', data, ... + 'data_unit', 'arbitrary', ... + 'starting_time', 0.0, ... + 'starting_time_rate', 10.0); + + nwb.acquisition.set('DemoSignal', ts); + +.. note:: + MatNWB uses MATLAB array ordering when writing to HDF5. For multi-dimensional time series, the time dimension should be the last dimension in MATLAB. See :doc:`/pages/concepts/considerations` for details. + + +Step 3 — Write the File +----------------------- + +.. code-block:: matlab + + nwbExport(nwb, 'quickstart_demo.nwb', 'overwrite'); + +This writes the NWB file to your current working directory. + +Step 4 — Read the File Back +--------------------------- + +.. code-block:: matlab + + nwb_in = nwbRead('quickstart_demo.nwb'); + +Confirm that the ``DemoSignal`` was written and read back: + +.. 
code-block:: matlab + + ts_in = nwb_in.acquisition.get('DemoSignal'); + + % Data is a DataStub (lazy loading). Index like an array or load fully: + first_five = ts_in.data(1:5); % reads a slice + all_data = ts_in.data.load(); % reads all values + + +That’s it! +---------- + +You have written and read an NWB file with MatNWB. + +Next steps +---------- + +- Try the :doc:`Introduction Tutorial <../tutorials/intro>` for a full example with subject metadata, events, and processed data. +- Learn how to read more complex files: :doc:`Reading files with MatNWB <../tutorials/read_demo>`. +- Explore the `MatNWB API reference `_. diff --git a/docs/source/pages/how_to/index.rst b/docs/source/pages/how_to/index.rst new file mode 100644 index 000000000..2fb97c024 --- /dev/null +++ b/docs/source/pages/how_to/index.rst @@ -0,0 +1,7 @@ +Use Extensions +============== +.. toctree:: + :maxdepth: 1 + + using_extensions/generating_extension_api + using_extensions/installing_extensions diff --git a/docs/source/pages/getting_started/using_extensions/generating_extension_api.rst b/docs/source/pages/how_to/using_extensions/generating_extension_api.rst similarity index 100% rename from docs/source/pages/getting_started/using_extensions/generating_extension_api.rst rename to docs/source/pages/how_to/using_extensions/generating_extension_api.rst diff --git a/docs/source/pages/getting_started/using_extensions/installing_extensions.rst b/docs/source/pages/how_to/using_extensions/installing_extensions.rst similarity index 100% rename from docs/source/pages/getting_started/using_extensions/installing_extensions.rst rename to docs/source/pages/how_to/using_extensions/installing_extensions.rst diff --git a/docs/source/pages/tutorials/basicUsage.rst b/docs/source/pages/tutorials/basicUsage.rst index 066237109..b0d1401e7 100644 --- a/docs/source/pages/tutorials/basicUsage.rst +++ b/docs/source/pages/tutorials/basicUsage.rst @@ -1,3 +1,5 @@ +.. 
_basicUsage-tutorial: + Basic Usage of MatNWB ===================== diff --git a/docs/source/pages/tutorials/behavior.rst b/docs/source/pages/tutorials/behavior.rst index cc39da747..a8d7196f7 100644 --- a/docs/source/pages/tutorials/behavior.rst +++ b/docs/source/pages/tutorials/behavior.rst @@ -1,3 +1,5 @@ +.. _behavior-tutorial: + Behavior Data ============= diff --git a/docs/source/pages/tutorials/convertTrials.rst b/docs/source/pages/tutorials/convertTrials.rst index 13f80e9d7..be8fea57a 100644 --- a/docs/source/pages/tutorials/convertTrials.rst +++ b/docs/source/pages/tutorials/convertTrials.rst @@ -1,3 +1,5 @@ +.. _convertTrials-tutorial: + Converting Trials to NWB Format =============================== diff --git a/docs/source/pages/tutorials/dataPipe.rst b/docs/source/pages/tutorials/dataPipe.rst index e51c50ef5..7757e9632 100644 --- a/docs/source/pages/tutorials/dataPipe.rst +++ b/docs/source/pages/tutorials/dataPipe.rst @@ -1,3 +1,5 @@ +.. _dataPipe-tutorial: + Advanced Writing Using DataPipes 🎬 =================================== diff --git a/docs/source/pages/tutorials/dimensionMapNoDataPipes.rst b/docs/source/pages/tutorials/dimensionMapNoDataPipes.rst index d69bce1f6..b47d6539c 100644 --- a/docs/source/pages/tutorials/dimensionMapNoDataPipes.rst +++ b/docs/source/pages/tutorials/dimensionMapNoDataPipes.rst @@ -1,3 +1,5 @@ +.. _dimensionMapNoDataPipes-tutorial: + Mapping Dimensions without DataPipes ==================================== diff --git a/docs/source/pages/tutorials/dimensionMapWithDataPipes.rst b/docs/source/pages/tutorials/dimensionMapWithDataPipes.rst index 627888170..14d1e907b 100644 --- a/docs/source/pages/tutorials/dimensionMapWithDataPipes.rst +++ b/docs/source/pages/tutorials/dimensionMapWithDataPipes.rst @@ -1,3 +1,5 @@ +.. 
_dimensionMapWithDataPipes-tutorial: + Mapping Dimensions with DataPipes ================================= diff --git a/docs/source/pages/tutorials/dynamic_tables.rst b/docs/source/pages/tutorials/dynamic_tables.rst index f719c0f97..7a623afce 100644 --- a/docs/source/pages/tutorials/dynamic_tables.rst +++ b/docs/source/pages/tutorials/dynamic_tables.rst @@ -1,3 +1,5 @@ +.. _dynamic_tables-tutorial: + Using Dynamic Tables in MatNWB ============================== diff --git a/docs/source/pages/tutorials/dynamically_loaded_filters.rst b/docs/source/pages/tutorials/dynamically_loaded_filters.rst index f8e81746d..68781d970 100644 --- a/docs/source/pages/tutorials/dynamically_loaded_filters.rst +++ b/docs/source/pages/tutorials/dynamically_loaded_filters.rst @@ -1,3 +1,5 @@ +.. _dynamically_loaded_filters-tutorial: + Implementing Dynamically Loaded Filters ======================================= diff --git a/docs/source/pages/tutorials/ecephys.rst b/docs/source/pages/tutorials/ecephys.rst index fc21321ca..a9c596498 100644 --- a/docs/source/pages/tutorials/ecephys.rst +++ b/docs/source/pages/tutorials/ecephys.rst @@ -1,3 +1,5 @@ +.. _ecephys-tutorial: + Extracellular Electrophysiology 🎬 ================================== diff --git a/docs/source/pages/tutorials/icephys.rst b/docs/source/pages/tutorials/icephys.rst index 3d6ff227c..a834b1462 100644 --- a/docs/source/pages/tutorials/icephys.rst +++ b/docs/source/pages/tutorials/icephys.rst @@ -1,3 +1,5 @@ +.. _icephys-tutorial: + Intracellular Electrophysiology =============================== diff --git a/docs/source/pages/tutorials/images.rst b/docs/source/pages/tutorials/images.rst index 1b4e5a75c..a6a60c948 100644 --- a/docs/source/pages/tutorials/images.rst +++ b/docs/source/pages/tutorials/images.rst @@ -1,3 +1,5 @@ +.. 
_images-tutorial: + Image Data ========== diff --git a/docs/source/pages/tutorials/index.rst b/docs/source/pages/tutorials/index.rst index 04058392f..6738780f2 100644 --- a/docs/source/pages/tutorials/index.rst +++ b/docs/source/pages/tutorials/index.rst @@ -1,8 +1,5 @@ -Tutorials -========= - General Tutorials ------------------ +================= .. toctree:: :maxdepth: 1 @@ -17,7 +14,7 @@ General Tutorials scratch Domain-Specific Tutorials -------------------------- +========================= .. toctree:: :maxdepth: 1 @@ -29,7 +26,7 @@ Domain-Specific Tutorials ophys Advanced I/O ------------- +============ .. toctree:: :maxdepth: 1 diff --git a/docs/source/pages/tutorials/intro.rst b/docs/source/pages/tutorials/intro.rst index fe87f2816..76dfea7d1 100644 --- a/docs/source/pages/tutorials/intro.rst +++ b/docs/source/pages/tutorials/intro.rst @@ -1,3 +1,5 @@ +.. _intro-tutorial: + Getting Started with MatNWB =========================== diff --git a/docs/source/pages/tutorials/ogen.rst b/docs/source/pages/tutorials/ogen.rst index 90ac03452..6fc05d849 100644 --- a/docs/source/pages/tutorials/ogen.rst +++ b/docs/source/pages/tutorials/ogen.rst @@ -1,3 +1,5 @@ +.. _ogen-tutorial: + Optogenetics ============ diff --git a/docs/source/pages/tutorials/ophys.rst b/docs/source/pages/tutorials/ophys.rst index fe06b829a..9be6b160a 100644 --- a/docs/source/pages/tutorials/ophys.rst +++ b/docs/source/pages/tutorials/ophys.rst @@ -1,3 +1,5 @@ +.. _ophys-tutorial: + Calcium Imaging 🎬 ================== diff --git a/docs/source/pages/tutorials/read_demo.rst b/docs/source/pages/tutorials/read_demo.rst index a493b2a7a..8c170ad31 100644 --- a/docs/source/pages/tutorials/read_demo.rst +++ b/docs/source/pages/tutorials/read_demo.rst @@ -1,3 +1,5 @@ +.. 
_read_demo-tutorial: + Reading NWB Files with MatNWB ============================= diff --git a/docs/source/pages/tutorials/read_demo_dandihub.rst b/docs/source/pages/tutorials/read_demo_dandihub.rst index 9f8e788a1..4405c1b2c 100644 --- a/docs/source/pages/tutorials/read_demo_dandihub.rst +++ b/docs/source/pages/tutorials/read_demo_dandihub.rst @@ -1,3 +1,5 @@ +.. _read_demo_dandihub-tutorial: + Reading NWB Files with MatNWB on DandiHub ========================================= diff --git a/docs/source/pages/tutorials/remote_read.rst b/docs/source/pages/tutorials/remote_read.rst index e4c52b5d9..3d731abc9 100644 --- a/docs/source/pages/tutorials/remote_read.rst +++ b/docs/source/pages/tutorials/remote_read.rst @@ -1,3 +1,5 @@ +.. _remote_read-tutorial: + Reading NWB Files from Remote Locations ======================================= diff --git a/docs/source/pages/tutorials/scratch.rst b/docs/source/pages/tutorials/scratch.rst index 3c1fa861b..43494f699 100644 --- a/docs/source/pages/tutorials/scratch.rst +++ b/docs/source/pages/tutorials/scratch.rst @@ -1,3 +1,5 @@ +.. _scratch-tutorial: + Working with Scratch Space in MatNWB ==================================== diff --git a/tools/documentation/_rst_templates/tutorial.rst.template b/tools/documentation/_rst_templates/tutorial.rst.template index 938de7ad6..f81488838 100644 --- a/tools/documentation/_rst_templates/tutorial.rst.template +++ b/tools/documentation/_rst_templates/tutorial.rst.template @@ -1,3 +1,5 @@ +.. _{{tutorial_name}}-tutorial: + {{tutorial_title}} {{tutorial_title_underline}} From 78f58829e80b8e0ff2b8d51c3af2c59614749202 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 15:13:05 +0200 Subject: [PATCH 08/67] Fix links --- docs/source/_links.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/_links.rst b/docs/source/_links.rst index 484610b12..fad4076ef 100644 --- a/docs/source/_links.rst +++ b/docs/source/_links.rst @@ -1,5 +1,5 @@ -.. 
_MatNWB: https://www.python.org/ -.. _PyNWB: https://numpy.org/ +.. _MatNWB: https://github.com/NeurodataWithoutBorders/matnwb +.. _PyNWB: https://github.com/NeurodataWithoutBorders/pynwb .. _NWB: https://nwb.org .. |NWB| replace:: Neurodata Without Borders From 0aa79ce20ef94296d35266a07d1f8fe52895049a Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 17:58:43 +0200 Subject: [PATCH 09/67] smaller rewordings --- docs/source/index.rst | 2 +- docs/source/pages/concepts/file_create/nwbfile.rst | 2 ++ docs/source/pages/getting_started/installation.rst | 9 ++++----- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 683858ae6..773c1ce12 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,7 +5,7 @@ NWB for MATLAB ############## MatNWB_ is a MATLAB package for working with |NWB|_ (NWB) files. -It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that guide you through converting and organizing your own data. +It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that show you how to read existing NWB files or convert your own data to NWB. This documentation focuses on MatNWB. If you are new to NWB or want to learn more about the format itself, these resources are a great starting point: diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst index 65fb86fca..be5ba613f 100644 --- a/docs/source/pages/concepts/file_create/nwbfile.rst +++ b/docs/source/pages/concepts/file_create/nwbfile.rst @@ -123,6 +123,8 @@ NWB files store all timestamps in a standardized format. Always specify the time The ``timestamps_reference_time`` field defines "time zero" for all timestamps in the file. 
This is typically set to match ``session_start_time``, but can be different if needed for your experimental design. +See also the :nwbinspector:`Best Practices ` section of the NWB Inspector documentation for details on setting the ``session_start_time``. + Validation ---------- diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst index 44db9d68e..d72300e85 100644 --- a/docs/source/pages/getting_started/installation.rst +++ b/docs/source/pages/getting_started/installation.rst @@ -123,13 +123,12 @@ Update or uninstall cd path/to/matnwb !git pull -- Uninstall: - - Remove the MatNWB folder and remove it from the MATLAB path: +- Uninstall (Remove the MatNWB folder and remove it from the MATLAB path): - .. code-block:: matlab + .. code-block:: matlab - rmpath("path/to/matnwb") - savepath() % optional + rmpath("path/to/matnwb") + savepath() % optional Troubleshooting From 2923c40d898432ce8167caf2b2abaf1b6cf5029d Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 17:59:05 +0200 Subject: [PATCH 10/67] Update overview.rst Fixed references/links --- docs/source/pages/getting_started/overview.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index fc01decad..f4b8387b5 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -1,4 +1,4 @@ -.. include:: _links.rst +.. include:: /_links.rst .. _matnwb-overview: @@ -8,7 +8,7 @@ Overview What is MatNWB? --------------- -MatNWB is a MATLAB package for reading, writing, and validating NWB files. It provides simple functions like :func:`nwbRead` and :func:`nwbExport` for file I/O, as well as a complete set of core neurodata and helper types represented using MATLAB classes. +MatNWB_ is a MATLAB package for reading, writing, and validating NWB files. 
It provides simple functions like :func:`nwbRead` and :func:`nwbExport` for file I/O, as well as a complete set of core neurodata and helper types represented using MATLAB classes. Who is it for? From b7a46091fcdbb76fe196716f4f4af7cf03d114ab Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 19:08:57 +0200 Subject: [PATCH 11/67] Removed troubleshooting section llm generated, too generic and verbose --- docs/source/pages/concepts/file_create.rst | 1 - .../concepts/file_create/troubleshooting.rst | 475 ------------------ 2 files changed, 476 deletions(-) delete mode 100644 docs/source/pages/concepts/file_create/troubleshooting.rst diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index f99830289..367c5b588 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -58,4 +58,3 @@ The following pages provide detailed information on specific aspects of creating file_create/data_organization file_create/hdf5_considerations file_create/performance_optimization - file_create/troubleshooting diff --git a/docs/source/pages/concepts/file_create/troubleshooting.rst b/docs/source/pages/concepts/file_create/troubleshooting.rst deleted file mode 100644 index db67f92cc..000000000 --- a/docs/source/pages/concepts/file_create/troubleshooting.rst +++ /dev/null @@ -1,475 +0,0 @@ -Troubleshooting NWB File Creation -================================== - -This section addresses common issues encountered when creating NWB files and provides solutions for typical problems. Many issues stem from the underlying HDF5 format constraints and can be avoided with proper planning. - -Common Error Messages ---------------------- - -**"Required property not set" Errors:** - -.. code-block:: text - - Error: The property 'session_start_time' is required but has not been set. - -*Solution:* Ensure all required NwbFile properties are set before export: - -.. 
code-block:: MATLAB - - % Fix: Set all required properties - nwb = NwbFile( ... - 'session_start_time', datetime('now', 'TimeZone', 'local'), ... - 'identifier', 'unique_session_id', ... - 'session_description', 'Description of experiment'); - -**"Cannot modify existing file" Errors:** - -.. code-block:: text - - Error: Unable to modify existing dataset in HDF5 file - -*Problem:* Attempting to change data structure after file creation. - -*Solution:* Recreate the file with the new structure: - -.. code-block:: MATLAB - - % Don't try to modify existing files like this: - % nwb = nwbRead('existing.nwb'); - % nwb.acquisition.set('new_data', new_dataset); % This may fail - - % Instead, create a new file: - old_nwb = nwbRead('existing.nwb'); - new_nwb = create_updated_nwb(old_nwb, new_data); - nwbExport(new_nwb, 'updated_file.nwb'); - -**Out of Memory Errors:** - -.. code-block:: text - - Error: Out of memory. Type "help memory" for your options. - -*Problem:* Trying to load datasets larger than available RAM. - -*Solution:* Use DataPipe for large datasets: - -.. code-block:: MATLAB - - % Don't load huge datasets directly: - % huge_data = load_entire_dataset(); % May exceed memory - % electrical_series = types.core.ElectricalSeries('data', huge_data, ...); - - % Instead, use DataPipe for efficient handling: - data_pipe = types.untyped.DataPipe( ... - 'data', initial_chunk, ... - 'maxSize', [total_samples, num_channels], ... - 'compressionLevel', 6); - - electrical_series = types.core.ElectricalSeries('data', data_pipe, ...); - -File Corruption Issues ----------------------- - -**Symptoms of Corrupted Files:** - -- File cannot be opened by nwbRead -- Incomplete data when reading -- Error messages about invalid HDF5 structure -- File size is much smaller than expected - -**Prevention:** - -.. 
code-block:: MATLAB - - function safe_nwb_export(nwb, filename) - temp_filename = [filename, '.tmp']; - - try - % Export to temporary file first - nwbExport(nwb, temp_filename); - - % Verify the file can be read - test_nwb = nwbRead(temp_filename); - clear test_nwb; % Release file handle - - % If successful, move to final location - if exist(filename, 'file') - backup_filename = [filename, '.backup']; - movefile(filename, backup_filename); - end - movefile(temp_filename, filename); - - fprintf('File exported successfully: %s\n', filename); - - catch ME - % Clean up on failure - if exist(temp_filename, 'file') - delete(temp_filename); - end - - fprintf('Export failed: %s\n', ME.message); - rethrow(ME); - end - end - -**Recovery from Corruption:** - -.. code-block:: MATLAB - - function recovered_data = recover_from_corrupted_nwb(corrupted_file) - try - % Try to read whatever is accessible - nwb = nwbRead(corrupted_file); - - % Extract data that's still readable - recovered_data = struct(); - - % Try to recover metadata - try - recovered_data.session_start_time = nwb.session_start_time; - recovered_data.identifier = nwb.identifier; - recovered_data.session_description = nwb.session_description; - catch - warning('Could not recover basic metadata'); - end - - % Try to recover acquisition data - try - acquisition_keys = nwb.acquisition.keys(); - for key = acquisition_keys - try - data_obj = nwb.acquisition.get(key{1}); - recovered_data.acquisition.(key{1}) = data_obj; - catch - warning('Could not recover acquisition data: %s', key{1}); - end - end - catch - warning('Could not access acquisition data'); - end - - catch ME - error('File is too corrupted to recover: %s', ME.message); - end - end - -Performance Problems --------------------- - -**File Creation Takes Too Long:** - -*Symptoms:* Export process runs for hours or appears to hang. - -*Causes and Solutions:* - -1. **Large uncompressed datasets:** - -.. 
code-block:: MATLAB - - % Problem: No compression - data_pipe = types.untyped.DataPipe('data', large_data); - - % Solution: Add compression - data_pipe = types.untyped.DataPipe( ... - 'data', large_data, ... - 'compressionLevel', 6); - -2. **Poor chunking strategy:** - -.. code-block:: MATLAB - - % Problem: Inappropriate chunk size - data_pipe = types.untyped.DataPipe( ... - 'chunkSize', [1, num_channels]); % Too small chunks - - % Solution: Better chunk size - data_pipe = types.untyped.DataPipe( ... - 'chunkSize', [1000, num_channels]); % Larger, more efficient chunks - -3. **Excessive memory allocation:** - -.. code-block:: MATLAB - - % Problem: Loading all data at once - all_data = load_entire_experiment(); - - % Solution: Process in chunks - chunk_size = 30000; % 1 second at 30kHz - for chunk_start = 1:chunk_size:total_samples - chunk_end = min(chunk_start + chunk_size - 1, total_samples); - chunk_data = load_data_chunk(chunk_start, chunk_end); - append_to_nwb(nwb, chunk_data); - end - -**Files Are Too Large:** - -*Problem:* NWB files much larger than source data. - -*Solutions:* - -1. **Increase compression:** - -.. code-block:: MATLAB - - % Try higher compression levels - data_pipe = types.untyped.DataPipe( ... - 'compressionLevel', 9); % Maximum compression - -2. **Use appropriate data types:** - -.. code-block:: MATLAB - - % Convert to smaller data types if possible - if max(data(:)) < 32767 && min(data(:)) > -32768 - compressed_data = int16(data); % Use 16-bit instead of 64-bit - end - -3. **Remove unnecessary precision:** - -.. code-block:: MATLAB - - % Round data to remove artificial precision - rounded_data = round(data * 100) / 100; % Keep 2 decimal places - -Schema and Structure Issues ---------------------------- - -**"Invalid schema" Errors:** - -*Problem:* Data doesn't match expected NWB structure. - -*Common causes:* - -1. **Incorrect data dimensions:** - -.. 
code-block:: MATLAB - - % Problem: Wrong dimension order - electrical_series = types.core.ElectricalSeries( ... - 'data', data); % data should be [time x channels], not [channels x time] - - % Solution: Transpose if necessary - if size(data, 1) < size(data, 2) % More channels than timepoints is suspicious - data = data'; % Transpose to [time x channels] - end - -2. **Missing linked objects:** - -.. code-block:: MATLAB - - % Problem: Reference to non-existent object - electrical_series = types.core.ElectricalSeries( ... - 'electrodes', electrode_region, ... % electrode_region not properly created - 'data', data); - - % Solution: Ensure all linked objects exist - electrode_table = create_electrode_table(electrode_info); - electrode_region = types.hdmf_common.DynamicTableRegion( ... - 'table', types.untyped.ObjectView(electrode_table), ... - 'data', electrode_indices); - -**Inconsistent Units or Timestamps:** - -.. code-block:: MATLAB - - function validate_temporal_consistency(nwb) - % Check that all time series use consistent time base - - timeseries_objects = find_all_timeseries(nwb); - reference_time = nwb.timestamps_reference_time; - - for ts = timeseries_objects - if ~isempty(ts.starting_time) - % Check starting time is reasonable - if ts.starting_time < 0 - warning('Negative starting time detected: %.3f', ts.starting_time); - end - end - - if ~isempty(ts.timestamps) - % Check timestamp consistency - timestamps = ts.timestamps.load(); - if any(diff(timestamps) <= 0) - warning('Non-monotonic timestamps detected'); - end - end - end - end - -Data Type and Format Issues ---------------------------- - -**Complex Number Handling:** - -.. code-block:: text - - Error: Complex data types not supported in NWB files - -*Problem:* Trying to store complex-valued data directly. - -*Solution:* Split into real and imaginary parts: - -.. 
code-block:: MATLAB - - % Problem: Complex data - % complex_data = fft(signal); % Results in complex numbers - - % Solution: Store real and imaginary separately - fft_result = fft(signal); - real_part = real(fft_result); - imag_part = imag(fft_result); - - % Store as separate time series - nwb.processing.get('spectral_analysis').nwbdatainterface.set('fft_real', ... - create_timeseries(real_part, 'Real part of FFT')); - nwb.processing.get('spectral_analysis').nwbdatainterface.set('fft_imag', ... - create_timeseries(imag_part, 'Imaginary part of FFT')); - -**String and Text Data:** - -.. code-block:: MATLAB - - % Ensure text data is properly formatted - if iscell(text_data) - % Convert cell array to character array if needed - text_data = char(text_data); - end - - % Handle special characters - text_data = strrep(text_data, char(0), ''); % Remove null characters - -Debugging Workflow ------------------- - -**Step-by-Step Debugging:** - -1. **Test with minimal data:** - -.. code-block:: MATLAB - - function debug_nwb_creation() - % Start with absolute minimum - nwb = NwbFile( ... - 'session_start_time', datetime('now', 'TimeZone', 'local'), ... - 'identifier', 'debug_test', ... - 'session_description', 'Debugging test'); - - % Export and test - nwbExport(nwb, 'debug_minimal.nwb'); - test_nwb = nwbRead('debug_minimal.nwb'); - - % Add components one by one - nwb.acquisition.set('test_data', create_minimal_timeseries()); - nwbExport(nwb, 'debug_with_data.nwb'); - - % Continue adding complexity until error occurs - end - -2. **Use verbose error reporting:** - -.. 
code-block:: MATLAB - - try - nwbExport(nwb, filename); - catch ME - fprintf('Error during export:\n'); - fprintf('Message: %s\n', ME.message); - fprintf('Stack trace:\n'); - for i = 1:length(ME.stack) - fprintf(' %s (line %d)\n', ME.stack(i).name, ME.stack(i).line); - end - - % Try to get more specific information - if contains(ME.message, 'HDF5') - fprintf('This appears to be an HDF5-related error\n'); - fprintf('Consider checking data types and file permissions\n'); - end - end - -**Diagnostic Tools:** - -.. code-block:: MATLAB - - function diagnose_nwb_problems(nwb) - % Comprehensive diagnostic function - - fprintf('=== NWB Diagnostic Report ===\n'); - - % Check required fields - required_fields = {'session_start_time', 'identifier', 'session_description'}; - for field = required_fields - if isempty(nwb.(field{1})) - fprintf('ERROR: Required field %s is empty\n', field{1}); - else - fprintf('OK: %s = %s\n', field{1}, string(nwb.(field{1}))); - end - end - - % Check data sizes - acquisition_keys = nwb.acquisition.keys(); - for key = acquisition_keys - data_obj = nwb.acquisition.get(key{1}); - if isprop(data_obj, 'data') - data_size = size(data_obj.data); - fprintf('Data object %s: size = [%s]\n', key{1}, ... - strjoin(string(data_size), ' x ')); - - % Check for suspicious sizes - if any(data_size == 0) - fprintf('WARNING: Zero-sized dimension in %s\n', key{1}); - end - end - end - - % Memory usage estimate - memory_estimate = estimate_nwb_memory_usage(nwb); - fprintf('Estimated memory usage: %.2f MB\n', memory_estimate / 1e6); - end - -Getting Help ------------- - -**When to Seek Help:** - -- Error messages that aren't covered in this guide -- Performance issues that persist after optimization -- File corruption that can't be recovered -- Schema validation errors with unclear causes - -**Where to Get Help:** - -1. **MatNWB GitHub Issues:** https://github.com/NeurodataWithoutBorders/matnwb/issues -2. **NWB Community Forum:** https://community.nwb.org/ -3. 
**NWB Documentation:** https://nwb-overview.readthedocs.io/ - -**Information to Include When Reporting Issues:** - -.. code-block:: MATLAB - - function create_bug_report() - % Gather diagnostic information for bug reports - - fprintf('=== Bug Report Information ===\n'); - fprintf('MATLAB Version: %s\n', version); - fprintf('Operating System: %s\n', computer); - fprintf('MatNWB Version: %s\n', get_matnwb_version()); - - % Memory information - if ispc - [~, mem_info] = system('wmic computersystem get TotalPhysicalMemory /value'); - else - [~, mem_info] = system('free -h'); - end - fprintf('Memory Info: %s\n', mem_info); - - % Recent errors - fprintf('Recent errors in command window:\n'); - % Include error messages and stack traces - - fprintf('Data characteristics:\n'); - fprintf(' - Dataset sizes: [describe your data dimensions]\n'); - fprintf(' - Data types: [list data types you are using]\n'); - fprintf(' - Processing workflow: [describe your workflow]\n'); - end - -This troubleshooting guide should help you resolve most common issues. Remember that many problems can be prevented by following the best practices outlined in previous sections, particularly around HDF5 limitations and performance optimization. From 80b0a2816ea7d8586f4fdcd32981dd27e196cbce Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 19:09:13 +0200 Subject: [PATCH 12/67] Minor rewording --- docs/source/index.rst | 14 +++++++++----- .../file_create/performance_optimization.rst | 6 +++--- docs/source/pages/concepts/file_read/nwbfile.rst | 4 +++- docs/source/pages/getting_started/overview.rst | 2 +- docs/source/pages/getting_started/quickstart.rst | 8 ++++---- 5 files changed, 20 insertions(+), 14 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 773c1ce12..ec259e886 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,7 +5,7 @@ NWB for MATLAB ############## MatNWB_ is a MATLAB package for working with |NWB|_ (NWB) files. 
-It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that show you how to read existing NWB files or convert your own data to NWB. +It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that show you how to read NWB files or convert your own data to NWB. This documentation focuses on MatNWB. If you are new to NWB or want to learn more about the format itself, these resources are a great starting point: @@ -32,11 +32,15 @@ more of the domain-focused tutorials: - :ref:`ophys-tutorial` To explore the growing world of open-source neuroscience data stored in the -NWB format, check out the :ref:`Read from Dandihub` how-to-guide. +NWB format, check out the :ref:`Read from Dandihub` tutorial. -This documentation is based on the `diataxis `_ framework. -When you browse the table of contents below, look for tutorials, how-to-guides, -concepts (explanation) and reference sections to help orient yourself. +.. + This documentation is based on the `diataxis `_ framework. + When you browse the table of contents below, look for tutorials, how-to-guides, + concepts (explanation) and reference sections to help orient yourself. + + +Looking for a specific topic which has not been mentioned? Check out the full table of contents below: .. toctree:: :maxdepth: 1 diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index 56e940954..983b8a568 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -8,10 +8,12 @@ Understanding DataPipe The :class:`types.untyped.DataPipe` class is the key to efficient data handling in MatNWB. 
It provides: -- **Lazy loading** - Data isn't loaded into memory until needed - **Compression** - Reduces file size significantly - **Chunking** - Optimizes access patterns +- **Pre-allocation** - Reserve space for datasets that will grow over time - **Iterative writing** - Enables processing datasets larger than RAM +- **Lazy loading** - Data isn't loaded into memory until needed + Basic DataPipe Usage ~~~~~~~~~~~~~~~~~~~~ @@ -395,5 +397,3 @@ Best Practices Summary % Validate performance validate_file_performance(output_file); end - -The next section covers best practices that tie together all these performance considerations with robust file creation workflows. diff --git a/docs/source/pages/concepts/file_read/nwbfile.rst b/docs/source/pages/concepts/file_read/nwbfile.rst index 671410ec5..633d59c76 100644 --- a/docs/source/pages/concepts/file_read/nwbfile.rst +++ b/docs/source/pages/concepts/file_read/nwbfile.rst @@ -78,8 +78,10 @@ This object contains properties that represent the contents of the NWB file, inc For an overview of the NWB file structure, see the `NWB File Structure `_ section of the `NWB Documentation `_, or for technical details, refer to the `NWB Format Specification `_. -One key difference between the :class:`NwbFile` object and the formal NWB structure is that some top-level groups, like ``general``, ``intervals`` and ``stimulus`` are flattened into top level properties of the :class:`NwbFile` object. This is only a convenience for easier access, and does not change the underlying structure of the NWB file. +.. note:: + One key difference between the :class:`NwbFile` object and the NWB schema is that some top-level groups (e.g. ``general``, ``intervals`` and ``stimulus``) and their subgroups are flattened into top level properties of the :class:`NwbFile` object. This flattening enables easier upfront property access in the MATLAB API, but does not change the on‑disk layout of the NWB file. 
+ Basic Navigation ---------------- diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index f4b8387b5..9a5218c44 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -79,7 +79,7 @@ Common questions you may encounter (and where to find answers) - How do I add lab‑specific data? - - Use :doc:`Neurodata Extensions ` to install published NDX or to generate classes from your own namespace. + - See :doc:`Neurodata Extensions ` for guides to install published NDX or to generate classes from your own namespace specification. Important considerations when working with MatNWB: diff --git a/docs/source/pages/getting_started/quickstart.rst b/docs/source/pages/getting_started/quickstart.rst index fc43ae109..d1190042b 100644 --- a/docs/source/pages/getting_started/quickstart.rst +++ b/docs/source/pages/getting_started/quickstart.rst @@ -37,7 +37,7 @@ An NWB file always needs three required fields: Step 2 — Add a TimeSeries ------------------------- -We’ll add a short synthetic signal sampled at 10 Hz for 1 second. +We’ll add a short synthetic signal sampled at 10 Hz for 1 second using the :class:`types.core.TimeSeries` neurodata type. .. code-block:: matlab @@ -53,9 +53,9 @@ We’ll add a short synthetic signal sampled at 10 Hz for 1 second. nwb.acquisition.set('DemoSignal', ts); .. note:: - MatNWB uses MATLAB array ordering when writing to HDF5. For multi-dimensional time series, the time dimension should be the last dimension in MATLAB. See :doc:`/pages/concepts/considerations for details. - - + MatNWB uses MATLAB array ordering when writing to HDF5. For multi-dimensional time series, the time dimension should be the last dimension of the MATLAB array. See the :doc:`Data Dimensions ` section in the "MatNWB important considerations" page. 
+ + Step 3 — Write the File ----------------------- From 9a4a24b276d6c51cdf1028d40a4d51ca8af2d712 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 22:05:08 +0200 Subject: [PATCH 13/67] Update installation.rst Fixed uninstall instruction --- docs/source/pages/getting_started/installation.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst index d72300e85..66205de4c 100644 --- a/docs/source/pages/getting_started/installation.rst +++ b/docs/source/pages/getting_started/installation.rst @@ -123,12 +123,13 @@ Update or uninstall cd path/to/matnwb !git pull -- Uninstall (Remove the MatNWB folder and remove it from the MATLAB path): +- Uninstall (Remove the MatNWB folder from the MATLAB path and delete it): .. code-block:: matlab rmpath("path/to/matnwb") - savepath() % optional + savepath() + rmdir("path/to/matnwb", "s") % delete folder and contents Troubleshooting From 843e9d5bfc5bb143d072b30a77bf3d058166f1f1 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 22:05:26 +0200 Subject: [PATCH 14/67] Update quickstart.rst Add small introduction --- docs/source/pages/getting_started/quickstart.rst | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/source/pages/getting_started/quickstart.rst b/docs/source/pages/getting_started/quickstart.rst index d1190042b..754120223 100644 --- a/docs/source/pages/getting_started/quickstart.rst +++ b/docs/source/pages/getting_started/quickstart.rst @@ -17,6 +17,16 @@ Prerequisites - MatNWB :ref:`installed` and added to your MATLAB path +Introduction +------------ + +Creating an NWB file involves three main steps: + +1. **Create an NwbFile object** with required metadata +2. **Add neurodata types** (time series, processed data, etc.) +3. 
**Export the file** using the :func:`nwbExport` function + + Step 1 — Create a minimal NWB file ---------------------------------- @@ -65,7 +75,7 @@ Step 3 — Write the File This writes the NWB file to your current working directory. -Step 4 — Read the File Back +Verify — Read the File Back --------------------------- .. code-block:: matlab From b1c18bfd852e9bc7df9d38d48ee1a0eb3f50aec5 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 22:06:08 +0200 Subject: [PATCH 15/67] Update file_create.rst Make the file creation overview page more of an explanation page --- docs/source/pages/concepts/file_create.rst | 74 ++++++++-------------- 1 file changed, 26 insertions(+), 48 deletions(-) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 367c5b588..20564a4aa 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -1,60 +1,38 @@ Creating NWB Files ================== -This section provides a guide to creating NWB (Neurodata Without Borders) files with MatNWB. It covers the fundamental concepts, step-by-step workflow, and important considerations when building NWB files from scratch. For detailed code examples and usage demonstrations, please refer to the :doc:`tutorials <../tutorials/index>`. - -Creating an NWB file involves three main steps: - -1. **Create an NwbFile object** with required metadata -2. **Add neurodata types** (time series, processed data, etc.) -3. **Export the file** using the :func:`nwbExport` function - -**Example:** - -.. code-block:: MATLAB - - % Step 1: Create NwbFile object - nwb = NwbFile( ... - 'session_start_time', datetime('now', 'TimeZone', 'local'), ... - 'identifier', 'unique_session_id', ... - 'session_description', 'Description of your experiment'); - - % Step 2: Add data (example: time series data) - data = randn(1000, 10); % Example neural data - timeseries = types.core.TimeSeries( ... - 'data', data, ... 
- 'data_unit', 'volts', ... - 'starting_time', 0.0, ... - 'starting_time_rate', 30000.0); - nwb.acquisition.set('neural_data', timeseries); - - % Step 3: Export to file - nwbExport(nwb, 'my_experiment.nwb'); +When creating an NWB file, you're translating your experimental data and metadata into a structure that follows the NWB schema. MatNWB provides MATLAB classes that represent the different components (neurodata types) of an NWB file, allowing you to build up the file piece by piece. + +To understand the structure of an NWB file, the NWB Overview documentation has a +:nwb-overview:`great introduction `. + +As demonstrated in the quickstart tutorial, you start by creating an :class:`NwbFile` object. .. note:: - After export the file, it is recommended to use the NWBInspector for comprehensive validation of both structural compliance with the NWB schema and compliance of data with NWB best practices. See :func:`inspectNwbFile`. + An "object" is an instance of a class. Objects are similar to MATLAB structs, but with additional functionality. The fields (called properties) are defined by the class, and the class can enforce rules about what values are allowed. This helps ensure that your data conforms to the NWB schema. + +When you create an :class:`NwbFile` object, you get a container whose properties are derived directly from the NWB schema. Some properties are required, others are optional. Some need specific MATLAB types like `char` or `datetime`, while others need specific neurodata types defined in the schema. + +The Assembly Process +-------------------- + +Building an NWB file follows a logical pattern: + +**Data Objects**: You create objects for your data (like :class:`types.core.TimeSeries` for time-based measurements) -When creating an NWB file, it is useful to understand both its structure and the underlying HDF5 format. 
The :ref:`next section` covers the NwbFile object and its configuration; later sections address data organization, performance, and important caveats about the HDF5 format. +**Container Object**: You add these data objects to your :class:`NwbFile` object in appropriate locations -.. warning:: - **Important HDF5 Limitations** - - NWB files are stored in HDF5 format, which has important limitations: - - - **To modify datasets** after creation - a DataPipe must be configured for the dataset on creation. - - **Datasets should not be deleted** once created - the space will not be reclaimed. - - **Schema consistency** must be maintained throughout the file creation process. +**File Export**: You save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format - See :doc:`file_create/hdf5_considerations` for detailed information on working within these constraints. +This approach ensures your data is properly organized and validated before it becomes a file. -**Next steps** +Schema Validation +----------------- -The following pages provide detailed information on specific aspects of creating NWB files: +The NWB schema acts as a blueprint that defines what makes a valid neuroscience data file. When you export your file, MatNWB checks that: -.. toctree:: - :maxdepth: 1 +- All required properties are present +- Data types match what the schema expects +- Relationships between different parts of the file are correct - file_create/nwbfile - file_create/data_organization - file_create/hdf5_considerations - file_create/performance_optimization +If anything is missing or incorrect, you'll get an error message explaining what needs to be fixed. This validation helps ensure your files will work with other NWB tools and can be understood by other researchers. 
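The "Schema Validation" text added to ``file_create.rst`` in the patch above could be illustrated with a short sketch. This is only a minimal example under stated assumptions: MatNWB is installed and on the MATLAB path, the identifier ``validation_demo`` and the file name ``incomplete_demo.nwb`` are hypothetical, and the exact error text is not reproduced here because it may differ between MatNWB versions.

.. code-block:: matlab

    % Sketch: create an NwbFile that deliberately omits two of the three
    % required properties (session_start_time and session_description).
    nwb = NwbFile('identifier', 'validation_demo');  % hypothetical identifier

    try
        % Export is where the object is checked against the schema.
        nwbExport(nwb, 'incomplete_demo.nwb');       % hypothetical file name
        disp('Export succeeded');
    catch ME
        % Print whatever message MatNWB reports about the missing properties.
        fprintf('Export failed: %s\n', ME.message);
    end

Wrapping the export in ``try``/``catch`` keeps a conversion script running long enough to report which properties still need to be set, rather than stopping at the first failed export.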
From 3c5c3ed5b888d8debe012e3697264b4ac243316d Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 22:49:29 +0200 Subject: [PATCH 16/67] Update file_create.rst Update rst formatting --- docs/source/pages/concepts/file_create.rst | 25 ++++++++++++++-------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 20564a4aa..7e3e7c739 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -3,26 +3,25 @@ Creating NWB Files When creating an NWB file, you're translating your experimental data and metadata into a structure that follows the NWB schema. MatNWB provides MATLAB classes that represent the different components (neurodata types) of an NWB file, allowing you to build up the file piece by piece. -To understand the structure of an NWB file, the NWB Overview documentation has a -:nwb-overview:`great introduction `. +.. tip:: + To understand the general structure of an NWB file, the NWB Overview documentation has a + :nwb_overview:`great introduction `. -As demonstrated in the quickstart tutorial, you start by creating an :class:`NwbFile` object. +As demonstrated in the :doc:`Quickstart ` tutorial, when creating an NWB file, you start by invoking the :class:`NwbFile` class. This will return an :class:`NwbFile` object, a container whose properties are derived directly from the NWB schema. Some properties are required, others are optional. Some need specific MATLAB types like ``char`` or ``datetime``, while others need specific neurodata types defined in the NWB schema. .. note:: - An "object" is an instance of a class. Objects are similar to MATLAB structs, but with additional functionality. The fields (called properties) are defined by the class, and the class can enforce rules about what values are allowed. This helps ensure that your data conforms to the NWB schema. 
- -When you create an :class:`NwbFile` object, you get a container whose properties are derived directly from the NWB schema. Some properties are required, others are optional. Some need specific MATLAB types like `char` or `datetime`, while others need specific neurodata types defined in the schema. + An "object" is an instance of a class. Objects are similar to MATLAB structs, but with additional functionality. The fields (called properties) are defined by the class definition (a .m file), and the class can enforce rules about what values are allowed. This helps ensure that your data conforms to the NWB schema. The Assembly Process -------------------- Building an NWB file follows a logical pattern: -**Data Objects**: You create objects for your data (like :class:`types.core.TimeSeries` for time-based measurements) +- **Create neurodata objects**: You create objects for your data (like :class:`types.core.TimeSeries` for time-based measurements) -**Container Object**: You add these data objects to your :class:`NwbFile` object in appropriate locations +- **Add to containers**: You add these data objects to your :class:`NwbFile` object (or other objects) in appropriate locations -**File Export**: You save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format +- **File export**: You save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format This approach ensures your data is properly organized and validated before it becomes a file. @@ -36,3 +35,11 @@ The NWB schema acts as a blueprint that defines what makes a valid neuroscience - Relationships between different parts of the file are correct If anything is missing or incorrect, you'll get an error message explaining what needs to be fixed. This validation helps ensure your files will work with other NWB tools and can be understood by other researchers. + +.. 
toctree:: + :maxdepth: 1 + + file_create/nwbfile + file_create/data_organization + file_create/hdf5_considerations + file_create/performance_optimization From a817423a9dec1f9b85396e9d5c02a352933c1d21 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 22:50:21 +0200 Subject: [PATCH 17/67] Update nwbfile.rst Make it into an explanation page --- .../pages/concepts/file_create/nwbfile.rst | 170 ++++++------------ 1 file changed, 54 insertions(+), 116 deletions(-) diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst index be5ba613f..a6bccaee3 100644 --- a/docs/source/pages/concepts/file_create/nwbfile.rst +++ b/docs/source/pages/concepts/file_create/nwbfile.rst @@ -1,149 +1,87 @@ .. _matnwb-create-nwbfile-intro: -Creating the NwbFile Object -=========================== +Understanding the NwbFile Class +=============================== -The :class:`NwbFile` object is the root container for all data in an NWB file. Before adding any experimental data, you must create this object and add the required metadata properties. +The :class:`NwbFile` class in MatNWB is your main interface for creating NWB files. This MATLAB object serves as the root container that holds all your experimental data and metadata, translating between MATLAB's data structures and the NWB format. -Required properties -------------------- +How the NwbFile Object Works +---------------------------- -The NWB file must contain three required properties that needs to be manually specified: +When you create an :class:`NwbFile` object, you're creating a MATLAB representation of what will eventually become an HDF5-based NWB file. The object: -1. **session_start_time** (:class:`datetime`) - When the experiment began, with timezone information -2. **identifier** (:class:`char`) - A unique identifier for this specific session/file -3. 
**session_description** (:class:`char`) - Brief description of the experimental session +- **Validates input** - ensures your data matches NWB schema requirements +- **Organizes content** - provides a structured way to add different types of data +- **Manages relationships** - maintains connections between related data elements +- **Handles export** - converts everything to proper NWB format when saved -**Example:** +Required Properties in MatNWB +----------------------------- -.. code-block:: MATLAB +MatNWB enforces three required properties that must be present when exporting an :class:`NwbFile` object: - nwb = NwbFile( ... - 'session_start_time', datetime('2024-01-15 09:30:00', 'TimeZone', 'local'), ... - 'identifier', 'Mouse001_Session_20240115', ... - 'session_description', 'Two-photon calcium imaging during whisker stimulation'); +- **session_start_time** (:class:`datetime`) - + The time when your experiment began. MatNWB requires this as a MATLAB ``datetime`` object with timezone information. -Two additional required properties are set automatically if not provided: +- **identifier** (:class:`char` or :class:`string`) - + A unique identifier for this specific session/file. This should be unique across all your NWB files. -- **file_create_date** - Automatically set to the current time when the file is exported -- **timestamps_reference_time** - Defaults to match ``session_start_time`` if not explicitly set +- **session_description** (:class:`char` or :class:`string`) - + A brief description of what happened in this experimental session. -Recommended Metadata Properties -------------------------------- +MatNWB lets you create the object without these properties and add them later, but all three must be set before you export the file.
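As a minimal sketch of this deferred-assignment pattern, assuming MatNWB is on the MATLAB path: the object is created empty and the required properties are filled in with dot notation before export. The property values are reused from the example removed in this patch; the file name ``deferred_example.nwb`` is a placeholder.

.. code-block:: MATLAB

    % Create an empty NwbFile; none of the required properties are set yet.
    nwb = NwbFile();

    % Fill in the required properties later using dot notation.
    nwb.session_start_time = datetime(2024, 1, 15, 9, 30, 0, 'TimeZone', 'local');
    nwb.identifier = 'Mouse001_Session_20240115';
    nwb.session_description = 'Two-photon calcium imaging during whisker stimulation';

    % Export triggers validation. Calling nwbExport before the three
    % assignments above would raise an error about missing properties.
    nwbExport(nwb, 'deferred_example.nwb');
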
-While not required, these properties provide important context for your data: -- **general_experimenter** - Who conducted the experiment -- **general_institution** - Where the experiment was performed -- **general_lab** - Which laboratory/group -- **general_session_id** - Lab-specific session identifier -- **general_experiment_description** - Detailed experimental context +Automatic Properties +-------------------- -**Example:** +MatNWB automatically handles some required NWB properties so you don't have to: -.. code-block:: MATLAB +- **file_create_date** - + Set automatically when you export the file using :func:`nwbExport` - nwb = NwbFile( ... - 'session_start_time', datetime('2024-01-15 09:30:00', 'TimeZone', 'local'), ... - 'identifier', 'Mouse001_Session_20240115', ... - 'session_description', 'Two-photon calcium imaging during whisker stimulation', ... - 'general_experimenter', 'Dr. Jane Smith', ... - 'general_institution', 'University Research Institute', ... - 'general_lab', 'Neural Circuits Lab', ... - 'general_session_id', 'session_001', ... - 'general_experiment_description', 'Investigation of sensory processing in barrel cortex'); +- **timestamps_reference_time** - + Defaults to match your ``session_start_time`` if not explicitly set +Object Structure and Organization +--------------------------------- -Subject Information -------------------- +The :class:`NwbFile` object provides specific properties for organizing different types of data: -Information about the experimental subject should be added using the :class:`types.core.Subject` class: +- **acquisition** - + Raw data as it comes from your instruments (e.g., voltage recordings, behavioral videos) -.. code-block:: MATLAB +- **processing** - + Processed or analyzed data, organized into processing modules - % Create subject information - subject = types.core.Subject( ... - 'subject_id', 'Mouse001', ... - 'age', 'P90', ... % Post-natal day 90 - 'description', 'C57BL/6J mouse', ... 
- 'species', 'Mus musculus', ... - 'sex', 'M'); - - % Add to NWB file - nwb.general_subject = subject; +- **analysis** - + Results of analysis, like trial averages or population statistics -Best Practices for Identifiers ------------------------------- - -**Session Identifiers:** - -Choose identifiers that are: - -- **Unique across your entire dataset** - avoid conflicts between labs, experiments, etc. -- **Informative** - include subject, date, session number when helpful -- **Consistent** - use a standardized naming scheme - -.. code-block:: MATLAB - - % Good examples: - identifier = 'SmithLab_Mouse001_20240115_Session01'; - identifier = 'MD5HASH_a1b2c3d4e5f6'; % For anonymization - identifier = sprintf('%s_%s_%s', lab_id, subject_id, datestr(now, 'yyyymmdd')); - -**Session Descriptions:** +- **general_subject** - + Information about the experimental subject (requires a :class:`types.core.Subject` object) -Be specific and include: +**Additional metadata properties** + Various ``general_*`` properties for experimenter, institution, lab, etc. -- **Experimental paradigm** - what task or stimulation was used -- **Recording method** - electrophysiology, imaging, behavior only, etc. -- **Key experimental variables** - drug conditions, genotypes, etc. -.. code-block:: MATLAB - - % Good examples: - session_description = 'Extracellular recordings in primary visual cortex during oriented grating presentation'; - session_description = 'Two-photon calcium imaging of layer 2/3 pyramidal neurons during whisker deflection'; - session_description = 'Behavioral training on auditory discrimination task, no neural recordings'; - -Time Zone Considerations ------------------------- - -NWB files store all timestamps in a standardized format. Always specify the timezone when creating datetime objects: - -.. 
code-block:: MATLAB - - % Specify local timezone - session_start = datetime('2024-01-15 09:30:00', 'TimeZone', 'America/New_York'); - - % Or use UTC if preferred - session_start = datetime('2024-01-15 14:30:00', 'TimeZone', 'UTC'); - - % Current time with local timezone - session_start = datetime('now', 'TimeZone', 'local'); - -The ``timestamps_reference_time`` field defines "time zero" for all timestamps in the file. This is typically set to match ``session_start_time``, but can be different if needed for your experimental design. +Working with MATLAB Data Types +------------------------------ -See also the :nwbinspector:`Best Practices ` section of the NWB Inspector documentation for details on setting the ``session_start_time``. +The :class:`NwbFile` object is designed to work naturally with MATLAB data types: -Validation ----------- +- **Datetime handling**: Uses MATLAB's ``datetime`` class with timezone support +- **String/char compatibility**: Accepts both ``char`` arrays and ``string`` objects +- **Numeric arrays**: Works with standard MATLAB matrices and arrays +- **Cell arrays**: Can handle MATLAB cell arrays for text data -The NwbFile and (included datatypes) will be validated when you attempt to export to file using the :func:`nwbExport` function. If any required properties are missing, an error will be raised. +MatNWB automatically converts these MATLAB types to appropriate NWB format during export. -.. code-block:: MATLAB +Validation and Error Handling +----------------------------- - % This will fail - missing required properties - nwb = NwbFile(); - nwbExport(nwb, 'test.nwb'); % Error: missing identifier, session_description, etc. - - % This will succeed - nwb = NwbFile( ... - 'session_start_time', datetime('now', 'TimeZone', 'local'), ... - 'identifier', 'test_session', ... - 'session_description', 'Test file'); - nwbExport(nwb, 'test.nwb'); % Success +MatNWB validates your :class:`NwbFile` object at different points: -Next Steps ----------- +1. 
**Property assignment**: Data types and shapes are checked when you create objects or set properties +2. **File export**: Required properties and complete schema validation -Once you have created an NwbFile object, you can begin adding experimental data. The next section covers how to organize different types of data within the NWB structure. +If validation fails, you'll get specific error messages explaining what needs to be fixed. This helps catch problems early rather than discovering them when trying to share or reuse your data. From 6879b2afc16288e5774a65d18cc48bc92b346a52 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 4 Sep 2025 23:15:36 +0200 Subject: [PATCH 18/67] Add neurodata types page --- docs/source/pages/concepts/file_create.rst | 15 +- .../file_create/data_organization.rst | 289 ------------------ .../concepts/file_create/neurodata_types.rst | 115 +++++++ 3 files changed, 122 insertions(+), 297 deletions(-) delete mode 100644 docs/source/pages/concepts/file_create/data_organization.rst create mode 100644 docs/source/pages/concepts/file_create/neurodata_types.rst diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 7e3e7c739..13065801c 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -12,8 +12,7 @@ As demonstrated in the :doc:`Quickstart ` tut .. note:: An "object" is an instance of a class. Objects are similar to MATLAB structs, but with additional functionality. The fields (called properties) are defined by the class definition (a .m file), and the class can enforce rules about what values are allowed. This helps ensure that your data conforms to the NWB schema. 
-The Assembly Process --------------------- +**The Assembly Process** Building an NWB file follows a logical pattern: @@ -25,8 +24,7 @@ Building an NWB file follows a logical pattern: This approach ensures your data is properly organized and validated before it becomes a file. -Schema Validation ------------------ +**Schema Validation** The NWB schema acts as a blueprint that defines what makes a valid neuroscience data file. When you export your file, MatNWB checks that: @@ -38,8 +36,9 @@ If anything is missing or incorrect, you'll get an error message explaining what .. toctree:: :maxdepth: 1 + :titlesonly: - file_create/nwbfile - file_create/data_organization - file_create/hdf5_considerations - file_create/performance_optimization + Understanding the NwbFile Object + Understanding Neurodata Types + HDF5 Considerations + Performance Optimization diff --git a/docs/source/pages/concepts/file_create/data_organization.rst b/docs/source/pages/concepts/file_create/data_organization.rst deleted file mode 100644 index dfdb852ee..000000000 --- a/docs/source/pages/concepts/file_create/data_organization.rst +++ /dev/null @@ -1,289 +0,0 @@ -Data Organization in NWB Files -============================== - -Once you have created an :class:`NwbFile` object, the next step is adding your experimental data using appropriate NWB data types. The NWB format provides a standardized structure for different types of neuroscience data. - -Data Organization Hierarchy ---------------------------- - -NWB files organize data into several main categories: - -- **acquisition** - Raw, unprocessed data from the experiment -- **processing** - Processed/analyzed data, organized by processing modules -- **stimulus** - Information about experimental stimuli -- **analysis** - Custom analysis results -- **scratch** - Temporary storage during analysis - -.. 
code-block:: MATLAB - - % Example of the basic structure - nwb.acquisition.set('RawEphys', electrical_series); - nwb.processing.set('EphysModule', processing_module); - nwb.stimulus_presentation.set('VisualStimulus', image_series); - -Adding Data with the .set Method ---------------------------------- - -NWB data containers (like ``acquisition``, ``processing``, etc.) use the ``.set`` method to add data objects. This method requires two arguments: - -1. **Name** (string) - A unique identifier for the data object within that container -2. **Data Object** - The NWB data type being added (e.g., TimeSeries, ProcessingModule) - -.. code-block:: MATLAB - - % The .set method syntax: - nwb.acquisition.set('DataName', data_object); - - % Why .set is used instead of direct assignment: - % This allows NWB to maintain internal structure and validate data types - -**Naming Conventions:** - -Use valid MATLAB identifiers with PascalCase for consistency: - -.. code-block:: MATLAB - - % Good naming examples (PascalCase, descriptive): - nwb.acquisition.set('RawElectricalSeries', electrical_series); - nwb.acquisition.set('CalciumImagingData', two_photon_series); - nwb.acquisition.set('BehaviorVideo', image_series); - - % Avoid these naming patterns: - nwb.acquisition.set('data1', electrical_series); % Not descriptive - nwb.acquisition.set('raw-ephys', electrical_series); % Invalid MATLAB identifier - nwb.acquisition.set('raw_ephys_data', electrical_series); % Use PascalCase instead - -- **Use PascalCase** - capitalize the first letter of each word -- **Be descriptive** - names should indicate the data content and type -- **Avoid special characters** - stick to letters, numbers, and underscores if needed -- **Use valid MATLAB identifiers** - names that could be valid variable names -- **Be consistent** - establish and follow naming patterns within your lab/project - -Refer to the :nwbinspector:`Naming Conventions ` section of the NWB Inspector docs for more details. 
- - -Time Series Data ----------------- - -Most neural data is time-varying and should use :class:`TimeSeries` objects or their specialized subclasses: - -**Basic TimeSeries:** - -.. code-block:: MATLAB - - % Generic time series data - data = randn(5, 1000); % 5 channels, 1000 time points - - ts = types.core.TimeSeries( ... - 'data', data, ... - 'data_unit', 'arbitrary_units', ... - 'starting_time', 0.0, ... - 'starting_time_rate', 1000.0, ... % 1kHz sampling rate - 'description', 'Raw neural signal'); - - nwb.acquisition.set('RawSignal', ts); - -**Electrophysiology Data:** - -For extracellular recordings, use :class:`ElectricalSeries`: - -.. code-block:: MATLAB - - % Create electrode table (describes recording channels) - electrode_table = util.createElectrodeTable(nwb, electrode_info); - - % Create reference to specific electrodes - electrode_region = types.hdmf_common.DynamicTableRegion( ... - 'table', types.untyped.ObjectView(electrode_table), ... - 'description', 'recording electrodes', ... - 'data', [0, 1, 2, 3]); % Which electrodes were used - - % Raw extracellular data - raw_data = int16(randn(30000, 4) * 1000); % 1 second at 30kHz, 4 channels - - electrical_series = types.core.ElectricalSeries( ... - 'data', raw_data, ... - 'data_unit', 'microvolts', ... - 'electrodes', electrode_region, ... - 'starting_time', 0.0, ... - 'starting_time_rate', 30000.0); - - nwb.acquisition.set('RawEphys', electrical_series); - -**Calcium Imaging Data:** - -For optical data, use :class:`TwoPhotonSeries` or :class:`OnePhotonSeries`: - -.. code-block:: MATLAB - - % First define imaging plane - imaging_plane = types.core.ImagingPlane( ... - 'description', 'Primary visual cortex, layer 2/3', ... - 'excitation_lambda', 925.0, ... % Two-photon excitation wavelength - 'imaging_rate', 30.0, ... - 'indicator', 'GCaMP6f', ... 
- 'location', 'V1'); - - nwb.general_optophysiology.set('ImagingPlane1', imaging_plane); - - % Calcium imaging time series - imaging_data = uint16(randn(50, 50, 1000) * 1000 + 2000); % 50x50 pixels, 1000 frames - - two_photon_series = types.core.TwoPhotonSeries( ... - 'data', imaging_data, ... - 'imaging_plane', types.untyped.SoftLink(imaging_plane), ... - 'starting_time', 0.0, ... - 'starting_time_rate', 30.0, ... - 'data_unit', 'fluorescence'); - - nwb.acquisition.set('CalciumImaging', two_photon_series); - -Processing Modules ------------------- - -Processed data should be organized into processing modules, which group related analyses together: - -.. code-block:: MATLAB - - % Create a processing module for extracellular ephys - ephys_module = types.core.ProcessingModule( ... - 'description', 'Processed extracellular electrophysiology data'); - - % Add LFP data to the module - lfp_data = randn(1000, 4); % Downsampled/filtered data - - lfp_electrical_series = types.core.ElectricalSeries( ... - 'data', lfp_data, ... - 'data_unit', 'microvolts', ... - 'electrodes', electrode_region, ... - 'starting_time', 0.0, ... - 'starting_time_rate', 1000.0); % 1kHz for LFP - - lfp = types.core.LFP(); - lfp.electricalseries.set('LFP', lfp_electrical_series); - - ephys_module.nwbdatainterface.set('LFP', lfp); - nwb.processing.set('Ecephys', ephys_module); - -Spike Data and Units --------------------- - -Spike times and sorted units use the specialized :class:`Units` table: - -.. code-block:: MATLAB - - % Create a Units table for spike data - units_table = types.core.Units( ... - 'colnames', {'spike_times'}, ... 
- 'description', 'Sorted single units'); - - % Add spike times for each unit - unit1_spikes = [0.1, 0.5, 1.2, 1.8, 2.3]; % Spike times in seconds - unit2_spikes = [0.3, 0.9, 1.5, 2.1, 2.7]; - - units_table.addRow('spike_times', unit1_spikes); - units_table.addRow('spike_times', unit2_spikes); - - nwb.units = units_table; - -Behavioral Data ---------------- - -Behavioral measurements can be stored as :class:`TimeSeries` or in specialized containers: - -.. code-block:: MATLAB - - % Position tracking - position_data = randn(1000, 2); % X, Y coordinates over time - - spatial_series = types.core.SpatialSeries( ... - 'data', position_data, ... - 'reference_frame', 'Arena coordinates (cm)', ... - 'data_unit', 'cm', ... - 'starting_time', 0.0, ... - 'starting_time_rate', 60.0); % 60 Hz tracking - - position = types.core.Position(); - position.spatialseries.set('Position', spatial_series); - - % Add to a behavior processing module - behavior_module = types.core.ProcessingModule( ... - 'description', 'Behavioral data processing'); - behavior_module.nwbdatainterface.set('Position', position); - nwb.processing.set('Behavior', behavior_module); - -Trial Structure ---------------- - -Experimental trials are stored in the intervals table: - -.. code-block:: MATLAB - - % Create trials table - trials = types.core.TimeIntervals( ... - 'colnames', {'start_time', 'stop_time', 'stimulus_type', 'response'}, ... - 'description', 'Experimental trials'); - - % Add individual trials - trials.addRow( ... - 'start_time', 0.0, ... - 'stop_time', 2.0, ... - 'stimulus_type', 'left_grating', ... - 'response', 'correct'); - - trials.addRow( ... - 'start_time', 5.0, ... - 'stop_time', 7.0, ... - 'stimulus_type', 'right_grating', ... - 'response', 'incorrect'); - - nwb.intervals_trials = trials; - -Large Dataset Considerations ----------------------------- - -For large datasets, consider using :class:`types.untyped.DataPipe` for compression and chunking: - -.. 
code-block:: MATLAB - - % Large imaging dataset with compression - large_imaging_data = uint16(randn(512, 512, 10000) * 1000); - - compressed_data = types.untyped.DataPipe( ... - 'data', large_imaging_data, ... - 'compressionLevel', 6, ... - 'chunkSize', [512, 512, 1]); % Chunk by frame - - two_photon_series = types.core.TwoPhotonSeries( ... - 'data', compressed_data, ... - 'imaging_plane', types.untyped.SoftLink(imaging_plane), ... - 'starting_time', 0.0, ... - 'starting_time_rate', 30.0, ... - 'data_unit', 'fluorescence'); - -See :doc:`performance_optimization` for detailed information on handling large datasets efficiently. - - -Validation and Consistency --------------------------- - -Key principles for data organization: - -1. **Use appropriate data types** - don't store imaging data as generic TimeSeries -2. **Maintain consistent units** - ensure all related data uses the same time base -3. **Document your choices** - use descriptive names and fill in description fields - -.. code-block:: MATLAB - - % Good practice: descriptive names and consistent units - nwb.acquisition.set('RawExtracellularV1', electrical_series); - nwb.acquisition.set('CalciumImagingV1L23', two_photon_series); - - % Bad practice: generic names, unclear relationships - nwb.acquisition.set('Data1', electrical_series); - nwb.acquisition.set('Data2', two_photon_series); - -Next Steps ----------- - -With your data properly organized, the next considerations are performance optimization and understanding HDF5 constraints that affect how you structure your file creation workflow. 
diff --git a/docs/source/pages/concepts/file_create/neurodata_types.rst b/docs/source/pages/concepts/file_create/neurodata_types.rst new file mode 100644 index 000000000..6b4b5e62f --- /dev/null +++ b/docs/source/pages/concepts/file_create/neurodata_types.rst @@ -0,0 +1,115 @@ +Understanding MatNWB Neurodata Types +==================================== + +MatNWB neurodata types are specialized MATLAB classes that represent different kinds of neuroscience data. These types provide structured containers that hold your data along with the metadata and organizational information needed to interpret it correctly. + +Why Specialized Types Instead of Standard Data Types? +----------------------------------------------------- + +MatNWB's neurodata types have several advantages compared to using generic MATLAB arrays or structs for storing data: + +**They encode domain knowledge**: Each type includes the specific requirements for neuroscience data. A :class:`types.core.ElectricalSeries` requires electrode information, sampling rates, and data units - enforcing these requirements automatically rather than relying on you to remember them. + +**They prevent common mistakes**: The types guide you toward correct data organization. For example, you cannot store imaging data without specifying the imaging plane when using :class:`types.core.TwoPhotonSeries`. + +**They ensure compatibility**: Data stored in these types will work with other NWB tools and can be shared with collaborators who use different analysis software. + +The Foundation: TimeSeries +--------------------------- + +Most neuroscience data varies over time, so MatNWB builds around a fundamental concept: :class:`types.core.TimeSeries`. This isn't just a MATLAB array with timestamps - it's a structured way to represent any measurement that changes over time. 
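To make the contrast with a bare array concrete, here is a minimal sketch of a :class:`types.core.TimeSeries` that keeps the unit, the timing and a description together with the values. The variable names and numbers are invented for illustration. ``timestamps`` is used because this hypothetical sensor is sampled irregularly; regularly sampled data would typically set ``starting_time`` and ``starting_time_rate`` instead.

.. code-block:: MATLAB

    % Irregularly sampled measurement: explicit timestamps in seconds.
    sampleTimes = [0.00, 0.12, 0.31, 0.55, 0.80];
    sampleValues = [21.5, 21.7, 21.6, 21.9, 22.0];

    % The TimeSeries object carries the context that a plain array lacks.
    temperature = types.core.TimeSeries( ...
        'data', sampleValues, ...
        'data_unit', 'degrees Celsius', ...
        'timestamps', sampleTimes, ...
        'description', 'Ambient temperature near the recording rig');
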
+ +**What TimeSeries provides:** + +- **Data with context**: Your measurements plus information about what they represent +- **Time handling**: Flexible ways to represent regular or irregular sampling +- **Metadata storage**: Data units, descriptions, and experimental details stay attached to the data +- **Relationship tracking**: Connections to other parts of your experiment + +**When to use basic TimeSeries**: For any time-varying measurement that doesn't fit a more specific type - like custom behavioral metrics, environmental sensors, or novel measurement techniques. + +Specialized TimeSeries Types +---------------------------- + +MatNWB provides specialized versions of TimeSeries for common neuroscience data types. These aren't just conveniences - they capture the specific requirements and relationships of different experimental approaches. + +**ElectricalSeries: For Electrical Recordings** + +Understanding electrical recordings requires knowing which electrodes recorded the data, their locations, and recording parameters. :class:`types.core.ElectricalSeries` handles these relationships automatically. + +The key insight: electrical data isn't just voltages over time - it's voltages from specific spatial locations in the brain, recorded with particular methods and settings. + +**TwoPhotonSeries and OnePhotonSeries: For Optical Data** + +Calcium imaging data has fundamentally different characteristics than electrical recordings. These types understand that optical data comes from specific imaging planes, uses particular indicators, and has unique technical parameters like excitation wavelengths. + +The key insight: optical data represents neural activity indirectly through fluorescence changes, requiring different metadata and processing considerations. + +**SpatialSeries: For Position and Movement** + +Behavioral tracking data represents the subject's position or movement through space. 
:class:`types.core.SpatialSeries` understands spatial coordinates, reference frames, and the relationship between position and time. + +The key insight: spatial data requires coordinate system information to be meaningful - the same X,Y coordinates mean different things in different reference frames. + +Container Types: Organizing Related Data +---------------------------------------- + +Some neurodata types don't hold data directly - they organize other types into meaningful groups. + +**ProcessingModule: Grouping Related Analyses** + +Experiments often involve multiple processing steps that belong together. :class:`types.core.ProcessingModule` lets you group related processed data, maintaining the logical flow of your analysis pipeline. + +The key insight: processed data gains meaning through its relationship to the raw data and processing steps that created it. + +**Position, CompassDirection, BehavioralEvents: Behavioral Organization** + +These container types organize different aspects of behavioral data. Rather than scattering behavioral measurements throughout your file, they provide structured locations that other researchers will recognize. + +The key insight: behavioral experiments often involve multiple simultaneous measurements that need to be understood as a coordinated whole. + + +Table-Based Types: Structured Metadata +-------------------------------------- + +Some experimental information is naturally tabular rather than time-series based. + +**Units Table: Spike Data Organization** + +Sorted spike data doesn't fit well into TimeSeries because each unit has different spike times. The :class:`types.core.Units` table provides a structured way to store spike times, waveforms, and unit metadata together. + +The key insight: spike sorting creates discrete events (spikes) rather than continuous measurements, requiring different organizational principles. 
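A short sketch of the Units table, adapted from the example in the ``data_organization.rst`` page that this patch deletes. It assumes MatNWB is on the MATLAB path and that ``addRow`` accepts rows of different lengths for ``spike_times``, as that example implies; this is worth checking against the MatNWB version in use. The spike times are made up.

.. code-block:: MATLAB

    % Create a Units table with a single spike_times column.
    units = types.core.Units( ...
        'colnames', {'spike_times'}, ...
        'description', 'Sorted single units');

    % Each row is one unit. The rows hold different numbers of spikes,
    % which is why a regularly sampled TimeSeries is a poor fit here.
    units.addRow('spike_times', [0.1, 0.5, 1.2, 1.8, 2.3]); % spike times in seconds
    units.addRow('spike_times', [0.3, 0.9, 1.5]);

In the deleted example the finished table is attached to the file with ``nwb.units = units_table;``.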
+ +**Electrode Tables: Recording Site Information** + +Information about recording electrodes (location, impedance, brain region) is relatively static but essential for interpreting electrical data. Electrode tables store this information once and allow multiple data types to reference it. + +The key insight: experimental metadata often has different temporal characteristics than the data itself - electrode properties don't change during recording, but voltage measurements do. + + +How MatNWB Types Work in Practice +--------------------------------- + +- **Object-Oriented Organization**: Each neurodata type is a MATLAB class with specific properties. When you create an object, MATLAB ensures you provide the required information and validates the data types. + +- **Automatic Relationships**: Types understand their relationships to other types. When you reference an electrode table from an ElectricalSeries, MatNWB maintains that connection automatically. + +- **Flexible Extension**: While types have required properties, you can add additional information as needed. This lets you capture experiment-specific details while maintaining compatibility. + +- **Validation and Error Prevention**: Types catch common errors before they become problems. Missing required properties, incorrect data shapes, or type mismatches generate helpful error messages. + +Choosing the Right Type +----------------------- + +The goal isn't to memorize every available type, but to understand the principle: **match your data to the type that best represents its experimental meaning**. + +**Ask yourself:** + +- What kind of measurement is this? (electrical, optical, behavioral, etc.) +- How does it relate to other parts of my experiment? +- What contextual information is needed to interpret it? +- Would another researcher understand this data organization? + +**Start simple**: When in doubt, basic TimeSeries can represent any time-varying data. 
You can always use more specific types as you become familiar with them. + +**Follow the data flow**: Raw measurements go in acquisition, processed results go in processing modules, final analyses go in analysis. This mirrors your experimental workflow. From eafbddc59457f3b0b1db08b32c71228d27bdd0c6 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 10:59:34 +0200 Subject: [PATCH 19/67] Minor reformulations --- docs/source/pages/concepts/file_create.rst | 2 +- docs/source/pages/getting_started/installation.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 13065801c..fc5b287f2 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -18,7 +18,7 @@ Building an NWB file follows a logical pattern: - **Create neurodata objects**: You create objects for your data (like :class:`types.core.TimeSeries` for time-based measurements) -- **Add to containers**: You add these data objects to your :class:`NwbFile` object (or other objects) in appropriate locations +- **Add to containers**: You add these data objects to your :class:`NwbFile` object (or other NWB container objects) in appropriate locations - **File export**: You save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst index 66205de4c..44d4b76e2 100644 --- a/docs/source/pages/getting_started/installation.rst +++ b/docs/source/pages/getting_started/installation.rst @@ -6,7 +6,7 @@ Installation Quick install ------------- -If you want the shortest path and have ``git`` available, run the following snippet in MATLAB. 
This clones into your current working directory, adds MatNWB to the path, and optionally persists the change: +If you want the quickest installation option, and you have ``git`` available, run the following snippet in MATLAB. This clones into your current working directory, adds MatNWB to the path, and optionally persists the change: .. code-block:: matlab From 343c3038bddf86c25d7479d0d3b3d51ee8465f55 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 17:00:05 +0200 Subject: [PATCH 20/67] Update hdf5_considerations.rst --- .../file_create/hdf5_considerations.rst | 278 ++++-------------- 1 file changed, 52 insertions(+), 226 deletions(-) diff --git a/docs/source/pages/concepts/file_create/hdf5_considerations.rst b/docs/source/pages/concepts/file_create/hdf5_considerations.rst index b30d65b72..bd6a25c0c 100644 --- a/docs/source/pages/concepts/file_create/hdf5_considerations.rst +++ b/docs/source/pages/concepts/file_create/hdf5_considerations.rst @@ -3,232 +3,58 @@ HDF5 Considerations and Limitations =================================== -NWB files are stored in HDF5 format, which provides excellent performance and portability but comes with important limitations that affect how you create and modify files. Understanding these constraints is essential for effective NWB file management. - -.. warning:: - **Critical HDF5 Limitations** - - - Files cannot be easily modified after creation - - Adding new datasets requires specialized approaches - - Concurrent access by multiple processes is not supported - - Schema changes require recreating the entire file - - Large datasets need careful memory management - -File Modification Challenges ----------------------------- - -**The Core Problem:** - -Unlike simple text files, HDF5 files have a complex internal structure that makes modifications difficult: - -.. 
code-block:: MATLAB - - % This workflow is PROBLEMATIC: - - % Day 1: Create initial file - nwb = create_basic_nwb_file(); - nwbExport(nwb, 'experiment.nwb'); - - % Day 2: Try to add more data (DIFFICULT!) - nwb = nwbRead('experiment.nwb'); - % Adding new acquisition data here is complex and error-prone - new_data = record_more_data(); - % nwb.acquisition.set('day2_data', new_data); % Not straightforward! - % nwbExport(nwb, 'experiment.nwb'); % May corrupt the file - -**Why Modification is Difficult:** - -1. **Fixed internal structure** - HDF5 pre-allocates space for datasets -2. **Metadata dependencies** - Changes can break internal links and references -3. **Compression conflicts** - Compressed data cannot be easily extended -4. **Schema validation** - New data must maintain consistency with existing structure - -Strategies for File Modification ---------------------------------- - -**Strategy 1: Plan for Incremental Data (Recommended)** - -Design your workflow to accommodate all expected data from the start: - -.. code-block:: MATLAB - - % Create file structure for ALL expected data upfront - nwb = NwbFile( ... - 'session_start_time', datetime('now', 'TimeZone', 'local'), ... - 'identifier', 'session_001', ... - 'session_description', 'Multi-day recording session'); - - % Pre-allocate space for time series that will grow - initial_data = zeros(0, 32); % Start with 0 timepoints, 32 channels - max_timepoints = 1000000; % But plan for up to 1M timepoints - - data_pipe = types.untyped.DataPipe( ... - 'data', initial_data, ... - 'maxSize', [max_timepoints, 32], ... % Reserve space - 'axis', 1); % Allow growth along time axis - - electrical_series = types.core.ElectricalSeries( ... - 'data', data_pipe, ... - 'electrodes', electrode_region, ... - 'starting_time', 0.0, ... 
- 'starting_time_rate', 30000.0); - - nwb.acquisition.set('extracellular', electrical_series); - nwbExport(nwb, 'experiment.nwb'); - - % Later: Append new data incrementally - nwb = nwbRead('experiment.nwb', 'ignorecache'); - new_chunk = record_next_data_chunk(); - nwb.acquisition.get('extracellular').data.append(new_chunk); - -**Strategy 2: Separate Files for Each Session** - -Keep each recording session in its own file: - -.. code-block:: MATLAB - - % Better approach: separate files - for session = 1:num_sessions - nwb = create_session_nwb(session); - filename = sprintf('experiment_session_%03d.nwb', session); - nwbExport(nwb, filename); - end - - % Analysis code reads multiple files as needed - all_sessions = {}; - for session = 1:num_sessions - filename = sprintf('experiment_session_%03d.nwb', session); - all_sessions{session} = nwbRead(filename); - end - -**Strategy 3: Recreate Files When Necessary** - -For significant additions, recreate the entire file: - -.. code-block:: MATLAB - - % Read existing data - old_nwb = nwbRead('experiment_v1.nwb'); - - % Create new file with old + new data - new_nwb = NwbFile( ... - 'session_start_time', old_nwb.session_start_time, ... - 'identifier', old_nwb.identifier, ... - 'session_description', old_nwb.session_description); - - % Copy existing data - copy_data_objects(old_nwb, new_nwb); - - % Add new data - new_nwb.acquisition.set('additional_recording', new_electrical_series); - - % Export new version - nwbExport(new_nwb, 'experiment_v2.nwb'); - -Edit Mode vs. Overwrite Mode ----------------------------- - -MatNWB provides two export modes with different behaviors: - -.. 
code-block:: MATLAB - - % Overwrite mode (default): Creates new file, replacing any existing file - nwbExport(nwb, 'data.nwb', 'overwrite'); - - % Edit mode: Attempts to modify existing file (LIMITED FUNCTIONALITY) - nwbExport(nwb, 'data.nwb', 'edit'); - -**Edit Mode Limitations:** - -- Can only modify certain metadata fields -- Cannot add new datasets or change data structure -- Cannot resize existing datasets -- Primarily useful for updating file creation timestamps - -.. warning:: - Edit mode is **not** a general solution for file modification. It should only be used for minor metadata updates. - - -Concurrent Access Limitations ------------------------------ - -**Problem: Multiple Processes Cannot Write Simultaneously** - -.. code-block:: MATLAB - - % This will fail if run simultaneously: - - % Process 1: - nwb1 = nwbRead('shared_file.nwb'); - % ... modify nwb1 ... - nwbExport(nwb1, 'shared_file.nwb'); % Will lock file - - % Process 2 (running at same time): - nwb2 = nwbRead('shared_file.nwb'); % May fail or get corrupted data - % ... modify nwb2 ... - nwbExport(nwb2, 'shared_file.nwb'); % Will overwrite Process 1's changes! - -**Solutions for Concurrent Workflows:** - -1. **Use separate files per process:** - -.. code-block:: MATLAB - - % Each process writes to its own file - process_id = get_process_id(); - filename = sprintf('data_process_%d.nwb', process_id); - nwbExport(nwb, filename); - - % Combine files later in post-processing step - -2. **Coordinate access with file locking:** - -.. 
code-block:: MATLAB - - function safe_nwb_append(filename, new_data) - lock_file = [filename '.lock']; - - % Wait for exclusive access - while exist(lock_file, 'file') - pause(0.1); - end - - % Create lock - fclose(fopen(lock_file, 'w')); - - try - % Perform file operation - nwb = nwbRead(filename); - nwb.acquisition.get('data').data.append(new_data); - % Note: this may still fail due to HDF5 limitations - - finally - % Always release lock - if exist(lock_file, 'file') - delete(lock_file); - end - end - end - -Schema Consistency Requirements -------------------------------- - -**The Problem:** - -HDF5 requires that data structure remains consistent with the schema: - -Scenario: -- Read a previously generated file to make changes with ignorecache -- Current types are of different schema version -- Create new types and add to file +Working with NWB files in MATLAB involves interacting with the **HDF5** storage format. +HDF5 provides excellent performance, hierarchical organization, and portability — but it also imposes some important **limitations** that influence how you create, modify, and manage NWB files. +This page explains these limitations conceptually, so you can design data pipelines and workflows that avoid common pitfalls. -Working Within HDF5 Constraints -------------------------------- +Why limitations matter +---------------------- -**Recommended Workflow:** +HDF5 is designed for efficient, large-scale data storage — not for frequent editing or multi-user collaboration. +Once data is written, changing the file structure or contents is often constrained by the format itself. -1. **Plan your complete data structure upfront** -2. **Use separate files for truly independent data** -3. **Pre-allocate space for datasets that will grow** - -Understanding these HDF5 limitations will help you design robust workflows that work reliably with NWB files. The next section covers performance optimization strategies that work within these constraints. 
+Understanding these constraints will help you: + +- Plan ahead when designing datasets and attributes +- Avoid costly re-writes and data corruption +- Structure workflows for safe and efficient data access + +Key limitations in practice +--------------------------- + +Existing datasets cannot be freely modified +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Once a dataset is written to disk, it is essentially fixed in size and structure. +If you need to **append** or **stream** additional data (for example, writing trial data as it becomes available), you must create the dataset with this in mind from the start. + +In MatNWB, this is typically done with the :class:`~types.untyped.DataPipe` class, which supports writing data incrementally to an extendable dataset. + +Data and attributes cannot be removed — and deletion does not reduce file size +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +HDF5 does not support in-place removal of datasets or attributes in the way a database might. +While it is possible at a low level to "unlink" objects from the file, space is not reclaimed. +If you need to significantly restructure a file, the standard approach is to **create a new NWB file** and copy the desired data into it. + +**Implication:** +Plan carefully which datasets and metadata to include before writing. Making changes later often means recreating the file from scratch. + +Multiple-writer access is not supported +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +HDF5 files are not designed for concurrent writes. +If multiple processes or threads attempt to write to the same file at the same time, the result can be **file corruption**. +In most workflows, this means ensuring that **only one process writes to an NWB file** at any time. + +**Best practice:** + +- Use a single writer process and close the file before reading it elsewhere. 
+- If multiple processes need access, coordinate reads and writes through a shared queue or write data separately and merge later. + +Takeaway +-------- + +These limitations reflect HDF5’s design priorities: efficient, large-scale storage and high-performance sequential access — **not** dynamic modification or multi-writer concurrency. + +When working with NWB in MatNWB, it is therefore important to: design file structure in advance, write data in predictable ways, and treat files as *immutable records* rather than *editable databases*. From 9fb18e904711519389b93fbf3fe1fc2e5b22793e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 17:51:57 +0200 Subject: [PATCH 21/67] Update performance_optimization.rst --- .../file_create/performance_optimization.rst | 425 +++--------------- 1 file changed, 53 insertions(+), 372 deletions(-) diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index 983b8a568..5cc54239b 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -1,399 +1,80 @@ Performance Optimization ======================== -Creating efficient NWB files requires careful consideration of data layout, compression, and memory usage. This section provides strategies for optimizing performance when working with large datasets. +Creating efficient NWB files requires consideration of data layout, compression, and memory usage. +This page explains the key factors that influence performance when writing large datasets with MatNWB and how to design your workflows to make the most of them. -Understanding DataPipe ------------------------ +Why performance considerations matter +------------------------------------- -The :class:`types.untyped.DataPipe` class is the key to efficient data handling in MatNWB. 
It provides: +NWB files are often used to store large-scale experimental data — from multi-channel electrophysiology to high-resolution imaging. +Writing and reading such datasets can quickly become a bottleneck if the file layout, storage strategy, or memory handling is not carefully planned. -- **Compression** - Reduces file size significantly -- **Chunking** - Optimizes access patterns -- **Pre-allocation** - Reserve space for datasets that will grow over time -- **Iterative writing** - Enables processing datasets larger than RAM -- **Lazy loading** - Data isn't loaded into memory until needed +By understanding how HDF5 stores data and how MatNWB interfaces with it, you can: +- Reduce file size without losing precision +- Speed up read and write operations +- Work efficiently with datasets larger than your available RAM -Basic DataPipe Usage -~~~~~~~~~~~~~~~~~~~~ +Understanding ``DataPipe`` +-------------------------- -.. code-block:: MATLAB +The :class:`~types.untyped.DataPipe` class is central to efficient data handling in MatNWB. +Rather than writing a complete dataset in one step, ``DataPipe`` allows you to define how data should be stored *and* written over time. +This enables several key performance optimizations: - % Simple compression - raw_data = randn(10000, 64); % 10k samples, 64 channels - - compressed_data = types.untyped.DataPipe( ... - 'data', raw_data, ... - 'compressionLevel', 6); % Moderate compression - - electrical_series = types.core.ElectricalSeries( ... - 'data', compressed_data, ... - 'electrodes', electrode_region, ... - 'starting_time', 0.0, ... - 'starting_time_rate', 30000.0); +Compression +~~~~~~~~~~~ -Compression Strategies ----------------------- +HDF5 supports transparent compression of datasets. +When enabled via ``DataPipe``, compression can reduce file size significantly — often by an order of magnitude — without changing how you read or write the data later. 
-**Choosing Compression Levels:** +**When to use:** +Compression is most beneficial for continuous or image-like data with redundant structure. +For small datasets or those requiring ultra-fast random access, compression may add overhead. -.. code-block:: MATLAB +Chunking +~~~~~~~~ - % Compression level 0: No compression (fastest, largest files) - no_compression = types.untyped.DataPipe('data', data, 'compressionLevel', 0); - - % Compression level 3-6: Good balance (recommended for most cases) - balanced = types.untyped.DataPipe('data', data, 'compressionLevel', 4); - - % Compression level 9: Maximum compression (slowest, smallest files) - max_compression = types.untyped.DataPipe('data', data, 'compressionLevel', 9); +Chunking divides a dataset into fixed-size blocks (chunks) on disk. +This improves performance when reading or writing subsets of data and is essential when using compression. -**Performance Comparison:** +**Why it matters:** +Choosing a chunk size that matches your typical access pattern — for example, time windows, frames, or trials — ensures that reads and writes align with how the data is stored, avoiding unnecessary I/O. -.. code-block:: MATLAB +Pre-allocation +~~~~~~~~~~~~~~ - % Benchmark different compression levels - test_data = uint16(randn(1000, 1000) * 1000 + 2000); % Typical imaging data - - for comp_level = [0, 3, 6, 9] - tic; - data_pipe = types.untyped.DataPipe( ... - 'data', test_data, ... - 'compressionLevel', comp_level); - - nwb = create_test_nwb(); - nwb.acquisition.set('test_data', create_timeseries(data_pipe)); - filename = sprintf('test_compression_%d.nwb', comp_level); - nwbExport(nwb, filename); - - file_info = dir(filename); - time_taken = toc; - - fprintf('Compression %d: %.2f seconds, %.2f MB\n', ... - comp_level, time_taken, file_info.bytes / 1e6); - delete(filename); - end +If you know the approximate size of a dataset in advance, pre-allocating space can improve write performance and prevent fragmentation. 
+``DataPipe`` allows you to specify an expected maximum shape, so the file can reserve sufficient space before data is written. -Optimal Chunking ----------------- +**Best practice:** +Use pre-allocation when datasets will grow over time but the total size is bounded (e.g. adding trials sequentially). -Chunking determines how data is stored internally and dramatically affects access performance: +Iterative writing +~~~~~~~~~~~~~~~~~ -**Time-Series Chunking:** +For datasets that exceed available RAM, ``DataPipe`` supports writing data incrementally. +This means you can process and write data in chunks — for example, frame by frame or batch by batch — without ever loading the entire dataset into memory. -.. code-block:: MATLAB +**Typical use cases:** - data = randn(100000, 32); % 100k timepoints, 32 channels - - % For temporal analysis (accessing time ranges): - temporal_chunks = types.untyped.DataPipe( ... - 'data', data, ... - 'chunkSize', [1000, 32]); % 1k timepoints, all channels - - % For channel analysis (accessing individual channels): - channel_chunks = types.untyped.DataPipe( ... - 'data', data, ... - 'chunkSize', [100000, 1]); % All timepoints, single channel - - % For block analysis (accessing small time-channel blocks): - block_chunks = types.untyped.DataPipe( ... - 'data', data, ... - 'chunkSize', [1000, 8]); % 1k timepoints, 8 channels +- Writing continuous recordings as they stream from acquisition hardware +- Processing and storing image stacks larger than system memory +- Incrementally populating a large behavioral table -**Imaging Data Chunking:** +Designing for performance +------------------------- -.. code-block:: MATLAB +Optimizing NWB performance is less about tweaking individual parameters and more about designing the **data flow** with these principles in mind: - imaging_data = uint16(randn(512, 512, 1000) * 1000); % 512x512 pixels, 1000 frames - - % For frame-by-frame access: - frame_chunks = types.untyped.DataPipe( ... 
- 'data', imaging_data, ... - 'chunkSize', [512, 512, 1]); % One complete frame per chunk - - % For pixel time-series analysis: - pixel_chunks = types.untyped.DataPipe( ... - 'data', imaging_data, ... - 'chunkSize', [1, 1, 1000]); % All timepoints for single pixel - - % For ROI-based access: - roi_chunks = types.untyped.DataPipe( ... - 'data', imaging_data, ... - 'chunkSize', [64, 64, 100]); % 64x64 spatial blocks, 100 frames +- Plan dataset shapes and sizes before writing. +- Use compression and chunking deliberately, based on how the data will be accessed. +- Write incrementally rather than assembling massive arrays in memory. +- Treat ``DataPipe`` as part of the design, not just a convenience. -Automatic Chunk Size Selection -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Takeaway +-------- -Let DataPipe choose optimal chunk sizes when you're unsure: - -.. code-block:: MATLAB - - % DataPipe will automatically choose reasonable chunk size - auto_chunked = types.untyped.DataPipe( ... - 'data', data, ... - 'compressionLevel', 6); % Only specify compression - - % You can still provide hints about the primary access dimension - time_optimized = types.untyped.DataPipe( ... - 'data', data, ... - 'axis', 1); % Hint: will primarily access along first dimension (time) - -Memory-Efficient Large Dataset Handling ---------------------------------------- - -**Iterative Writing Workflow:** - -For datasets larger than available RAM: - -.. code-block:: MATLAB - - function create_large_nwb_file(total_duration_sec, sampling_rate, num_channels) - % Calculate dimensions - total_samples = total_duration_sec * sampling_rate; - chunk_duration = 60; % Process 1 minute at a time - chunk_samples = chunk_duration * sampling_rate; - - % Create initial chunk - first_chunk = load_data_chunk(1, chunk_samples, num_channels); - - % Create DataPipe with reserved space - data_pipe = types.untyped.DataPipe( ... - 'data', first_chunk, ... - 'maxSize', [total_samples, num_channels], ... 
- 'chunkSize', [chunk_samples, num_channels], ... - 'compressionLevel', 6, ... - 'axis', 1); - - % Create NWB file - nwb = create_base_nwb(); - electrical_series = types.core.ElectricalSeries( ... - 'data', data_pipe, ... - 'electrodes', electrode_region, ... - 'starting_time', 0.0, ... - 'starting_time_rate', sampling_rate); - - nwb.acquisition.set('continuous_ephys', electrical_series); - nwbExport(nwb, 'large_dataset.nwb'); - - % Append remaining chunks - nwb = nwbRead('large_dataset.nwb', 'ignorecache'); - num_chunks = ceil(total_samples / chunk_samples); - - for chunk_idx = 2:num_chunks - fprintf('Processing chunk %d of %d\n', chunk_idx, num_chunks); - - % Load next chunk from your data source - chunk_data = load_data_chunk(chunk_idx, chunk_samples, num_channels); - - % Append to file - nwb.acquisition.get('continuous_ephys').data.append(chunk_data); - end - - fprintf('Large dataset creation complete!\n'); - end - -**Streaming from Acquisition Systems:** - -.. code-block:: MATLAB - - function stream_acquisition_to_nwb(acquisition_system, output_file) - % Initialize with small buffer - buffer_size = 30000; % 1 second at 30kHz - initial_data = zeros(buffer_size, 32); - - data_pipe = types.untyped.DataPipe( ... - 'data', initial_data, ... - 'maxSize', [Inf, 32], ... % Unknown final size - 'chunkSize', [buffer_size, 32]); - - % Create and export initial NWB structure - nwb = create_acquisition_nwb(); - nwb.acquisition.set('live_recording', ... - create_electrical_series(data_pipe)); - nwbExport(nwb, output_file); - - % Stream data as it arrives - nwb = nwbRead(output_file, 'ignorecache'); - - while acquisition_system.is_recording() - new_data = acquisition_system.get_next_buffer(); - nwb.acquisition.get('live_recording').data.append(new_data); - end - end - -Optimizing Data Types ---------------------- - -**Choose Appropriate Numeric Types:** - -.. 
code-block:: MATLAB - - % Raw electrophysiology: often int16 is sufficient - raw_ephys = int16(randn(10000, 32) * 1000); % ±32,767 range - - % Calcium imaging: uint16 typical for camera data - calcium_data = uint16(randn(512, 512, 1000) * 1000 + 2000); - - % Processed data: may need double precision - processed_signals = double(compute_filtered_signals(raw_ephys)); - - % Behavioral measurements: single precision often sufficient - position_data = single(randn(10000, 2)); - -**Memory Usage Comparison:** - -.. code-block:: MATLAB - - % Compare memory usage of different data types - n_samples = 1000000; - - double_data = randn(n_samples, 1); % 8 bytes per sample - single_data = single(randn(n_samples, 1)); % 4 bytes per sample - int16_data = int16(randn(n_samples, 1)*1000); % 2 bytes per sample - - fprintf('Double: %.1f MB\n', whos('double_data').bytes / 1e6); - fprintf('Single: %.1f MB\n', whos('single_data').bytes / 1e6); - fprintf('Int16: %.1f MB\n', whos('int16_data').bytes / 1e6); - -Parallel Processing Considerations ----------------------------------- - -**File-Level Parallelization:** - -Process different experimental sessions in parallel: - -.. code-block:: MATLAB - - session_files = {'session1.mat', 'session2.mat', 'session3.mat'}; - - parfor i = 1:length(session_files) - % Each worker creates its own NWB file - session_data = load(session_files{i}); - nwb = convert_session_to_nwb(session_data); - - output_file = sprintf('session_%03d.nwb', i); - nwbExport(nwb, output_file); - end - -**Data-Level Parallelization:** - -Process large datasets in parallel chunks: - -.. 
code-block:: MATLAB - - function process_large_dataset_parallel(input_file, output_file) - % Load metadata to determine processing strategy - data_info = get_dataset_info(input_file); - num_chunks = ceil(data_info.total_samples / data_info.chunk_size); - - % Process chunks in parallel - processed_chunks = cell(num_chunks, 1); - - parfor chunk_idx = 1:num_chunks - raw_chunk = load_data_chunk(input_file, chunk_idx); - processed_chunks{chunk_idx} = process_chunk(raw_chunk); - end - - % Combine results sequentially (HDF5 doesn't support parallel writing) - combine_chunks_to_nwb(processed_chunks, output_file); - end - -Performance Monitoring ----------------------- - -**Benchmark Your Workflow:** - -.. code-block:: MATLAB - - function benchmark_nwb_creation(data_sizes, chunk_sizes, compression_levels) - results = table(); - - for data_size = data_sizes - for chunk_size = chunk_sizes - for comp_level = compression_levels - % Generate test data - test_data = randn(data_size, 32); - - % Time the creation process - tic; - data_pipe = types.untyped.DataPipe( ... - 'data', test_data, ... - 'chunkSize', [chunk_size, 32], ... - 'compressionLevel', comp_level); - - nwb = create_test_nwb(); - nwb.acquisition.set('test', create_timeseries(data_pipe)); - - filename = 'benchmark_temp.nwb'; - nwbExport(nwb, filename); - creation_time = toc; - - % Measure file size - file_info = dir(filename); - file_size_mb = file_info.bytes / 1e6; - - % Test read performance - tic; - test_nwb = nwbRead(filename); - sample_data = test_nwb.acquisition.get('test').data.load(1:1000, :); - read_time = toc; - - % Store results - new_row = table(data_size, chunk_size, comp_level, ... - creation_time, file_size_mb, read_time, ... - 'VariableNames', {'DataSize', 'ChunkSize', 'CompressionLevel', ... 
- 'CreationTime', 'FileSizeMB', 'ReadTime'}); - results = [results; new_row]; - - delete(filename); - end - end - end - - % Display results - disp(results); - - % Plot performance trends - figure; - scatter3(results.DataSize, results.CompressionLevel, results.CreationTime); - xlabel('Data Size'); ylabel('Compression Level'); zlabel('Creation Time (s)'); - title('NWB Creation Performance'); - end - -Best Practices Summary ----------------------- - -1. **Use DataPipe for all large datasets** (> 100 MB) -2. **Choose compression level 4-6** for most applications -3. **Align chunk sizes with your analysis patterns** -4. **Use appropriate numeric data types** to minimize memory usage -5. **Process in parallel at the file level**, not within files -6. **Benchmark your specific workflow** to identify bottlenecks -7. **Pre-allocate space** for datasets that will grow over time - -.. code-block:: MATLAB - - % Template for high-performance NWB creation - function create_optimized_nwb(raw_data_source, output_file) - % Determine optimal parameters for your data - data_info = analyze_data_characteristics(raw_data_source); - - optimal_chunk_size = calculate_optimal_chunks(data_info); - compression_level = 6; % Good default - - % Create DataPipe with optimized settings - data_pipe = types.untyped.DataPipe( ... - 'compressionLevel', compression_level, ... - 'chunkSize', optimal_chunk_size); - - % Build NWB structure efficiently - nwb = build_nwb_structure_fast(); - - % Add data and export - add_data_efficiently(nwb, data_pipe, raw_data_source); - nwbExport(nwb, output_file); - - % Validate performance - validate_file_performance(output_file); - end +Performance optimization in NWB is about aligning data storage with data usage. +By leveraging ``DataPipe`` features — compression, chunking, pre-allocation, and iterative writing — you can build NWB files that are smaller, faster, and more scalable, even when working with datasets far larger than available RAM. 
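A minimal sketch of how the four ``DataPipe`` features named in the takeaway (compression, chunking, pre-allocation, and iterative writing) might be combined, assuming MatNWB is installed and on the MATLAB path. The file name, series name, array sizes, and parameter values are hypothetical choices for illustration, and the layout assumes that time is the last dimension of the MATLAB array. The DataPipe tutorial remains the authoritative reference.

.. code-block:: MATLAB

    % Hypothetical sizes: 32 channels, 1 s batches at 30 kHz, at most 10 batches.
    numChannels = 32;
    batchSamples = 30000;
    numBatches = 10;
    maxSamples = numBatches * batchSamples;

    % Only the first batch is held in memory when the file is created.
    firstBatch = int16(randn(numChannels, batchSamples) * 1000);

    dataPipe = types.untyped.DataPipe( ...
        'data', firstBatch, ...
        'maxSize', [numChannels, maxSamples], ...  % pre-allocation: upper bound on shape
        'axis', 2, ...                             % iterative writing: grow along time
        'chunkSize', [numChannels, 3000], ...      % chunking: 0.1 s of all channels per chunk
        'compressionLevel', 4);                    % compression: moderate level

    timeSeries = types.core.TimeSeries( ...
        'data', dataPipe, ...
        'data_unit', 'n/a', ...
        'starting_time', 0.0, ...
        'starting_time_rate', 30000.0, ...
        'description', 'Synthetic data for illustration');

    nwb = NwbFile( ...
        'session_description', 'DataPipe sketch', ...
        'identifier', 'datapipe-sketch-001', ...
        'session_start_time', datetime('now', 'TimeZone', 'local'));
    nwb.acquisition.set('demo_series', timeSeries);
    nwbExport(nwb, 'datapipe_sketch.nwb');

    % Re-open the file and append the remaining batches one at a time,
    % so that no more than one batch is in memory at once.
    nwb = nwbRead('datapipe_sketch.nwb', 'ignorecache');
    for batchIdx = 2:numBatches
        nextBatch = int16(randn(numChannels, batchSamples) * 1000);
        nwb.acquisition.get('demo_series').data.append(nextBatch);
    end

The chunk shape here favors reading short time windows across all channels. A workflow that mostly reads single channels over long durations would likely want a different chunk shape, which is worth benchmarking on representative data.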
From 354455b6cbe5b58519068552c5372f066d4362c9 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 17:52:27 +0200 Subject: [PATCH 22/67] Update overview.rst Remove todos and final note which was too meta --- docs/source/pages/getting_started/overview.rst | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 9a5218c44..1d2863ff3 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -32,8 +32,10 @@ What you can do with MatNWB - Scale to large data - - Stream/append and compress data with the DataPipe interface [Todo: DataPipe reference]. - - Use predefined or custom configuration profiles to optimize files for local storage, cloud storage or archiving. [Todo: How-to guide] + - Stream/append and compress data with the DataPipe interface. + - Use predefined or custom configuration profiles to optimize files for local storage, cloud storage or archiving. + +.. Todo: Add links to DataPipe reference and configuration profiles guide when these are added. - Use and create extensions @@ -53,7 +55,8 @@ The main categories of types you will work with - Containers/wrappers: organize related data (e.g., :doc:`ProcessingModule `). - Time series: sampled data over time (e.g., :doc:`TimeSeries `, :doc:`ElectricalSeries `). - Tables: columnar metadata or data (e.g., :doc:`DynamicTable `). -- Helpers: Helper types [Todo: expand, and link to helper types concept page]. +- Helpers: Helper types +.. [Todo: expand, and link to helper types reference and concept pages when these are added]. Common questions you may encounter (and where to find answers) -------------------------------------------------------------- @@ -108,7 +111,3 @@ Related resources - :nwb_overview:`NWB Overview <>` documentation - Python API (PyNWB_) - Share/discover data: :dandi:`DANDI Archive <>` - -.. 
note:: - - This page is an overview (explanation). A separate quickstart covers first read/write steps; see the :doc:`tutorials ` and Getting Started pages for hands‑on material. \ No newline at end of file From 96382e84bd70f3f5e51b7e9cf14e5d55fafacd6e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 18:20:54 +0200 Subject: [PATCH 23/67] Update performance_optimization.rst --- .../file_create/performance_optimization.rst | 81 ++++++++++++------- 1 file changed, 54 insertions(+), 27 deletions(-) diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index 5cc54239b..c98d90b63 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -2,79 +2,106 @@ Performance Optimization ======================== Creating efficient NWB files requires consideration of data layout, compression, and memory usage. -This page explains the key factors that influence performance when writing large datasets with MatNWB and how to design your workflows to make the most of them. +This page gives conceptual guidance on *why* these factors matter in MatNWB. +For step-by-step usage, see the :doc:`DataPipe tutorial ` and the dynamically loaded filter example (:doc:`dynamically loaded filters `). Why performance considerations matter ------------------------------------- NWB files are often used to store large-scale experimental data — from multi-channel electrophysiology to high-resolution imaging. -Writing and reading such datasets can quickly become a bottleneck if the file layout, storage strategy, or memory handling is not carefully planned. +Writing and reading large datasets can become a bottleneck if dataset layout, storage strategy, or memory handling is not planned up front.
By understanding how HDF5 stores data and how MatNWB interfaces with it, you can: - Reduce file size without losing precision -- Speed up read and write operations -- Work efficiently with datasets larger than your available RAM +- Improve sustained write rates for streaming acquisitions +- Enable analysis on datasets larger than available RAM Understanding ``DataPipe`` -------------------------- -The :class:`~types.untyped.DataPipe` class is central to efficient data handling in MatNWB. -Rather than writing a complete dataset in one step, ``DataPipe`` allows you to define how data should be stored *and* written over time. +The :class:`types.untyped.DataPipe` class is central to efficient data handling in MatNWB. +Instead of creating a full MATLAB array before writing, ``DataPipe`` defines *deferred* dataset creation plus incremental population. +Conceptually, it lets you describe: (1) anticipated shape/growth, (2) storage layout (chunking, compression), and (3) how data will arrive over time. + This enables several key performance optimizations: Compression ~~~~~~~~~~~ -HDF5 supports transparent compression of datasets. -When enabled via ``DataPipe``, compression can reduce file size significantly — often by an order of magnitude — without changing how you read or write the data later. +HDF5 supports transparent compression of chunked datasets. +When enabled via ``DataPipe`` (e.g., specifying a compression filter), file size can be significantly reduced for structured or slowly varying signals. **When to use:** -Compression is most beneficial for continuous or image-like data with redundant structure. -For small datasets or those requiring ultra-fast random access, compression may add overhead. +Continuous signals, image stacks, tables with repeated values. +Avoid (or benchmark) for very small, latency-sensitive random-access datasets. 
+ +**MatNWB note:** +Custom or dynamically loaded filters (e.g., BLOSC, LZ4) can be configured when the underlying HDF5 build supports them—see the :doc:`dynamically loaded filters ` tutorial. Chunking ~~~~~~~~ -Chunking divides a dataset into fixed-size blocks (chunks) on disk. -This improves performance when reading or writing subsets of data and is essential when using compression. +Chunking divides a dataset into fixed-size blocks on disk. +Compression, extensibility, and efficient partial I/O all depend on suitable chunking. **Why it matters:** -Choosing a chunk size that matches your typical access pattern — for example, time windows, frames, or trials — ensures that reads and writes align with how the data is stored, avoiding unnecessary I/O. +Align chunk dimensions with *typical access slices*: time windows, frame ranges, trial segments, or columnar table growth. +Poorly chosen chunks can inflate I/O (reading entire oversized chunks) or degrade compression ratios. + +**MatNWB note:** +``DataPipe`` lets you declare chunk sizes up front; you do not later “fix” chunking without rewriting the dataset. Pre-allocation ~~~~~~~~~~~~~~ -If you know the approximate size of a dataset in advance, pre-allocating space can improve write performance and prevent fragmentation. -``DataPipe`` allows you to specify an expected maximum shape, so the file can reserve sufficient space before data is written. +If you know (or can bound) the eventual dataset size, pre-allocation (declaring a maximum shape) reduces metadata updates and fragmentation. **Best practice:** -Use pre-allocation when datasets will grow over time but the total size is bounded (e.g. adding trials sequentially). +Specify a maximum when growth is monotonic and bounded (e.g., number of samples = sampling_rate * duration, frames in an imaging session, expected trial count). Iterative writing ~~~~~~~~~~~~~~~~~ -For datasets that exceed available RAM, ``DataPipe`` supports writing data incrementally. 
-This means you can process and write data in chunks — for example, frame by frame or batch by batch — without ever loading the entire dataset into memory. +For datasets exceeding RAM, ``DataPipe`` supports appending or writing slices progressively—processing each batch then discarding it from memory. **Typical use cases:** -- Writing continuous recordings as they stream from acquisition hardware -- Processing and storing image stacks larger than system memory -- Incrementally populating a large behavioral table +- Streaming continuous ephys directly from an acquisition loop +- Writing large image volumes frame-or-plane at a time +- Building a behavioral table row-wise as trials complete + +**MatNWB note:** +Design the *append axis* early. Changing growth direction after data are written is not supported without copying. Designing for performance ------------------------- -Optimizing NWB performance is less about tweaking individual parameters and more about designing the **data flow** with these principles in mind: +Optimization is chiefly about aligning *storage* with *anticipated access*: -- Plan dataset shapes and sizes before writing. -- Use compression and chunking deliberately, based on how the data will be accessed. -- Write incrementally rather than assembling massive arrays in memory. -- Treat ``DataPipe`` as part of the design, not just a convenience. +- Define dataset axes, growth pattern, and approximate bounds before writing. +- Select chunk shapes that mirror dominant retrieval patterns (e.g., (time, channel) vs (frame_y, frame_x)). +- Use compression intentionally—benchmark representative subsets; do not assume the default filter is optimal. +- Stream / append rather than assembling massive in-memory arrays. +- Treat ``DataPipe`` declarations (chunking, compression, max shape) as part of the experimental data model, not an afterthought. 
+ +Additional MatNWB considerations +-------------------------------- + +- MATLAB memory layout (column-major) can influence which axis you stream most cheaply; consider this when choosing chunk dimension ordering. +- Random small writes into highly compressed, large chunks can incur read-modify-write overhead; batch contiguous writes when possible. +- Profiling: Start with a small representative slice (minutes of data, tens of frames) to measure throughput and compression ratio before full-scale export. Takeaway -------- Performance optimization in NWB is about aligning data storage with data usage. -By leveraging ``DataPipe`` features — compression, chunking, pre-allocation, and iterative writing — you can build NWB files that are smaller, faster, and more scalable, even when working with datasets far larger than available RAM. +By leveraging ``DataPipe`` features — compression, chunking, pre-allocation, and iterative writing — you can create NWB files that are smaller, faster, and more scalable, even when datasets exceed available RAM. 
+ +Related tutorials & references +------------------------------ + +- Tutorial: :doc:`DataPipe ` (practical usage patterns) +- Tutorial: :doc:`dynamically loaded filters ` (advanced compression filters) +- API: :class:`types.untyped.DataPipe` +- HDF5 background (external): `Chunking `_ & `Compression `_ From 097f3f436bf089cbeffb8f68cc5ea651072d40c8 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 25 Sep 2025 18:52:58 +0200 Subject: [PATCH 24/67] Update docs/source/pages/getting_started/quickstart.rst Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/source/pages/getting_started/quickstart.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/getting_started/quickstart.rst b/docs/source/pages/getting_started/quickstart.rst index 754120223..2c6062756 100644 --- a/docs/source/pages/getting_started/quickstart.rst +++ b/docs/source/pages/getting_started/quickstart.rst @@ -71,7 +71,7 @@ Step 3 — Write the File .. code-block:: matlab - nwbExport(nwb, 'quickstart_demo.nwb', 'owerwrite'); + nwbExport(nwb, 'quickstart_demo.nwb', 'overwrite'); This writes the NWB file to your current working directory. 
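A minimal sketch of the ``DataPipe`` settings described in the performance optimization page above (maximum shape, append axis, chunking, compression), assuming a hypothetical 4-channel signal written in batches. The variable names, sizes, file name, and the ``StreamedSignal`` key are illustrative assumptions, and the chunk shape and compression level are untuned placeholders rather than recommendations:

.. code-block:: matlab

    % Sketch only: all names and sizes below are illustrative assumptions.
    nChannels = 4;
    batchSize = 1000;
    firstBatch = rand(nChannels, batchSize);   % channels x time (time is last in MATLAB)

    % Declare layout up front: growth axis, maximum shape, chunking, compression.
    pipe = types.untyped.DataPipe( ...
        'data', firstBatch, ...
        'maxSize', [nChannels, Inf], ...       % unbounded along the time axis
        'axis', 2, ...                         % append along the 2nd dimension
        'chunkSize', [nChannels, batchSize], ...
        'compressionLevel', 3);

    ts = types.core.TimeSeries( ...
        'data', pipe, ...
        'data_unit', 'V', ...
        'starting_time', 0.0, ...
        'starting_time_rate', 1000.0, ...
        'description', 'Hypothetical streamed signal');

    nwb = NwbFile( ...
        'session_description', 'DataPipe sketch', ...
        'identifier', 'datapipe-sketch-001', ...
        'session_start_time', datetime('now', 'TimeZone', 'local'));
    nwb.acquisition.set('StreamedSignal', ts);
    nwbExport(nwb, 'datapipe_sketch.nwb');

    % Re-open the file and append further batches without holding the full array in memory.
    nwb = nwbRead('datapipe_sketch.nwb', 'ignorecache');
    for iBatch = 2:5
        nextBatch = rand(nChannels, batchSize);
        nwb.acquisition.get('StreamedSignal').data.append(nextBatch);
    end

Here the chunk shape matches the batch size so that each append fills whole chunks. Whether that is a good choice depends on how the data will later be read, so it is worth benchmarking on a small representative slice first.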
From 806a5ae21ef1c38d84bf2f7942a16379cc4deb60 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Fri, 26 Sep 2025 14:56:51 +0200 Subject: [PATCH 25/67] Update nwbfile.rst Hide todo as comment --- docs/source/pages/concepts/file_read/nwbfile.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/concepts/file_read/nwbfile.rst b/docs/source/pages/concepts/file_read/nwbfile.rst index 633d59c76..d5d9151d4 100644 --- a/docs/source/pages/concepts/file_read/nwbfile.rst +++ b/docs/source/pages/concepts/file_read/nwbfile.rst @@ -154,7 +154,7 @@ There are 3 primary data types you will encounter when working with NWB files: - NWB schema-defined types (e.g., :class:`types.core.TimeSeries`, :class:`types.core.ElectricalSeries`, :class:`types.hdmf_common.DynamicTable`) - :ref:`Utility types` (e.g., ``types.untyped.Set``, ``types.untyped.DataStub``) -TODO: Briefly discuss schema and utility types. +.. TODO: Briefly discuss schema and utility types. .. _matnwb-read-nwbfile-searchfor: From 7452da55e35121975e15b274554935c69eac80d4 Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 14:29:00 -0400 Subject: [PATCH 26/67] Update docs/source/pages/concepts/file_create.rst --- docs/source/pages/concepts/file_create.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index fc5b287f2..840beb83c 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -26,7 +26,7 @@ This approach ensures your data is properly organized and validated before it be **Schema Validation** -The NWB schema acts as a blueprint that defines what makes a valid neuroscience data file. When you export your file, MatNWB checks that: +The NWB schema acts as a blueprint that defines what makes a valid NWB data file. 
When you export your file, MatNWB checks that: - All required properties are present - Data types match what the schema expects From 65c0d1e55b5459244ef2a1fff32d48c4c339174f Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 14:34:33 -0400 Subject: [PATCH 27/67] Update docs/source/pages/concepts/file_create/hdf5_considerations.rst --- docs/source/pages/concepts/file_create/hdf5_considerations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/concepts/file_create/hdf5_considerations.rst b/docs/source/pages/concepts/file_create/hdf5_considerations.rst index bd6a25c0c..5444964fe 100644 --- a/docs/source/pages/concepts/file_create/hdf5_considerations.rst +++ b/docs/source/pages/concepts/file_create/hdf5_considerations.rst @@ -34,7 +34,7 @@ Data and attributes cannot be removed — and deletion does not reduce file size ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HDF5 does not support in-place removal of datasets or attributes in the way a database might. -While it is possible at a low level to "unlink" objects from the file, space is not reclaimed. +While it is possible at a low level to "unlink" objects from the file, this often does not reduce the size of the file. If you need to significantly restructure a file, the standard approach is to **create a new NWB file** and copy the desired data into it. 
**Implication:** From 99d27bdc2134cebe121c243769d1d6ae93ed168e Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 14:37:04 -0400 Subject: [PATCH 28/67] Update docs/source/pages/concepts/file_create/hdf5_considerations.rst --- .../concepts/file_create/hdf5_considerations.rst | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/source/pages/concepts/file_create/hdf5_considerations.rst b/docs/source/pages/concepts/file_create/hdf5_considerations.rst index 5444964fe..0e748c283 100644 --- a/docs/source/pages/concepts/file_create/hdf5_considerations.rst +++ b/docs/source/pages/concepts/file_create/hdf5_considerations.rst @@ -40,17 +40,6 @@ If you need to significantly restructure a file, the standard approach is to **c **Implication:** Plan carefully which datasets and metadata to include before writing. Making changes later often means recreating the file from scratch. -Multiple-writer access is not supported -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -HDF5 files are not designed for concurrent writes. -If multiple processes or threads attempt to write to the same file at the same time, the result can be **file corruption**. -In most workflows, this means ensuring that **only one process writes to an NWB file** at any time. - -**Best practice:** - -- Use a single writer process and close the file before reading it elsewhere. -- If multiple processes need access, coordinate reads and writes through a shared queue or write data separately and merge later. 
Takeaway -------- From 69f67aa04da9ec21f1f68f5f5705fa52562dca9a Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 14:45:47 -0400 Subject: [PATCH 29/67] Update docs/source/pages/concepts/file_create/nwbfile.rst --- docs/source/pages/concepts/file_create/nwbfile.rst | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst index a6bccaee3..4dd89f8e1 100644 --- a/docs/source/pages/concepts/file_create/nwbfile.rst +++ b/docs/source/pages/concepts/file_create/nwbfile.rst @@ -64,17 +64,6 @@ The :class:`NwbFile` object provides specific properties for organizing differen Various ``general_*`` properties for experimenter, institution, lab, etc. -Working with MATLAB Data Types ------------------------------- - -The :class:`NwbFile` object is designed to work naturally with MATLAB data types: - -- **Datetime handling**: Uses MATLAB's ``datetime`` class with timezone support -- **String/char compatibility**: Accepts both ``char`` arrays and ``string`` objects -- **Numeric arrays**: Works with standard MATLAB matrices and arrays -- **Cell arrays**: Can handle MATLAB cell arrays for text data - -MatNWB automatically converts these MATLAB types to appropriate NWB format during export. 
Validation and Error Handling ----------------------------- From f40db557e5cdf65696ef1e316f002219b28350e9 Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 14:55:57 -0400 Subject: [PATCH 30/67] Update docs/source/pages/concepts/file_create/performance_optimization.rst --- .../pages/concepts/file_create/performance_optimization.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index c98d90b63..beb539ea7 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -63,7 +63,7 @@ Specify a maximum when growth is monotonic and bounded (e.g., number of samples Iterative writing ~~~~~~~~~~~~~~~~~ -For datasets exceeding RAM, ``DataPipe`` supports appending or writing slices progressively—processing each batch then discarding it from memory. +``DataPipe`` supports appending or writing slices progressively—processing each batch then discarding it from memory. This is particularly useful for datasets that do not fit in RAM. **Typical use cases:** From 994e6d2cceefcec7a064cf943bcebdb8a60308c0 Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 15:02:14 -0400 Subject: [PATCH 31/67] Update docs/source/pages/getting_started/overview.rst --- docs/source/pages/getting_started/overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 1d2863ff3..c81e15be6 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -37,7 +37,7 @@ What you can do with MatNWB .. Todo: Add links to DataPipe reference and configuration profiles guide when these are added. 
-- Use and create extensions +- Use NWB extensions - Install published Neurodata Extensions (NDX) with :doc:`nwbInstallExtension ` - Generate classes from any namespace specification with :doc:`generateExtension `. From 6ad54511d6480431c5b79d5a7409f74eeeef4049 Mon Sep 17 00:00:00 2001 From: Ben Dichter Date: Fri, 26 Sep 2025 15:04:55 -0400 Subject: [PATCH 32/67] Update docs/source/pages/getting_started/overview.rst --- docs/source/pages/getting_started/overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index c81e15be6..22bc7a7ec 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -90,7 +90,7 @@ Important considerations when working with MatNWB: - **MATLAB vs. NWB dimension order** : The dimensions of datasets (arrays) in MatNWB are represented in the opposite order relative to the NWB specification. For example, in NWB the time dimension of a TimeSeries is the first dimension of a dataset, whereas in MatNWB, it will be the last dimension of the dataset. See the mappings and examples in the :doc:`Data dimensions ` section for a detailed explanation. -- **NWB schema version conflicts**: When reading NWB files, MatNWB will dynamically build classes for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated types will take the place of previously existing types (i.e from different versions), and therefore it is not recommended to work with NWB files of different versions simultaneously. +- **NWB schema versions**: When reading NWB files, MatNWB will dynamically build classes for neurodata types from schemas that are embedded in the file. 
This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated types will take the place of previously existing types (i.e from different versions), and therefore it is not recommended to work with NWB files of different versions simultaneously. - **Editing NWB files**: NWB files are stored using the HDF5 standard. This presents some difficulties in editing or appending data to files. See the section on :ref:`HDF5 considerations ` for more details. From 9af98a6804f8b4698a536fd81eb10ec6d0d43192 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Sat, 27 Sep 2025 10:23:25 +0200 Subject: [PATCH 33/67] Rename considerations.rst to dimension_ordering.rst --- docs/source/index.rst | 2 +- .../concepts/{considerations.rst => dimension_ordering.rst} | 4 ++-- docs/source/pages/getting_started/installation.rst | 2 +- docs/source/pages/getting_started/overview.rst | 2 +- docs/source/pages/getting_started/quickstart.rst | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) rename docs/source/pages/concepts/{considerations.rst => dimension_ordering.rst} (97%) diff --git a/docs/source/index.rst b/docs/source/index.rst index ec259e886..7eddbfafd 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -66,7 +66,7 @@ Looking for a specific topic which has not been mentioned? 
Check out the full ta :maxdepth: 2 :caption: Concepts - pages/concepts/considerations + pages/concepts/dimension_ordering pages/concepts/file_read pages/concepts/file_create pages/concepts/using_extensions diff --git a/docs/source/pages/concepts/considerations.rst b/docs/source/pages/concepts/dimension_ordering.rst similarity index 97% rename from docs/source/pages/concepts/considerations.rst rename to docs/source/pages/concepts/dimension_ordering.rst index efead93d0..7493c8755 100644 --- a/docs/source/pages/concepts/considerations.rst +++ b/docs/source/pages/concepts/dimension_ordering.rst @@ -1,5 +1,5 @@ -Important considerations (MatNWB) -================================= +Dimension Ordering in MatNWB +============================ When using MatNWB, it is important to understand the differences in how array dimensions are ordered in MATLAB versus HDF5. While the NWB documentation and diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst index 44d4b76e2..1316ec9f4 100644 --- a/docs/source/pages/getting_started/installation.rst +++ b/docs/source/pages/getting_started/installation.rst @@ -157,5 +157,5 @@ Next steps ---------- - Read data with :func:`nwbRead` (see :doc:`/pages/concepts/file_read`). -- Review important data dimension notes: :doc:`/pages/concepts/considerations`. +- Review important data dimension notes: :doc:`/pages/concepts/dimension_ordering`. - Explore tutorials: :doc:`../tutorials/index`. \ No newline at end of file diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 22bc7a7ec..d0a9e1e43 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -88,7 +88,7 @@ Common questions you may encounter (and where to find answers) Important considerations when working with MatNWB: -------------------------------------------------- -- **MATLAB vs. 
NWB dimension order** : The dimensions of datasets (arrays) in MatNWB are represented in the opposite order relative to the NWB specification. For example, in NWB the time dimension of a TimeSeries is the first dimension of a dataset, whereas in MatNWB, it will be the last dimension of the dataset. See the mappings and examples in the :doc:`Data dimensions ` section for a detailed explanation. +- **MATLAB vs. NWB dimension order** : The dimensions of datasets (arrays) in MatNWB are represented in the opposite order relative to the NWB specification. For example, in NWB the time dimension of a TimeSeries is the first dimension of a dataset, whereas in MatNWB, it will be the last dimension of the dataset. See the mappings and examples in the :doc:`Data dimensions ` section for a detailed explanation. - **NWB schema versions**: When reading NWB files, MatNWB will dynamically build classes for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated types will take the place of previously existing types (i.e from different versions), and therefore it is not recommended to work with NWB files of different versions simultaneously. diff --git a/docs/source/pages/getting_started/quickstart.rst b/docs/source/pages/getting_started/quickstart.rst index 2c6062756..5cca36950 100644 --- a/docs/source/pages/getting_started/quickstart.rst +++ b/docs/source/pages/getting_started/quickstart.rst @@ -63,7 +63,7 @@ We’ll add a short synthetic signal sampled at 10 Hz for 1 second using the :cl nwb.acquisition.set('DemoSignal', ts); .. note:: - MatNWB uses MATLAB array ordering when writing to HDF5. For multi-dimensional time series, the time dimension should be the last dimension of the MATLAB array. See the :doc:`Data Dimensions ` section in the "MatNWB important considerations" page. 
+ MatNWB uses MATLAB array ordering when writing to HDF5. For multi-dimensional time series, the time dimension should be the last dimension of the MATLAB array. See the :doc:`Data Dimensions ` section in the "MatNWB important considerations" page. Step 3 — Write the File From 0523f6ae59c6ccb39e5569b015ae39b929384840 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 08:49:50 +0200 Subject: [PATCH 34/67] Rename hdf5_considerations.rst to about_hdf5.rst --- docs/source/pages/concepts/file_create.rst | 2 +- .../file_create/{hdf5_considerations.rst => about_hdf5.rst} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename docs/source/pages/concepts/file_create/{hdf5_considerations.rst => about_hdf5.rst} (100%) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 840beb83c..db3171309 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -40,5 +40,5 @@ If anything is missing or incorrect, you'll get an error message explaining what Understanding the NwbFile Object Understanding Neurodata Types - HDF5 Considerations + What is HDF5? 
Performance Optimization diff --git a/docs/source/pages/concepts/file_create/hdf5_considerations.rst b/docs/source/pages/concepts/file_create/about_hdf5.rst similarity index 100% rename from docs/source/pages/concepts/file_create/hdf5_considerations.rst rename to docs/source/pages/concepts/file_create/about_hdf5.rst From b152e3ba05451cf2bb72a70bdc117689949348b5 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 11:21:07 +0200 Subject: [PATCH 35/67] Update file_create.rst Change wording --- docs/source/pages/concepts/file_create.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index db3171309..d8c853c03 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -12,15 +12,15 @@ As demonstrated in the :doc:`Quickstart ` tut .. note:: An "object" is an instance of a class. Objects are similar to MATLAB structs, but with additional functionality. The fields (called properties) are defined by the class definition (a .m file), and the class can enforce rules about what values are allowed. This helps ensure that your data conforms to the NWB schema. 
-**The Assembly Process** +**General steps to create an NWB file** -Building an NWB file follows a logical pattern: +Building an NWB file follows a few general steps: -- **Create neurodata objects**: You create objects for your data (like :class:`types.core.TimeSeries` for time-based measurements) +- **Create neurodata objects**: Create neurodata type objects and add your relevant data and metadata (like :class:`types.core.TimeSeries` for time-based measurements) -- **Add to containers**: You add these data objects to your :class:`NwbFile` object (or other NWB container objects) in appropriate locations +- **Add to containers**: Add these neurodata type objects to your :class:`NwbFile` object (or other NWB container objects) in appropriate locations -- **File export**: You save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format +- **File export**: Save everything to disk using :func:`nwbExport`, which translates your objects into NWB/HDF5 format This approach ensures your data is properly organized and validated before it becomes a file. From fc47c8d11862174c5dc737520599ea34a44de186 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 11:22:19 +0200 Subject: [PATCH 36/67] Update overview.rst Smaller edits to be more direct Combine Learn Mode and Related Resources sections --- .../source/pages/getting_started/overview.rst | 38 ++++++++++--------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index d0a9e1e43..746186990 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -5,6 +5,7 @@ Overview ======== + What is MatNWB? --------------- @@ -14,8 +15,9 @@ MatNWB_ is a MATLAB package for reading, writing, and validating NWB files. It p Who is it for? 
-------------- -- MATLAB users working with neurophysiology and related data (extracellular and intracellular electrophysiology, optical physiology, behavior, images, and derived analyses) -- Labs that want a reproducible, self-describing data format that travels well across tools, languages, and archives (e.g., DANDI) +- MATLAB users working with neurophysiology data (extracellular and intracellular electrophysiology, optical physiology, behavior, images, and derived analyses) +- Labs seeking a reproducible, self-describing data format that works seamlessly across platforms and is supported by an expanding ecosystem of tools and archives (e.g., DANDI). + What you can do with MatNWB --------------------------- @@ -42,8 +44,9 @@ What you can do with MatNWB - Install published Neurodata Extensions (NDX) with :doc:`nwbInstallExtension ` - Generate classes from any namespace specification with :doc:`generateExtension `. -How it works (the mental model) -------------------------------- + +How it works +------------ NWB files are containers for storing data and metadata in a hierarchical manner using groups and datasets. In this sense, an NWB file can be thought of as a tree of folders and files representing all the data associated with neurophysiological recording sessions. The data and metadata is represented through a set of neurodata types defined by the NWB schema. These neurodata types are the building blocks for NWB files and are often used together in specific configurations (see the :doc:`tutorials ` for concrete patterns) @@ -57,6 +60,8 @@ The main categories of types you will work with - Tables: columnar metadata or data (e.g., :doc:`DynamicTable `). - Helpers: Helper types .. [Todo: expand, and link to helper types reference and concept pages when these are added]. +.. 
[Todo: For tables: TimeIntervals, Units, ElectrodesTable] + Common questions you may encounter (and where to find answers) -------------------------------------------------------------- @@ -85,29 +90,26 @@ Common questions you may encounter (and where to find answers) - See :doc:`Neurodata Extensions ` for guides to install published NDX or to generate classes from your own namespace specification. -Important considerations when working with MatNWB: --------------------------------------------------- +Important caveats when working with MatNWB: +------------------------------------------- - **MATLAB vs. NWB dimension order** : The dimensions of datasets (arrays) in MatNWB are represented in the opposite order relative to the NWB specification. For example, in NWB the time dimension of a TimeSeries is the first dimension of a dataset, whereas in MatNWB, it will be the last dimension of the dataset. See the mappings and examples in the :doc:`Data dimensions ` section for a detailed explanation. -- **NWB schema versions**: When reading NWB files, MatNWB will dynamically build classes for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated types will take the place of previously existing types (i.e from different versions), and therefore it is not recommended to work with NWB files of different versions simultaneously. - -- **Editing NWB files**: NWB files are stored using the HDF5 standard. This presents some difficulties in editing or appending data to files. See the section on :ref:`HDF5 considerations ` for more details. +- **NWB schema versions**: When reading an NWB file, MatNWB will dynamically build class definitions for neurodata types from schemas that are embedded in the file. 
This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated type classes will take the place of previously existing classes (i.e generated from different NWB versions), and therefore it is not recommended to work with NWB files of different NWB versions simultaneously. +- **Editing NWB files**: If you need to edit NWB files after creation, note that MatNWB currently has certain limitations. See the section on :ref:`HDF5 considerations ` for more details. -Learn more (no steps here—just pointers) ----------------------------------------- - -- Object‑oriented programming refresher (MATLAB): https://www.mathworks.com/help/matlab/object-oriented-programming.html - -Cite MatNWB ------------ - -If MatNWB contributes to your work, please see :doc:`Citing MatNWB `. Related resources ----------------- - :nwb_overview:`NWB Overview <>` documentation - Python API (PyNWB_) +- Object‑oriented programming refresher (MATLAB): https://www.mathworks.com/help/matlab/object-oriented-programming.html - Share/discover data: :dandi:`DANDI Archive <>` + + +Cite MatNWB +----------- + +If MatNWB contributes to your work, please see :doc:`Citing MatNWB `. 
\ No newline at end of file From f038ff57393edb017850dc1c1cb1051059d252e7 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 12:38:41 +0200 Subject: [PATCH 37/67] Updating the file_create concept pages --- docs/source/pages/concepts/file_create.rst | 3 +- .../pages/concepts/file_create/about_hdf5.rst | 49 ------------------- .../file_create/editing_nwb_files.rst | 33 +++++++++++++ .../pages/concepts/file_create/nwbfile.rst | 3 +- .../concepts/file_create/storage_backends.rst | 22 +++++++++ .../source/pages/getting_started/overview.rst | 2 +- 6 files changed, 60 insertions(+), 52 deletions(-) delete mode 100644 docs/source/pages/concepts/file_create/about_hdf5.rst create mode 100644 docs/source/pages/concepts/file_create/editing_nwb_files.rst create mode 100644 docs/source/pages/concepts/file_create/storage_backends.rst diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index d8c853c03..150106eab 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -40,5 +40,6 @@ If anything is missing or incorrect, you'll get an error message explaining what Understanding the NwbFile Object Understanding Neurodata Types - What is HDF5? + Storage Backends + Editing NWB Files Performance Optimization diff --git a/docs/source/pages/concepts/file_create/about_hdf5.rst b/docs/source/pages/concepts/file_create/about_hdf5.rst deleted file mode 100644 index 0e748c283..000000000 --- a/docs/source/pages/concepts/file_create/about_hdf5.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. _hdf5-considerations: - -HDF5 Considerations and Limitations -=================================== - -Working with NWB files in MATLAB involves interacting with the **HDF5** storage format. -HDF5 provides excellent performance, hierarchical organization, and portability — but it also imposes some important **limitations** that influence how you create, modify, and manage NWB files. 
-This page explains these limitations conceptually, so you can design data pipelines and workflows that avoid common pitfalls. - -Why limitations matter ----------------------- - -HDF5 is designed for efficient, large-scale data storage — not for frequent editing or multi-user collaboration. -Once data is written, changing the file structure or contents is often constrained by the format itself. - -Understanding these constraints will help you: - -- Plan ahead when designing datasets and attributes -- Avoid costly re-writes and data corruption -- Structure workflows for safe and efficient data access - -Key limitations in practice ---------------------------- - -Existing datasets cannot be freely modified -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Once a dataset is written to disk, it is essentially fixed in size and structure. -If you need to **append** or **stream** additional data (for example, writing trial data as it becomes available), you must create the dataset with this in mind from the start. - -In MatNWB, this is typically done with the :class:`~types.untyped.DataPipe` class, which supports writing data incrementally to an extendable dataset. - -Data and attributes cannot be removed — and deletion does not reduce file size -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -HDF5 does not support in-place removal of datasets or attributes in the way a database might. -While it is possible at a low level to "unlink" objects from the file, this often does not reduce the size of the file. -If you need to significantly restructure a file, the standard approach is to **create a new NWB file** and copy the desired data into it. - -**Implication:** -Plan carefully which datasets and metadata to include before writing. Making changes later often means recreating the file from scratch. 
- - -Takeaway --------- - -These limitations reflect HDF5’s design priorities: efficient, large-scale storage and high-performance sequential access — **not** dynamic modification or multi-writer concurrency. - -When working with NWB in MatNWB, it is therefore important to: design file structure in advance, write data in predictable ways, and treat files as *immutable records* rather than *editable databases*. diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst new file mode 100644 index 000000000..85d666c9b --- /dev/null +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -0,0 +1,33 @@ +.. _edit-nwb-files: + +# Editing NWB files +=================== + +After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. However, due to the way MatNWB and HDF5 work, there are some limitations if modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. + +1. Appending data to a dataset requires the dataset to have been created as extendable. This is typically done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. If the dataset was not created as extendable, it cannot be resized or appended to. + +2. Removing property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. + +Appending data to existing datasets +----------------------------------- +:ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. By default, MatNWB creates datasets with fixed dimensions. 
Datasets that were created with fixed dimensions cannot be resized or appended to after they have been written to disk. This means that if you want to append data to a dataset in an existing NWB file, the dataset must have been created as extendable from the start. This is done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. + +The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the `chunkSize` and `maxSize` properties. The `chunkSize` property determines the size of the chunks that will be written to the dataset, while the `maxSize` property determines the maximum size of the dataset. By setting these properties appropriately, you can create a dataset that can be resized and appended to as needed. + +If you know the final size of a dataset, `maxSize` can be set to this value to optimize storage allocation. If the final size is unknown, the `maxSize` can be set to `Inf` along one or more dimensions to allow unlimited growth. + +For an example of how to use the :class:`~types.untyped.DataPipe` class to create an extendable dataset, see the :doc:`DataPipe example ` tutorial. + +Removing data from existing files +--------------------------------- +:ref:`HDF5 ` support for removing datasets or attributes is limited. While it is possible at a low level to "unlink" objects from the file, this does not reclaim the storage space used by that object. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. + +.. warning:: + The :class:`types.untyped.Set` provides a method called `remove` that can be used to remove objects from a set. However, this only removes the object from the in-memory representation of the file and does not remove it from the file on disk. + + +At the moment, MatNWB does not provide built-in functionality to copy data from one NWB file to another. 
However, you can achieve this by manually reading the desired data from the existing file and writing it to a new file using the appropriate MatNWB classes and methods. + +The following issue on GitHub tracks some of the limitations and potential improvements related to editing NWB files in MatNWB: +`MatNWB - Issue 751 `_ diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst index 4dd89f8e1..47a573241 100644 --- a/docs/source/pages/concepts/file_create/nwbfile.rst +++ b/docs/source/pages/concepts/file_create/nwbfile.rst @@ -45,7 +45,8 @@ MatNWB automatically handles some required NWB properties so you don't have to: Object Structure and Organization --------------------------------- - +.. todo:: Link to NWB overview section on file structure here + The :class:`NwbFile` object provides specific properties for organizing different types of data: - **acquisition** - diff --git a/docs/source/pages/concepts/file_create/storage_backends.rst b/docs/source/pages/concepts/file_create/storage_backends.rst new file mode 100644 index 000000000..7da57c5b0 --- /dev/null +++ b/docs/source/pages/concepts/file_create/storage_backends.rst @@ -0,0 +1,22 @@ +.. _storage-backends: + +Storage Backends +================ + +MatNWB currently uses the HDF5 file format for storing NWB files on disk. Please note that NWB is designed to be storage backend agnostic, and future versions of MatNWB may support additional storage backends. + +.. _about-hdf5: + +What is HDF5? +------------- + +HDF5 (Hierarchical Data Format version 5) is a widely used file format for storing large and complex datasets. It is designed to efficiently manage large amounts of heterogeneous data and metadata in a hierarchical structure, making it well-suited for scientific data. It primarily consists of two main components: groups and datasets. Groups are similar to directories in a file system and can contain other groups or datasets. 
Datasets are multidimensional arrays that hold the actual data. Additionally, both groups and datasets can have attributes, which are small pieces of metadata that provide additional information about the object. + +It is especially well suited for NWB files because: + +- **Hierarchical organization**: HDF5 files can contain nested groups and datasets, allowing NWB to represent complex relationships between different types of data in a structured way. +- **Efficient storage**: HDF5 supports compression and chunking, which helps reduce file size and improve I/O performance for large datasets. +- **Portability**: HDF5 files can be read and written across different platforms and programming languages, facilitating data sharing and collaboration. +- **Extensibility**: HDF5 allows for the addition of custom metadata and data types, which is important for the evolving needs of neuroscience data. + +More details about HDF5 can be found in the `HDF5 Documentation `_. diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 746186990..c71660c3b 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -97,7 +97,7 @@ Important caveats when working with MatNWB: - **NWB schema versions**: When reading an NWB file, MatNWB will dynamically build class definitions for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated type classes will take the place of previously existing classes (i.e generated from different NWB versions), and therefore it is not recommended to work with NWB files of different NWB versions simultaneously. -- **Editing NWB files**: If you need to edit NWB files after creation, note that MatNWB currently has certain limitations. 
See the section on :ref:`HDF5 considerations ` for more details. +- **Editing NWB files**: If you need to edit NWB files after creation, note that MatNWB currently has certain limitations. See the section on :ref:`Editing NWB files ` for more details. Related resources From f4290e29305c774a6461f39cdadb674961bcf624 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 14:28:21 +0200 Subject: [PATCH 38/67] Update editing_nwb_files.rst minor formatting changes --- .../concepts/file_create/editing_nwb_files.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst index 85d666c9b..433df28d5 100644 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -1,21 +1,21 @@ .. _edit-nwb-files: -# Editing NWB files -=================== +Editing NWB files +================= -After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. However, due to the way MatNWB and HDF5 work, there are some limitations if modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. +After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. However, due to the way MatNWB and HDF5 work, there are some limitations when modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. -1. Appending data to a dataset requires the dataset to have been created as extendable. This is typically done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. 
If the dataset was not created as extendable, it cannot be resized or appended to. +1. **Appending** data to a dataset requires the dataset to have been created as extendable. This is typically done when initially creating a dataset, using the :class:`~types.untyped.DataPipe` class. If the dataset was not created as extendable, it cannot be resized or appended to. -2. Removing property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. +2. **Removing** property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. Appending data to existing datasets ----------------------------------- :ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. By default, MatNWB creates datasets with fixed dimensions. Datasets that were created with fixed dimensions cannot be resized or appended to after they have been written to disk. This means that if you want to append data to a dataset in an existing NWB file, the dataset must have been created as extendable from the start. This is done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. -The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the `chunkSize` and `maxSize` properties. The `chunkSize` property determines the size of the chunks that will be written to the dataset, while the `maxSize` property determines the maximum size of the dataset. By setting these properties appropriately, you can create a dataset that can be resized and appended to as needed. 
+The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the ``chunkSize`` and ``maxSize`` properties. The ``chunkSize`` property determines the size of the chunks that will be written to the dataset, while the ``maxSize`` property determines the maximum size of the dataset. By setting these properties appropriately, you can create a dataset that can be resized and appended to as needed. -If you know the final size of a dataset, `maxSize` can be set to this value to optimize storage allocation. If the final size is unknown, the `maxSize` can be set to `Inf` along one or more dimensions to allow unlimited growth. +If you know the final size of a dataset, ``maxSize`` can be set to this value to optimize storage allocation. If the final size is unknown, the ``maxSize`` can be set to ``Inf`` along one or more dimensions to allow unlimited growth. For an example of how to use the :class:`~types.untyped.DataPipe` class to create an extendable dataset, see the :doc:`DataPipe example ` tutorial. @@ -24,7 +24,7 @@ Removing data from existing files :ref:`HDF5 ` support for removing datasets or attributes is limited. While it is possible at a low level to "unlink" objects from the file, this does not reclaim the storage space used by that object. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. .. warning:: - The :class:`types.untyped.Set` provides a method called `remove` that can be used to remove objects from a set. However, this only removes the object from the in-memory representation of the file and does not remove it from the file on disk. + The :class:`types.untyped.Set` provides a method called ``remove`` that can be used to remove objects from a set. However, this only removes the object from the in-memory representation of the file and does not remove it from the file on disk. 
At the moment, MatNWB does not provide built-in functionality to copy data from one NWB file to another. However, you can achieve this by manually reading the desired data from the existing file and writing it to a new file using the appropriate MatNWB classes and methods. From e54b1ad1a205189bd7d282a2afb94f775ccb2208 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 16:01:40 +0200 Subject: [PATCH 39/67] Change performance page and add how-to for using config profiles --- .../file_create/performance_optimization.rst | 113 +++--------- .../source/pages/getting_started/overview.rst | 2 +- .../compression/compression_profiles.rst | 171 ++++++++++++++++++ 3 files changed, 202 insertions(+), 84 deletions(-) create mode 100644 docs/source/pages/how_to/compression/compression_profiles.rst diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index beb539ea7..5635e2d94 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -1,107 +1,54 @@ -Performance Optimization -======================== -Creating efficient NWB files requires consideration of data layout, compression, and memory usage. -This page gives conceptual guidance on *why* these factors matter in MatNWB. -For step-by-step usage, see the :doc:`DataPipe tutorial ` and the dynamically loaded filter example (:doc:`dynamically loaded filters `). - -Why performance considerations matter -------------------------------------- - -NWB files are often used to store large-scale experimental data — from multi-channel electrophysiology to high-resolution imaging. -Write and read of large datasets can become a bottleneck if dataset layout, storage strategy, or memory handling is not planned up front. 
- -By understanding how HDF5 stores data and how MatNWB interfaces with it, you can: - -- Reduce file size without losing precision -- Improve sustained write rates for streaming acquisitions -- Enable analysis on datasets larger than available RAM - -Understanding ``DataPipe`` --------------------------- +Storage optimization +==================== -The :class:`types.untyped.DataPipe` class is central to efficient data handling in MatNWB. -Instead of creating a full MATLAB array before writing, ``DataPipe`` defines *deferred* dataset creation plus incremental population. -Conceptually, it lets you describe: (1) anticipated shape/growth, (2) storage layout (chunking, compression), and (3) how data will arrive over time. +Neuroscience data can be very large, and compression helps reduce file size, improving both storage efficiency and data transfer speed. -This enables several key performance optimizations: Compression -~~~~~~~~~~~ - -HDF5 supports transparent compression of chunked datasets. -When enabled via ``DataPipe`` (e.g., specifying a compression filter), file size can be significantly reduced for structured or slowly varying signals. - -**When to use:** -Continuous signals, image stacks, tables with repeated values. -Avoid (or benchmark) for very small, latency-sensitive random-access datasets. - -**MatNWB note:** -Custom or dynamically loaded filters (e.g., BLOSC, LZ4) can be configured when the underlying HDF5 build supports them—see the :doc:`dynamically loaded filters ` tutorial. - -Chunking -~~~~~~~~ +----------- -Chunking divides a dataset into fixed-size blocks on disk. -Compression, extensibility, and efficient partial I/O all depend on suitable chunking. +MatNWB supports HDF5 compression filters via the :class:`types.untyped.DataPipe` class. The default filter is GZIP (also known as DEFLATE), which is widely supported and provides a good balance of compression ratio and speed. 
Custom or dynamically loaded filters (e.g., BLOSC, LZ4) can be configured when the underlying HDF5 build supports them—see the :doc:`dynamically loaded filters ` tutorial. These can offer better performance for specific data types or access patterns, however they may not be as widely supported as GZIP. -**Why it matters:** -Align chunk dimensions with *typical access slices*: time windows, frame ranges, trial segments, or columnar table growth. -Poorly chosen chunks can inflate I/O (reading entire oversized chunks) or degrade compression ratios. +In other words, if you use a custom filter, ensure that any software or collaborators accessing the file can support that filter. -**MatNWB note:** -``DataPipe`` lets you declare chunk sizes up front; you do not later “fix” chunking without rewriting the dataset. - -Pre-allocation -~~~~~~~~~~~~~~ - -If you know (or can bound) the eventual dataset size, pre-allocation (declaring a maximum shape) reduces metadata updates and fragmentation. - -**Best practice:** -Specify a maximum when growth is monotonic and bounded (e.g., number of samples = sampling_rate * duration, frames in an imaging session, expected trial count). - -Iterative writing -~~~~~~~~~~~~~~~~~ +For step-by-step usage, see the :doc:`DataPipe tutorial ` and the dynamically loaded filter example (:doc:`dynamically loaded filters `). -``DataPipe`` supports appending or writing slices progressively—processing each batch then discarding it from memory. This is particularly useful for datasets that do not fit in RAM. -**Typical use cases:** +Chunking +-------- -- Streaming continuous ephys directly from an acquisition loop -- Writing large image volumes frame-or-plane at a time -- Building a behavioral table row-wise as trials complete +A prerequisite for compression is chunking. Chunking is the partitioning of datasets into fixed-size blocks, which are stored and accessed independently. 
If the full dataset were stored as a single contiguous block, compression would be ineffective for partial reads/writes. That's why chunking is essential for enabling compression, as well as for efficient I/O of large datasets. Choosing optimal chunk sizes and shapes is important for performance, and should be based on expected access patterns. -**MatNWB note:** -Design the *append axis* early. Changing growth direction after data are written is not supported without copying. +For example, if you frequently read time series data in segments (e.g., 1-second windows), chunking along the time axis with a size that matches your typical read length can improve performance. Similarly, for image data, chunking in spatial blocks that align with common access patterns (e.g., tiles or frames) can be beneficial. -Designing for performance -------------------------- +Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended, but this can vary based on specific use cases and data characteristics. -Optimization is chiefly about aligning *storage* with *anticipated access*: -- Define dataset axes, growth pattern, and approximate bounds before writing. -- Select chunk shapes that mirror dominant retrieval patterns (e.g., (time, channel) vs (frame_y, frame_x)). -- Use compression intentionally—benchmark representative subsets; do not assume the default filter is optimal. -- Stream / append rather than assembling massive in-memory arrays. 
-- Treat ``DataPipe`` declarations (chunking, compression, max shape) as part of the experimental data model, not an afterthought. +MatNWB Configuration Profiles +----------------------------- +MatNWB provides predefined configuration profiles that set sensible defaults for chunking and compression based on common use cases. These profiles can be specified when creating an NWB file, allowing users to optimize their files for local storage, cloud storage, or archiving without needing to manually configure each parameter. -Additional MatNWB considerations --------------------------------- +Profile comparison: +~~~~~~~~~~~~~~~~~~~ -- MATLAB memory layout (column-major) can influence which axis you stream most cheaply; consider this when choosing chunk dimension ordering. -- Random small writes into highly compressed, large chunks can incur read-modify-write overhead; batch contiguous writes when possible. -- Profiling: Start with a small representative slice (minutes of data, tens of frames) to measure throughput and compression ratio before full-scale export. +* ``default``: Balanced; small (1 MB) target chunks, gzip level 3. +* ``cloud``: Slightly larger chunks (10 MB) + shuffle for better remote object store streaming; dataset‑specific override for ``ElectricalSeries/data`` to bound one dimension (e.g. 64 samples per chunk row) aiding partial reads. +* ``archive``: Large target (100 MB) to improve compression ratio, Zstandard level 5 (faster decompression than high‑level gzip for similar ratios). Good for cold storage. -Takeaway --------- +See the :doc:`compression profiles ` guide for details on using these profiles, as well as how to create custom configurations tailored to your specific needs. -Performance optimization in NWB is about aligning data storage with data usage. 
-By leveraging ``DataPipe`` features — compression, chunking, pre-allocation, and iterative writing — you can create NWB files that are smaller, faster, and more scalable, even when datasets exceed available RAM. -Related tutorials & references ------------------------------- +MatNWB tutorials & references +----------------------------- - Tutorial: :doc:`DataPipe ` (practical usage patterns) - Tutorial: :doc:`dynamically loaded filters ` (advanced compression filters) - API: :class:`types.untyped.DataPipe` -- HDF5 background (external): `Chunking `_ & `Compression `_ + +External references +------------------- + +- HDF5 background: `Chunking `_ & `Compression `_ +- Cloud-optimized NetCDF4/HDF5: `Guide `_ +- Cloud-optimized HDF5: `Presentation ` \ No newline at end of file diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index c71660c3b..367028c7a 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -58,7 +58,7 @@ The main categories of types you will work with - Containers/wrappers: organize related data (e.g., :doc:`ProcessingModule `). - Time series: sampled data over time (e.g., :doc:`TimeSeries `, :doc:`ElectricalSeries `). - Tables: columnar metadata or data (e.g., :doc:`DynamicTable `). -- Helpers: Helper types +- Helpers: :ref:`Helper types ` for common patterns like references, links, and data I/O. .. [Todo: expand, and link to helper types reference and concept pages when these are added]. .. [Todo: For tables: TimeIntervals, Units, ElectrodesTable] diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst new file mode 100644 index 000000000..baa87792a --- /dev/null +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -0,0 +1,171 @@ +.. 
_howto-compression-profiles: + +How to apply compression & chunking profiles when writing NWB files +=================================================================== + +This how-to shows you, step by step, how to apply a predefined (or custom) dataset +configuration profile (chunking + compression) to the datasets in an ``NwbFile`` +before exporting with :func:`nwbExport`. It focuses on the practical steps – *what to do* – +and assumes you already know in general why chunking and compression matter (see the +:doc:`performance optimization `). + +.. contents:: On this page + :local: + :depth: 2 + +At a glance +----------- +1. Create or load your ``NwbFile`` and populate data. +2. Read a dataset configuration profile (``default``, ``cloud``, or ``archive`` – or your own). +3. Apply it with :func:`io.config.applyDatasetConfiguration`. +4. Export. + +When to use this +---------------- +Use this whenever you have medium/large numeric datasets and you want: + +* Reasonable default gzip (deflate) compression and adaptive chunk sizes (``default``). +* Cloud‑optimized access patterns (smaller per-chunk footprint + shuffle) (``cloud``). +* Higher compression ratio for long‑term storage (larger chunk targets + Zstandard) (``archive``). + +Prerequisites +------------- +* MatNWB installed and on the MATLAB path. +* Basic familiarity with creating NWB objects (see the MatNWB tutorials if needed). + +Key functions & files +--------------------- +* ``io.config.readDatasetConfiguration(profile)`` – loads JSON from ``configuration/*_dataset_configuration.json``. +* ``io.config.applyDatasetConfiguration(nwb, config, "OverrideExisting", false)`` – wraps qualifying numeric arrays in ``types.untyped.DataPipe`` with computed ``chunkSize`` and compression filters. 
+* Configuration JSON examples (shipped): + + - ``configuration/default_dataset_configuration.json`` + - ``configuration/cloud_dataset_configuration.json`` + - ``configuration/archive_dataset_configuration.json`` + +Quick start example +------------------- +.. code-block:: matlab + + % 1. Create and populate an NWB file + nwb = NwbFile(); % (set identifiers, session start time, etc.) + data = rand(1e6,1,'single'); % Example large vector + es = types.core.ElectricalSeries(... + 'data', data, ... + 'data_unit', 'volts', ... + 'starting_time', 0, ... + 'starting_time_rate', 30000); + nwb.acquisition.set('example_eSeries', es); + + % 2. Load a profile (choose "default", "cloud", or "archive") + cfg = io.config.readDatasetConfiguration("cloud"); + + % 3. Apply it (wraps large numeric datasets in DataPipe objects) + io.config.applyDatasetConfiguration(nwb, cfg); + + % 4. Export + nwbExport(nwb, 'example_cloud_profile.nwb'); + +What happens under the hood? +---------------------------- +``applyDatasetConfiguration`` walks every neurodata object in the file tree and, for each numeric dataset: + +* Resolves the most specific matching entry in the configuration (dataset‑level override beats ``Default``). +* Computes a target ``chunkSize`` given: + - ``chunking.target_chunk_size`` + ``target_chunk_size_unit`` (e.g. 1,000,000 bytes) + - ``chunking.strategy_by_rank`` list for the dataset’s rank (e.g. ["flex", "max"]). + * ``flex`` → dimension is sized so total bytes per chunk ≈ target. + * ``max`` → take full length of that dimension. + * Numeric value → upper bound (capped by actual size). +* Chooses compression: + - ``method = deflate`` (gzip) → uses ``compressionLevel`` (default 3 if absent). + - Other methods (e.g. ``ZStandard``) → inserted as a custom filter. + - Optional ``prefilters`` like ``shuffle`` improve compression on integer / low‑entropy columns. 
+* Replaces the raw numeric array with a ``types.untyped.DataPipe`` configured with ``chunkSize``, compression filters, and (for vectors) a columnar representation (``maxSize = Inf`` ensures 1‑D write layout). + +Selecting a profile +------------------- +Profile comparison (conceptual): + +* ``default``: Balanced; small (1 MB) target chunks, gzip level 3. +* ``cloud``: Slightly larger chunks (10 MB) + shuffle for better remote object store streaming; dataset‑specific override for ``ElectricalSeries/data`` to bound one dimension (e.g. 64 samples per chunk row) aiding partial reads. +* ``archive``: Large target (100 MB) to improve compression ratio, Zstandard level 5 (faster decompression than high‑level gzip for similar ratios). Good for cold storage. + +Overriding an existing DataPipe +------------------------------- +If you already created a ``DataPipe`` manually (or ran a profile once) and want to re‑apply with a different profile: + +.. code-block:: matlab + + newCfg = io.config.readDatasetConfiguration("archive"); + io.config.applyDatasetConfiguration(nwb, newCfg, "OverrideExisting", true); + +Customizing a profile +--------------------- +1. Copy one of the shipped JSON files (e.g. ``default_dataset_configuration.json``) to a new file in ``configuration/`` (e.g. ``myprofile_dataset_configuration.json``). +2. Adjust fields: + + * ``chunking.target_chunk_size`` / ``_unit``: Overall chunk byte target. + * ``chunking.strategy_by_rank``: For each rank (key is the number of dimensions). Each list position corresponds to a dimension (slowest → fastest in MATLAB order). Use: + - ``"flex"`` + - ``"max"`` + - an integer (upper bound) + * ``compression.method``: ``deflate`` (gzip), ``ZStandard`` (if filter available), or a custom filter ID. + * ``compression.parameters.level``: Integer compression level (method‑dependent). + * ``compression.prefilters``: e.g. ``["shuffle"]``. +3. Add any dataset‑specific overrides. 
Key format examples: + + * ``"ElectricalSeries/data"`` – targets the ``data`` dataset inside any ``ElectricalSeries``. + * ``"ProcessingModule_TimeIntervals_start_time"`` (illustrative) – keys are matched to MATLAB property / spec paths. + +4. Load it: + +.. code-block:: matlab + + cfg = io.config.readDatasetConfiguration("myprofile"); + io.config.applyDatasetConfiguration(nwb, cfg); + +Dataset override resolution +--------------------------- +The resolver looks for the most specific key that matches the dataset’s path/type; if no specific key matches, it falls back to ``Default``. You can safely omit fields you don’t change in an override; only provided subfields (e.g. updating ``chunking.strategy_by_rank``) are merged. + +Edge cases & tips +----------------- +* Small datasets: If the whole dataset fits within the target chunk size threshold, no ``DataPipe`` is created (stored contiguous by default); this avoids unnecessary chunking overhead. +* Non‑numeric datasets: Currently ignored by the automatic wrapper (e.g. ragged arrays, DataStubs, Sets). You can still wrap them manually. +* Reading existing NWB (``nwbRead``): Re‑chunking or re‑compressing existing datasets into a new output file is planned but not yet implemented for ``DataStub`` sources. +* Vectors: Represented as true 1‑D in HDF5 (MatNWB sets ``maxSize = Inf`` to maintain extensibility / column layout). +* Warnings: If the computed chunk size in bytes exceeds the requested target, a warning is raised – adjust strategy or target size. + +Verifying the applied configuration +----------------------------------- +After export, you can inspect chunking and compression with ``h5info``: + +..
code-block:: matlab + + info = h5info('example_cloud_profile.nwb', '/acquisition/example_eSeries/data'); + info.ChunkSize % should reflect computed chunkSize + info.Filters % lists compression + shuffle if present + +Troubleshooting +--------------- +* ``No matching rank strategy`` error: Add a list for that rank (e.g. key ``"5"``) in ``strategy_by_rank``. +* ``TargetSizeExceeded`` warning: Reduce dimensions marked ``max`` or lower numeric bounds; lower ``target_chunk_size``. +* ``Unsupported target_chunk_size_unit``: Ensure unit is one of ``bytes``, ``kiB``, ``MiB``, ``GiB``. + +Next steps +---------- +* Combine with streaming writes using ``DataPipe.append`` for very large, incremental acquisitions. +* Profile read performance with different chunk strategies to tune domain‑specific workloads. + +Summary +------- +You load a profile JSON, apply it, and export. MatNWB computes chunk sizes from simple declarative rules (``flex`` / ``max`` / numeric) and attaches compression filters. This yields consistent, reproducible storage characteristics across NWB files without hand‑tuning each dataset. 
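
To make the ``flex`` / ``max`` / numeric rules concrete, the sketch below works through the chunk-shape arithmetic by hand for a hypothetical 2-D dataset. It only illustrates the rules as described on this page and is not MatNWB's internal implementation. The variable names, the 10 MiB target, and the assumption of exactly one ``flex`` dimension are made up for the example.

.. code-block:: matlab

    % Hypothetical dataset: 64 channels x 3,000,000 samples of int16
    dataSize      = [64, 3000000];
    bytesPerValue = 2;                  % int16
    targetBytes   = 10 * 1024^2;        % 10 MiB target (assumed for illustration)
    strategy      = {'max', 'flex'};    % one entry per dimension

    chunkSize = ones(1, numel(dataSize));
    isFlex = strcmp(strategy, 'flex');  % numeric entries compare as false

    % Pass 1: dimensions with a fixed rule ("max" or a numeric upper bound)
    for i = find(~isFlex)
        if strcmp(strategy{i}, 'max')
            chunkSize(i) = dataSize(i);                    % full length
        else
            chunkSize(i) = min(strategy{i}, dataSize(i));  % capped upper bound
        end
    end

    % Pass 2: size the single flex dimension so that the chunk is close to
    % the byte target (this sketch assumes exactly one "flex" entry)
    fixedBytes = prod(chunkSize(~isFlex)) * bytesPerValue;
    flexLength = floor(targetBytes / fixedBytes);
    chunkSize(isFlex) = max(1, min(flexLength, dataSize(isFlex)));

    % chunkSize is [64 81920]: 64 * 81920 * 2 bytes = 10 MiB per chunk
    chunkBytes = prod(chunkSize) * bytesPerValue;

Working through the numbers this way can help when choosing between bounding a dimension numerically and leaving it as ``flex``.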
+ +See also +-------- +* :func:`io.config.readDatasetConfiguration` +* :func:`io.config.applyDatasetConfiguration` +* :func:`nwbExport` +* HDF5 chunking & compression guidelines (HDF Group docs) + From 4c2ff13fb85a8e237daa28bba8e3af1063178f9c Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 16:21:08 +0200 Subject: [PATCH 40/67] Update performance_optimization.rst Fix formatting --- .../file_create/performance_optimization.rst | 20 +++++++++---------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/performance_optimization.rst index 5635e2d94..cb6262a9d 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/performance_optimization.rst @@ -2,15 +2,13 @@ Storage optimization ==================== -Neuroscience data can be very large, and compression helps reduce file size, improving both storage efficiency and data transfer speed. +Neuroscience data can be very large, and compression helps reduce file size, improving both storage efficiency and data transfer time. Compression ----------- -MatNWB supports HDF5 compression filters via the :class:`types.untyped.DataPipe` class. The default filter is GZIP (also known as DEFLATE), which is widely supported and provides a good balance of compression ratio and speed. Custom or dynamically loaded filters (e.g., BLOSC, LZ4) can be configured when the underlying HDF5 build supports them—see the :doc:`dynamically loaded filters ` tutorial. These can offer better performance for specific data types or access patterns, however they may not be as widely supported as GZIP. - -In other words, if you use a custom filter, ensure that any software or collaborators accessing the file can support that filter. +MatNWB supports HDF5 compression filters via the :class:`types.untyped.DataPipe` class. 
The default filter is GZIP (also known as DEFLATE), which is widely supported and provides a good balance of compression ratio and speed. Custom or dynamically loaded filters (e.g., BLOSC, LZ4) can be configured when the underlying HDF5 build supports them—see the :doc:`dynamically loaded filters ` tutorial. These can offer better performance for specific data types or access patterns; however, they may not be as widely supported as GZIP. In other words, if you use a custom filter, ensure that any software or collaborators accessing the file can support that filter. For step-by-step usage, see the :doc:`DataPipe tutorial ` and the dynamically loaded filter example (:doc:`dynamically loaded filters `). @@ -25,16 +23,16 @@ For example, if you frequently read time series data in segments (e.g., 1-second Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended, but this can vary based on specific use cases and data characteristics. -MatNWB Configuration Profiles +MatNWB configuration profiles ----------------------------- MatNWB provides predefined configuration profiles that set sensible defaults for chunking and compression based on common use cases. These profiles can be specified when creating an NWB file, allowing users to optimize their files for local storage, cloud storage, or archiving without needing to manually configure each parameter. Profile comparison: ~~~~~~~~~~~~~~~~~~~ -* ``default``: Balanced; small (1 MB) target chunks, gzip level 3.
-* ``cloud``: Slightly larger chunks (10 MB) + shuffle for better remote object store streaming; dataset‑specific override for ``ElectricalSeries/data`` to bound one dimension (e.g. 64 samples per chunk row) aiding partial reads. -* ``archive``: Large target (100 MB) to improve compression ratio, Zstandard level 5 (faster decompression than high‑level gzip for similar ratios). Good for cold storage. +* **default**: Balanced; small (1 MB) target chunks, gzip level 3. +* **cloud**: Slightly larger chunks (10 MB) + shuffle for better remote object store streaming; dataset‑specific override for ``ElectricalSeries/data`` to bound one dimension (e.g. 64 samples per chunk row) aiding partial reads. +* **archive**: Large target (100 MB) to improve compression ratio, Zstandard level 5 (faster decompression than high‑level gzip for similar ratios). Good for cold storage. See the :doc:`compression profiles ` guide for details on using these profiles, as well as how to create custom configurations tailored to your specific needs. 
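
As a minimal sketch of how a profile is selected in practice, the snippet below loads the ``archive`` profile and applies it before export, using the ``io.config`` functions introduced in this patch series. It assumes ``nwb`` is an already populated ``NwbFile``; the output file name is hypothetical.

.. code-block:: matlab

    % Assumes `nwb` is an NwbFile that has already been populated with data
    cfg = io.config.readDatasetConfiguration("archive");  % or "default" / "cloud"

    % Apply the profile's chunking and compression settings
    io.config.applyDatasetConfiguration(nwb, cfg);

    % Write the file (hypothetical file name)
    nwbExport(nwb, 'session_archive.nwb');

Switching profiles only changes the string passed to ``readDatasetConfiguration``; the rest of the export workflow stays the same.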
@@ -42,8 +40,8 @@ See the :doc:`compression profiles ` (practical usage patterns) -- Tutorial: :doc:`dynamically loaded filters ` (advanced compression filters) +- Tutorial: :ref:`DataPipe ` (practical usage patterns) +- Tutorial: :ref:`dynamically loaded filters ` (advanced compression filters) - API: :class:`types.untyped.DataPipe` External references @@ -51,4 +49,4 @@ External references - HDF5 background: `Chunking `_ & `Compression `_ - Cloud-optimized NetCDF4/HDF5: `Guide `_ -- Cloud-optimized HDF5: `Presentation ` \ No newline at end of file +- Cloud-optimized HDF5: `Presentation `_ \ No newline at end of file From b73e8d4f4f3845941a5d2c8b61014752d03d6820 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 16:36:37 +0200 Subject: [PATCH 41/67] Update compression_profiles.rst --- .../compression/compression_profiles.rst | 20 ++----------------- 1 file changed, 2 insertions(+), 18 deletions(-) diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index baa87792a..9992f2cf2 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -6,7 +6,7 @@ How to apply compression & chunking profiles when writing NWB files This how-to shows you, step by step, how to apply a predefined (or custom) dataset configuration profile (chunking + compression) to the datasets in an ``NwbFile`` before exporting with :func:`nwbExport`. It focuses on the practical steps – *what to do* – -and assumes you already know in general why chunking and compression matter (see the +and assumes you already know in general why chunking and compression matter (see :doc:`performance optimization `). .. contents:: On this page @@ -83,14 +83,6 @@ What happens under the hood? - Optional ``prefilters`` like ``shuffle`` improve compression on integer / low‑entropy columns. 
* Replaces the raw numeric array with a ``types.untyped.DataPipe`` configured with ``chunkSize``, compression filters, and (for vectors) a columnar representation (``maxSize = Inf`` ensures 1‑D write layout). -Selecting a profile -------------------- -Profile comparison (conceptual): - -* ``default``: Balanced; small (1 MB) target chunks, gzip level 3. -* ``cloud``: Slightly larger chunks (10 MB) + shuffle for better remote object store streaming; dataset‑specific override for ``ElectricalSeries/data`` to bound one dimension (e.g. 64 samples per chunk row) aiding partial reads. -* ``archive``: Large target (100 MB) to improve compression ratio, Zstandard level 5 (faster decompression than high‑level gzip for similar ratios). Good for cold storage. - Overriding an existing DataPipe ------------------------------- If you already created a ``DataPipe`` manually (or ran a profile once) and want to re‑apply with a different profile: @@ -133,7 +125,7 @@ Edge cases & tips ----------------- * Small datasets: If the whole dataset fits within the target chunk size threshold, no ``DataPipe`` is created (stored contiguous by default); this avoids unnecessary chunking overhead. * Non‑numeric datasets: Currently ignored by the automatic wrapper (e.g. ragged arrays, DataStubs, Sets). You can still wrap them manually. -* Reading existing NWB (``nwbRead``): Re‑chunking or re‑compressing existing datasets into a new output file is planned but not yet implemented for ``DataStub`` sources. +* Reading existing NWB (``nwbRead``): Re‑chunking or re‑compressing existing datasets into a new output file is not implemented for ``DataStub`` sources. * Vectors: Are represented as true 1‑D in HDF5 (MatNWB sets ``maxSize = Inf`` to maintain extendability / column layout). * Warnings: If actual computed chunk size bytes exceed the requested target, a warning is raised – adjust strategy or target size. @@ -161,11 +153,3 @@ Next steps Summary ------- You load a profile JSON, apply it, and export. 
MatNWB computes chunk sizes from simple declarative rules (``flex`` / ``max`` / numeric) and attaches compression filters. This yields consistent, reproducible storage characteristics across NWB files without hand‑tuning each dataset. - -See also --------- -* :func:`io.config.readDatasetConfiguration` -* :func:`io.config.applyDatasetConfiguration` -* :func:`nwbExport` -* HDF5 chunking & compression guidelines (HDF Group docs) - From 8d15987b9b0c6b5ff6199ac0dacde2073db71c4b Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 17:46:23 +0200 Subject: [PATCH 42/67] Simplify config-profile how-to guide, add to main index --- .../compression/compression_profiles.rst | 119 +++++++----------- docs/source/pages/how_to/index.rst | 4 + 2 files changed, 51 insertions(+), 72 deletions(-) diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index 9992f2cf2..bf3df9086 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -1,13 +1,14 @@ .. _howto-compression-profiles: -How to apply compression & chunking profiles when writing NWB files -=================================================================== +Use compression profiles +======================== -This how-to shows you, step by step, how to apply a predefined (or custom) dataset -configuration profile (chunking + compression) to the datasets in an ``NwbFile`` -before exporting with :func:`nwbExport`. It focuses on the practical steps – *what to do* – -and assumes you already know in general why chunking and compression matter (see -:doc:`performance optimization `). +How to optimize storage in NWB files using predefined or custom dataset configuration profiles for compression and chunking. + +Prerequisites +------------- +* MatNWB installed and on the MATLAB path. 
+* Basic familiarity with creating NWB objects (see the MatNWB tutorials if needed). .. contents:: On this page :local: @@ -20,42 +21,20 @@ At a glance 3. Apply it with :func:`io.config.applyDatasetConfiguration`. 4. Export. -When to use this ----------------- -Use this whenever you have medium/large numeric datasets and you want: - -* Reasonable default gzip (deflate) compression and adaptive chunk sizes (``default``). -* Cloud‑optimized access patterns (smaller per-chunk footprint + shuffle) (``cloud``). -* Higher compression ratio for long‑term storage (larger chunk targets + Zstandard) (``archive``). - -Prerequisites -------------- -* MatNWB installed and on the MATLAB path. -* Basic familiarity with creating NWB objects (see the MatNWB tutorials if needed). - -Key functions & files ---------------------- -* ``io.config.readDatasetConfiguration(profile)`` – loads JSON from ``configuration/*_dataset_configuration.json``. -* ``io.config.applyDatasetConfiguration(nwb, config, "OverrideExisting", false)`` – wraps qualifying numeric arrays in ``types.untyped.DataPipe`` with computed ``chunkSize`` and compression filters. -* Configuration JSON examples (shipped): - - - ``configuration/default_dataset_configuration.json`` - - ``configuration/cloud_dataset_configuration.json`` - - ``configuration/archive_dataset_configuration.json`` -Quick start example -------------------- +Creating and exporting an NWB file with a profile +------------------------------------------------- .. code-block:: matlab % 1. Create and populate an NWB file nwb = NwbFile(); % (set identifiers, session start time, etc.) - data = rand(1e6,1,'single'); % Example large vector + data = rand(1e6, 1, 'single'); % Example large vector es = types.core.ElectricalSeries(... 'data', data, ... 'data_unit', 'volts', ... 'starting_time', 0, ... 'starting_time_rate', 30000); - nwb.acquisition.set('example_eSeries', es); + nwb.acquisition.set('ExampleSeries', es); % 2. 
Load a profile (choose "default", "cloud", or "archive") cfg = io.config.readDatasetConfiguration("cloud"); @@ -66,22 +45,6 @@ Quick start example % 4. Export nwbExport(nwb, 'example_cloud_profile.nwb'); -What happens under the hood? ----------------------------- -``applyDatasetConfiguration`` walks every neurodata object in the file tree and, for each numeric dataset: - -* Resolves the most specific matching entry in the configuration (dataset‑level override beats ``Default``). -* Computes a target ``chunkSize`` given: - - ``chunking.target_chunk_size`` + ``target_chunk_size_unit`` (e.g. 1,000,000 bytes) - - ``chunking.strategy_by_rank`` list for the dataset’s rank (e.g. ["flex", "max"]). - * ``flex`` → dimension is sized so total bytes per chunk ≈ target. - * ``max`` → take full length of that dimension. - * Numeric value → upper bound (capped by actual size). -* Chooses compression: - - ``method = deflate`` (gzip) → uses ``compressionLevel`` (default 3 if absent). - - Other methods (e.g. ``ZStandard``) → inserted as a custom filter. - - Optional ``prefilters`` like ``shuffle`` improve compression on integer / low‑entropy columns. -* Replaces the raw numeric array with a ``types.untyped.DataPipe`` configured with ``chunkSize``, compression filters, and (for vectors) a columnar representation (``maxSize = Inf`` ensures 1‑D write layout). Overriding an existing DataPipe ------------------------------- @@ -94,21 +57,40 @@ If you already created a ``DataPipe`` manually (or ran a profile once) and want Customizing a profile --------------------- + 1. Copy one of the shipped JSON files (e.g. ``default_dataset_configuration.json``) to a new file in ``configuration/`` (e.g. ``myprofile_dataset_configuration.json``). + 2. Adjust fields: - * ``chunking.target_chunk_size`` / ``_unit``: Overall chunk byte target. - * ``chunking.strategy_by_rank``: For each rank (key is the number of dimensions). 
Each list position corresponds to a dimension (slowest → fastest in MATLAB order). Use: - - ``"flex"`` - - ``"max"`` - - an integer (upper bound) - * ``compression.method``: ``deflate`` (gzip), ``ZStandard`` (if filter available), or a custom filter ID. - * ``compression.parameters.level``: Integer compression level (method‑dependent). - * ``compression.prefilters``: e.g. ``["shuffle"]``. -3. Add any dataset‑specific overrides. Key format examples: + ``chunking.target_chunk_size`` + Overall byte target size for each chunk. + + ``chunking.strategy_by_rank`` + Strategy per dataset rank (key = number of dimensions). + Each list element corresponds to a dimension axis. + Possible values: + + - ``"flex"`` + - ``"max"`` + - *integer* (upper bound) + + ``compression.method`` + Compression algorithm: ``deflate`` (gzip), ``ZStandard`` (if available), or a custom filter ID. + + ``compression.parameters.level`` + Integer compression level (method-dependent). + + ``compression.prefilters`` + Optional prefilters, e.g. ``["shuffle"]``. + +3. Add any neurodata type/dataset-specific overrides. Key format examples: + + ``"ElectricalSeries/data"`` + Targets the ``data`` dataset inside any ``ElectricalSeries``. + + ``"TwoPhotonSeries/data"`` *(illustrative)* + Keys are matched to MATLAB property / spec paths. - * ``"ElectricalSeries/data"`` – targets the ``data`` dataset inside any ``ElectricalSeries``. - * ``"ProcessingModule_TimeIntervals_start_time"`` (illustrative) – keys are matched to MATLAB property / spec paths (see comments below). 4. Load it: @@ -117,17 +99,6 @@ Customizing a profile cfg = io.config.readDatasetConfiguration("myprofile"); io.config.applyDatasetConfiguration(nwb, cfg); -Dataset override resolution ---------------------------- -The resolver looks for the most specific key that matches the dataset’s path/type; if no specific key matches, it falls back to ``Default``. You can safely omit fields you don’t change in an override; only provided subfields (e.g. 
updating ``chunking.strategy_by_rank``) are merged. - -Edge cases & tips ------------------ -* Small datasets: If the whole dataset fits within the target chunk size threshold, no ``DataPipe`` is created (stored contiguous by default); this avoids unnecessary chunking overhead. -* Non‑numeric datasets: Currently ignored by the automatic wrapper (e.g. ragged arrays, DataStubs, Sets). You can still wrap them manually. -* Reading existing NWB (``nwbRead``): Re‑chunking or re‑compressing existing datasets into a new output file is not implemented for ``DataStub`` sources. -* Vectors: Are represented as true 1‑D in HDF5 (MatNWB sets ``maxSize = Inf`` to maintain extendability / column layout). -* Warnings: If actual computed chunk size bytes exceed the requested target, a warning is raised – adjust strategy or target size. Verifying the applied configuration ---------------------------------- @@ -135,7 +106,7 @@ After export, you can inspect chunking and compression with ``h5info``: .. code-block:: matlab - info = h5info('example_cloud_profile.nwb', '/acquisition/example_eSeries/data'); + info = h5info('example_cloud_profile.nwb', '/acquisition/ExampleSeries/data'); info.ChunkSize % should reflect computed chunkSize info.Filters % lists compression + shuffle if present @@ -153,3 +124,7 @@ Next steps Summary ------- You load a profile JSON, apply it, and export. MatNWB computes chunk sizes from simple declarative rules (``flex`` / ``max`` / numeric) and attaches compression filters. This yields consistent, reproducible storage characteristics across NWB files without hand‑tuning each dataset. + + +See also: +:ref:`Storage optimization `. 
diff --git a/docs/source/pages/how_to/index.rst b/docs/source/pages/how_to/index.rst index 2fb97c024..09fcc8353 100644 --- a/docs/source/pages/how_to/index.rst +++ b/docs/source/pages/how_to/index.rst @@ -5,3 +5,7 @@ Use Extensions using_extensions/generating_extension_api using_extensions/installing_extensions + +Optimize Storage +================ + compression/compression_profiles From d19e13792cdae21feb0160389341b86453a8062e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 17:54:55 +0200 Subject: [PATCH 43/67] Update compression_profiles.rst --- .../pages/how_to/compression/compression_profiles.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index bf3df9086..a76fea792 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -88,8 +88,8 @@ Customizing a profile ``"ElectricalSeries/data"`` Targets the ``data`` dataset inside any ``ElectricalSeries``. - ``"TwoPhotonSeries/data"`` *(illustrative)* - Keys are matched to MATLAB property / spec paths. + ``"TwoPhotonSeries/data"`` + Targets the ``data`` dataset inside any ``TwoPhotonSeries``. 4. Load it: @@ -127,4 +127,5 @@ You load a profile JSON, apply it, and export. MatNWB computes chunk sizes from See also: -:ref:`Storage optimization `. +--------- +:ref:`Storage optimization `. 
From 67b566e81ce95ab621237f6f60961dc821dae8f1 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 17:55:02 +0200 Subject: [PATCH 44/67] Update neurodata_types.rst --- .../concepts/file_create/neurodata_types.rst | 123 ++++++++---------- 1 file changed, 51 insertions(+), 72 deletions(-) diff --git a/docs/source/pages/concepts/file_create/neurodata_types.rst b/docs/source/pages/concepts/file_create/neurodata_types.rst index 6b4b5e62f..073b92f07 100644 --- a/docs/source/pages/concepts/file_create/neurodata_types.rst +++ b/docs/source/pages/concepts/file_create/neurodata_types.rst @@ -1,115 +1,94 @@ Understanding MatNWB Neurodata Types ==================================== -MatNWB neurodata types are specialized MATLAB classes that represent different kinds of neuroscience data. These types provide structured containers that hold your data along with the metadata and organizational information needed to interpret it correctly. +MatNWB neurodata types are MATLAB classes designed to represent different kinds of neuroscience data in a structured and interoperable way. They combine data, metadata, and contextual information, enabling consistent organization, interpretation, and sharing across tools and research environments. -Why Specialized Types Instead of Standard Data Types? ------------------------------------------------------ +Why Use Specialized Neurodata Types? +------------------------------------ -MatNWB's neurodata types have several advantages compared to using generic MATLAB arrays or structs for storing data: +Standard MATLAB data structures like arrays or structs are flexible but lack domain-specific constraints. MatNWB types provide additional structure and semantics that are essential for reliable data handling in neuroscience: -**They encode domain knowledge**: Each type includes the specific requirements for neuroscience data. 
A :class:`types.core.ElectricalSeries` requires electrode information, sampling rates, and data units - enforcing these requirements automatically rather than relying on you to remember them. +- **Domain-specific structure**: Each type encodes the metadata and relationships required for a particular data modality. For example, a :class:`types.core.ElectricalSeries` requires electrode metadata, sampling information, and data units. +- **Built-in validation**: Types enforce the presence of essential information, reducing the likelihood of common errors. For instance, :class:`types.core.TwoPhotonSeries` cannot be created without specifying the imaging plane. +- **Interoperability**: Data stored using these types is compatible with NWB-compliant tools and workflows, making it easier to share and reuse across different software ecosystems. -**They prevent common mistakes**: The types guide you toward correct data organization. For example, you cannot store imaging data without specifying the imaging plane when using :class:`types.core.TwoPhotonSeries`. +The Central Concept: TimeSeries +------------------------------- -**They ensure compatibility**: Data stored in these types will work with other NWB tools and can be shared with collaborators who use different analysis software. +Many experimental signals in neuroscience change over time. The :class:`types.core.TimeSeries` type provides a standardized structure for representing these signals together with their temporal context. -The Foundation: TimeSeries ---------------------------- +A TimeSeries object combines: -Most neuroscience data varies over time, so MatNWB builds around a fundamental concept: :class:`types.core.TimeSeries`. This isn't just a MATLAB array with timestamps - it's a structured way to represent any measurement that changes over time. +- **Data and meaning**: The recorded measurements alongside descriptions of what they represent. 
+- **Temporal information**: Flexible handling of regular or irregular sampling, timestamps, and time references. +- **Metadata**: Units, descriptions, and experiment-specific context stored together with the data. +- **Relationships**: References to other objects, such as stimulus definitions or behavioral events. -**What TimeSeries provides:** +Use a basic TimeSeries when the data varies over time but does not require the additional structure of a specialized type. Examples include custom behavioral metrics, environmental sensor data, or novel measurement modalities. -- **Data with context**: Your measurements plus information about what they represent -- **Time handling**: Flexible ways to represent regular or irregular sampling -- **Metadata storage**: Data units, descriptions, and experimental details stay attached to the data -- **Relationship tracking**: Connections to other parts of your experiment +Specialized TimeSeries Variants +------------------------------- -**When to use basic TimeSeries**: For any time-varying measurement that doesn't fit a more specific type - like custom behavioral metrics, environmental sensors, or novel measurement techniques. +MatNWB builds on the TimeSeries concept with specialized types tailored to common experimental data. These types capture modality-specific metadata, constraints, and relationships. -Specialized TimeSeries Types ----------------------------- +**ElectricalSeries: Electrical Recordings** -MatNWB provides specialized versions of TimeSeries for common neuroscience data types. These aren't just conveniences - they capture the specific requirements and relationships of different experimental approaches. +Electrophysiological recordings require metadata about the electrodes, their positions, and acquisition parameters. :class:`types.core.ElectricalSeries` links time-varying voltage data with this contextual information, allowing downstream tools to interpret the signals accurately. 
-**ElectricalSeries: For Electrical Recordings** +**TwoPhotonSeries and OnePhotonSeries: Optical Recordings** -Understanding electrical recordings requires knowing which electrodes recorded the data, their locations, and recording parameters. :class:`types.core.ElectricalSeries` handles these relationships automatically. +Optical recordings, such as calcium imaging, differ fundamentally from electrical recordings. These types include metadata about imaging planes, indicators, and acquisition parameters (e.g., excitation wavelength), reflecting the experimental conditions required to interpret fluorescence-based signals. -The key insight: electrical data isn't just voltages over time - it's voltages from specific spatial locations in the brain, recorded with particular methods and settings. +**SpatialSeries: Positional and Movement Data** -**TwoPhotonSeries and OnePhotonSeries: For Optical Data** - -Calcium imaging data has fundamentally different characteristics than electrical recordings. These types understand that optical data comes from specific imaging planes, uses particular indicators, and has unique technical parameters like excitation wavelengths. - -The key insight: optical data represents neural activity indirectly through fluorescence changes, requiring different metadata and processing considerations. - -**SpatialSeries: For Position and Movement** - -Behavioral tracking data represents the subject's position or movement through space. :class:`types.core.SpatialSeries` understands spatial coordinates, reference frames, and the relationship between position and time. - -The key insight: spatial data requires coordinate system information to be meaningful - the same X,Y coordinates mean different things in different reference frames. +Behavioral tracking data records spatial coordinates over time. 
:class:`types.core.SpatialSeries` includes information about reference frames, coordinate systems, and spatial dimensions, which are necessary for interpreting positional measurements correctly. Container Types: Organizing Related Data ---------------------------------------- -Some neurodata types don't hold data directly - they organize other types into meaningful groups. - -**ProcessingModule: Grouping Related Analyses** - -Experiments often involve multiple processing steps that belong together. :class:`types.core.ProcessingModule` lets you group related processed data, maintaining the logical flow of your analysis pipeline. +Some MatNWB types act as containers for other data objects, structuring them into logical groupings. -The key insight: processed data gains meaning through its relationship to the raw data and processing steps that created it. +**ProcessingModule: Analysis Grouping** -**Position, CompassDirection, BehavioralEvents: Behavioral Organization** +Experiments often produce multiple derived datasets from different processing steps. :class:`types.core.ProcessingModule` groups these results, preserving their relationships to raw data and to each other within an analysis workflow. -These container types organize different aspects of behavioral data. Rather than scattering behavioral measurements throughout your file, they provide structured locations that other researchers will recognize. - -The key insight: behavioral experiments often involve multiple simultaneous measurements that need to be understood as a coordinated whole. +**Behavioral Containers: Position, CompassDirection, BehavioralEvents** +Behavioral experiments frequently generate multiple types of measurements. Container types provide a consistent organizational structure for these related datasets, making it easier for collaborators and tools to understand their relationships. 
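
The container relationships described above can be sketched as follows. This is a minimal illustration, assuming a recent MatNWB release on the MATLAB path; the data values, names, and identifiers are hypothetical.

.. code-block:: matlab

    % Hypothetical (x, y) positions: 2 coordinates x 100 samples at 50 Hz
    positionData = [linspace(0, 10, 100); linspace(0, 8, 100)];

    spatialSeries = types.core.SpatialSeries( ...
        'description', 'Position (x, y) in an open field', ...
        'data', positionData, ...
        'starting_time', 0, ...
        'starting_time_rate', 50, ...
        'reference_frame', '(0,0) is the bottom-left corner');

    % Position is a behavioral container holding one or more SpatialSeries
    position = types.core.Position();
    position.spatialseries.set('SpatialSeries', spatialSeries);

    % ProcessingModule groups related processed data under one name
    behaviorModule = types.core.ProcessingModule( ...
        'description', 'Processed behavioral data');
    behaviorModule.nwbdatainterface.set('Position', position);

    % Attach the module to a file object
    nwb = NwbFile( ...
        'session_description', 'Example session', ...
        'identifier', 'example-001', ...
        'session_start_time', datetime(2025, 1, 1, 'TimeZone', 'local'));
    nwb.processing.set('behavior', behaviorModule);

The nesting (file, module, container, series) mirrors the hierarchy a reader of the file would later navigate.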
Table-Based Types: Structured Metadata -------------------------------------- -Some experimental information is naturally tabular rather than time-series based. - -**Units Table: Spike Data Organization** - -Sorted spike data doesn't fit well into TimeSeries because each unit has different spike times. The :class:`types.core.Units` table provides a structured way to store spike times, waveforms, and unit metadata together. - -The key insight: spike sorting creates discrete events (spikes) rather than continuous measurements, requiring different organizational principles. - -**Electrode Tables: Recording Site Information** - -Information about recording electrodes (location, impedance, brain region) is relatively static but essential for interpreting electrical data. Electrode tables store this information once and allow multiple data types to reference it. - -The key insight: experimental metadata often has different temporal characteristics than the data itself - electrode properties don't change during recording, but voltage measurements do. +Not all experimental information is time-series based. Some metadata is better represented in tabular form, particularly when it describes static properties or discrete events. +**Units Table: Discrete Spike Data** -How MatNWB Types Work in Practice ---------------------------------- +Sorted spike data consists of discrete events (spikes) that occur at variable times. The :class:`types.core.Units` table organizes spike times, waveforms, and unit metadata in a structured and queryable way. -- **Object-Oriented Organization**: Each neurodata type is a MATLAB class with specific properties. When you create an object, MATLAB ensures you provide the required information and validates the data types. +**Electrode Tables: Recording Site Metadata** -- **Automatic Relationships**: Types understand their relationships to other types. 
When you reference an electrode table from an ElectricalSeries, MatNWB maintains that connection automatically. +Metadata describing recording sites—such as electrode position, impedance, and brain region—is typically static during an experiment. Electrode tables store this information once and allow it to be referenced by multiple data types. -- **Flexible Extension**: While types have required properties, you can add additional information as needed. This lets you capture experiment-specific details while maintaining compatibility. +Working with MatNWB Types +------------------------- -- **Validation and Error Prevention**: Types catch common errors before they become problems. Missing required properties, incorrect data shapes, or type mismatches generate helpful error messages. +MatNWB neurodata types use object-oriented design principles to integrate structure, validation, and relationships directly into the data model: -Choosing the Right Type ------------------------ +- **Object properties**: Each type defines a fixed set of properties, ensuring required metadata is always present and validated when objects are created. +- **Automatic linking**: References between related objects (e.g., an ElectricalSeries referencing an electrode table) are handled automatically. +- **Extensibility**: While core properties are fixed, additional metadata can be attached as needed to capture experiment-specific details. +- **Error prevention**: Structural validation reduces errors by detecting missing information, type mismatches, or inconsistent shapes early. -The goal isn't to memorize every available type, but to understand the principle: **match your data to the type that best represents its experimental meaning**. +Selecting the Appropriate Type +------------------------------ -**Ask yourself:** +Choosing the right type depends on the nature of the data and how it fits into the broader experimental context. 
Consider the following questions: -- What kind of measurement is this? (electrical, optical, behavioral, etc.) -- How does it relate to other parts of my experiment? -- What contextual information is needed to interpret it? -- Would another researcher understand this data organization? +- What is being measured? (e.g., electrical activity, fluorescence, position) +- How is it related to other parts of the experiment? +- What metadata is required to interpret the measurement? +- Would another researcher understand the data structure without additional explanation? -**Start simple**: When in doubt, basic TimeSeries can represent any time-varying data. You can always use more specific types as you become familiar with them. +A practical approach is to begin with the general :class:`types.core.TimeSeries` for any time-varying data. As familiarity increases, adopt more specialized types that better capture the semantics and constraints of specific experimental modalities. -**Follow the data flow**: Raw measurements go in acquisition, processed results go in processing modules, final analyses go in analysis. This mirrors your experimental workflow. +Organize data to reflect the experimental workflow: raw measurements in acquisition, processed results in processing modules, and analysis outputs in analysis groups. This structure aligns the data model with the scientific process and supports reproducibility and interoperability. 
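A minimal MATLAB sketch of the workflow described above, assuming MatNWB is on the path. The identifier, object names (``raw_signal``, ``smoothed_signal``, ``smoothing``), and random data are hypothetical and chosen only for illustration. The moving average stands in for any processing step.

```matlab
% Hypothetical session metadata, for illustration only.
nwb = NwbFile( ...
    'identifier', 'example-session-001', ...
    'session_description', 'Illustrative session for type selection', ...
    'session_start_time', datetime(2025, 1, 1, 12, 0, 0, 'TimeZone', 'UTC'));

% Raw measurement: a generic TimeSeries stored under acquisition.
rawSeries = types.core.TimeSeries( ...
    'data', rand(1, 1000), ...
    'data_unit', 'n/a', ...
    'starting_time', 0, ...
    'starting_time_rate', 100);
nwb.acquisition.set('raw_signal', rawSeries);

% Processed result: grouped in a ProcessingModule under processing.
smoothedSeries = types.core.TimeSeries( ...
    'data', movmean(rawSeries.data, 5), ...
    'data_unit', 'n/a', ...
    'starting_time', 0, ...
    'starting_time_rate', 100);
module = types.core.ProcessingModule('description', 'Smoothed signals');
module.nwbdatainterface.set('smoothed_signal', smoothedSeries);
nwb.processing.set('smoothing', module);
```

Starting with the general TimeSeries keeps the example short. A more specialized type can replace it later without changing where the object is stored.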
From c0a30fe4f3b41897b7ebc114d6be13ddf80a56bc Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 18:02:00 +0200 Subject: [PATCH 45/67] Update compression_profiles.rst --- .../pages/how_to/compression/compression_profiles.rst | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index a76fea792..b7451c558 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -27,9 +27,12 @@ Creating and exporting an NWB file with a profile .. code-block:: matlab % 1. Create and populate an NWB file - nwb = NwbFile(); % (set identifiers, session start time, etc.) - data = rand(1e6, 1, 'single'); % Example large vector - es = types.core.ElectricalSeries(... + nwb = NwbFile( ... + 'identifier', 'compression-howto-20250411T153000Z', ... + 'session_description', 'Compression profile how-to guide', ... + 'session_start_time', datetime(2025,4,11,15,30,0,'TimeZone','UTC')); + data = rand(32, 1e6, 'single'); % Example large matrix + es = types.core.TimeSeries(... 'data', data, ... 'data_unit', 'volts', ... 'starting_time', 0, ... 
From af42d1c25f831012b695e7d199b50530c42c5cdd Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 18:06:37 +0200 Subject: [PATCH 46/67] Update neurodata_types.rst Added section on time intervals --- docs/source/pages/concepts/file_create/neurodata_types.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/pages/concepts/file_create/neurodata_types.rst b/docs/source/pages/concepts/file_create/neurodata_types.rst index 073b92f07..4465f96af 100644 --- a/docs/source/pages/concepts/file_create/neurodata_types.rst +++ b/docs/source/pages/concepts/file_create/neurodata_types.rst @@ -69,6 +69,10 @@ Sorted spike data consists of discrete events (spikes) that occur at variable ti Metadata describing recording sites—such as electrode position, impedance, and brain region—is typically static during an experiment. Electrode tables store this information once and allow it to be referenced by multiple data types. +**Trials Table: Time-Indexed Experimental Structure** + +Many experiments are organized into discrete trials or epochs. The NWB ``trials`` table (a :class:`types.core.TimeIntervals` object) captures these segments using required ``start_time`` and ``stop_time`` columns and any number of user-defined per-trial metadata columns (e.g., stimulus identity, condition, response correctness). + Working with MatNWB Types ------------------------- From 0e7148574750eb65d904b2395ae5e86b81bf2a86 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 18:07:37 +0200 Subject: [PATCH 47/67] Update index.rst --- docs/source/pages/how_to/index.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/source/pages/how_to/index.rst b/docs/source/pages/how_to/index.rst index 09fcc8353..914c61a16 100644 --- a/docs/source/pages/how_to/index.rst +++ b/docs/source/pages/how_to/index.rst @@ -8,4 +8,7 @@ Use Extensions Optimize Storage ================ +.. 
toctree:: + :maxdepth: 1 + compression/compression_profiles From a1c738a622126eda1ce5b9c0567974f1d797ba21 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 18:10:57 +0200 Subject: [PATCH 48/67] Rename performance_optimization to storage_optimization --- docs/source/pages/concepts/file_create.rst | 2 +- ...{performance_optimization.rst => storage_optimization.rst} | 4 ++-- docs/source/pages/how_to/compression/compression_profiles.rst | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) rename docs/source/pages/concepts/file_create/{performance_optimization.rst => storage_optimization.rst} (95%) diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 150106eab..31f896da9 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -42,4 +42,4 @@ If anything is missing or incorrect, you'll get an error message explaining what Understanding Neurodata Types Storage Backends Editing NWB Files - Performance Optimization + Performance Optimization diff --git a/docs/source/pages/concepts/file_create/performance_optimization.rst b/docs/source/pages/concepts/file_create/storage_optimization.rst similarity index 95% rename from docs/source/pages/concepts/file_create/performance_optimization.rst rename to docs/source/pages/concepts/file_create/storage_optimization.rst index cb6262a9d..751dc26bf 100644 --- a/docs/source/pages/concepts/file_create/performance_optimization.rst +++ b/docs/source/pages/concepts/file_create/storage_optimization.rst @@ -40,8 +40,8 @@ See the :doc:`compression profiles ` (practical usage patterns) -- Tutorial: :ref:`dynamically loaded filters ` (advanced compression filters) +- Tutorial: :doc:`DataPipe ` (practical usage patterns) +- Tutorial: :doc:`dynamically loaded filters ` (advanced compression filters) - API: :class:`types.untyped.DataPipe` External references diff --git 
a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index b7451c558..2848761ed 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -131,4 +131,4 @@ You load a profile JSON, apply it, and export. MatNWB computes chunk sizes from See also: --------- -:ref:`Storage optimization `. +:ref:`Storage optimization `. From b57e0490f557d2d72abe0a38b5c43b88f1744b14 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Mon, 29 Sep 2025 18:12:13 +0200 Subject: [PATCH 49/67] Update compression_profiles.rst Fixed reference --- docs/source/pages/how_to/compression/compression_profiles.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index 2848761ed..70d7953b9 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -131,4 +131,4 @@ You load a profile JSON, apply it, and export. MatNWB computes chunk sizes from See also: --------- -:ref:`Storage optimization `. +:doc:`Storage optimization `. 
From b0e0cb642f18ada85817da8420fc3f1d5c342d20 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Tue, 7 Oct 2025 21:13:56 +0200 Subject: [PATCH 50/67] Update storage_optimization.rst More clearly explain the rationale for chunking --- .../source/pages/concepts/file_create/storage_optimization.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/pages/concepts/file_create/storage_optimization.rst b/docs/source/pages/concepts/file_create/storage_optimization.rst index 751dc26bf..e30e25a7f 100644 --- a/docs/source/pages/concepts/file_create/storage_optimization.rst +++ b/docs/source/pages/concepts/file_create/storage_optimization.rst @@ -2,8 +2,7 @@ Storage optimization ==================== -Neuroscience data can be very large, and compression helps reduce file size, improving both storage efficiency and data transfer time. - +When storing large datasets, much of the information is often redundant — and compression is an obvious way to save space. Neuroscience data, in particular, can be extremely large, and compression reduces file size, saving storage space and reducing data transfer time. However, compressing the entire file as one block creates a new problem: you lose the ability to quickly access specific parts of the data without first decompressing everything. Chunking with compression solves this by dividing the dataset into smaller pieces (“chunks”) and compressing each one individually. This preserves most of the storage benefits of compression while still allowing efficient, random access to the data you need. 
Compression ----------- From 0045e355c56e0a774c7b385cc064de64d5864d67 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Tue, 7 Oct 2025 21:59:01 +0200 Subject: [PATCH 51/67] Update storage_optimization.rst Add sentence explaining the overhead of HTTP requests and why that favours larger chunk size for cloud optimization --- docs/source/pages/concepts/file_create/storage_optimization.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/concepts/file_create/storage_optimization.rst b/docs/source/pages/concepts/file_create/storage_optimization.rst index e30e25a7f..4cd3b442f 100644 --- a/docs/source/pages/concepts/file_create/storage_optimization.rst +++ b/docs/source/pages/concepts/file_create/storage_optimization.rst @@ -19,7 +19,7 @@ A prerequisite for compression is chunking. Chunking is the partitioning of data For example, if you frequently read time series data in segments (e.g., 1-second windows), chunking along the time axis with a size that matches your typical read length can improve performance. Similarly, for image data, chunking in spatial blocks that align with common access patterns (e.g., tiles or frames) can be beneficial. -Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended, but this can vary based on specific use cases and data characteristics. +Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. 
Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended, as they balance the overhead of multiple HTTP requests with the latency of transferring large chunks. (HTTP requests have significantly higher overhead compared to local file access.) MatNWB configuration profiles From b5588d990653b721dc230bece1ea267af64c219f Mon Sep 17 00:00:00 2001 From: ehennestad Date: Tue, 21 Oct 2025 12:02:31 +0200 Subject: [PATCH 52/67] Improve api for applying dataset configuration profiles to file before or on export (#756) * Add ConfigurationProfile enum and enhance config loading Introduces the ConfigurationProfile enumeration class for dataset configuration profiles. Updates readDatasetConfiguration to use the new enum, adds an options argument for specifying a custom JSON file path, and improves input validation for file paths. * Add resolveDatasetConfiguration utility Introduces resolveDatasetConfiguration.m to handle NWB dataset configuration resolution from file paths, profile names, or direct struct input. Provides input validation and defaults to the standard configuration profile if no input is given. * Add dataset settings configuration to NWB export Introduces methods to apply dataset settings profiles in NwbFile and adds support for dataset configuration options in nwbExport. This enables users to specify dataset settings or profiles (e.g., for cloud or archive storage) prior to exporting NWB files, with an option to override existing settings. 
* Add tests for dataset settings application in NWB export Introduces unit tests to verify that dataset settings profiles are correctly applied via NWBFile and nwbExport functions, including checks for DataPipe configuration and output file creation when using the 'cloud' profile. * Update NwbFile.m * Update dataset settings argument documentation Clarified and expanded documentation for DatasetSettings and added DatasetSettingsProfile argument in nwbExport.m. Updated usage examples to reflect new argument names and options. * Update compression profile usage documentation Revised instructions to reflect new methods for applying dataset settings, including use of NwbFile.applyDatasetSettings and updated export workflow. Clarified steps for customizing and loading configuration profiles, and improved code examples for better guidance. * Improve docstring for applying dataset config in nwbExport/NwbFile Enhanced documentation for dataset configuration methods in NwbFile and nwbExport, clarifying usage of profiles and custom settings. Renamed argument in NwbFile.applyDatasetSettings for clarity and updated method comments to better describe input options and behavior. * Update compression_profiles.rst Fix indentation, spaces instead of tab * Update compression_profiles.rst * Update NwbFile.m * Add ConfigurationProfile enum to docs * Add documentation for ConfigurationProfile enum Added detailed class-level documentation to ConfigurationProfile describing available dataset configuration profiles and their intended use. Also fixed missing newline at end of matnwb_generateRstFilesFromCode.m. * Add function tests for io.config namespace * Fix negated expression in assertion * Remove default values from method arguments Default values for 'profile' and 'settingsReference' arguments were removed in two methods of NwbFile. This change enforces explicit argument passing and may help prevent unintended behavior due to implicit defaults. 
* Update ApplyDatasetConfigurationTest.m * Update nwbExport.m * Update nwbExport.m * Update compression_profiles.rst Reorder sections, presenting the simple nwbExport first * Update compression profiles documentation Minor improvements * Update compression_profiles.rst * Update compression_profiles.rst * Update compression_profiles.rst * Clarify chunk size options in compression profiles doc Expanded descriptions for 'flex', 'max', and integer options in the chunk size configuration section to improve clarity for users customizing compression profiles. * Fix advice for TargetSizeExceeded warning in docs Corrects the troubleshooting guidance for the TargetSizeExceeded warning by suggesting to increase target_chunk_size instead of lowering it. --- +io/+config/+enum/ConfigurationProfile.m | 19 +++ +io/+config/readDatasetConfiguration.m | 55 +++++-- +io/+config/resolveDatasetConfiguration.m | 48 +++++++ .../+config/ApplyDatasetConfigurationTest.m | 19 +++ +tests/+unit/+io/+config/FunctionsTest.m | 46 ++++++ +tests/+unit/nwbExportTest.m | 19 +++ NwbFile.m | 91 ++++++++++++ docs/source/pages/functions/index.rst | 1 + .../pages/functions/io/config/enum/index.rst | 11 ++ .../io.config.enum.ConfigurationProfile.rst | 7 + .../pages/functions/io/config/index.rst | 11 ++ docs/source/pages/functions/io/index.rst | 11 ++ .../compression/compression_profiles.rst | 134 ++++++++++++------ nwbExport.m | 47 +++++- .../matnwb_generateRstFilesFromCode.m | 3 +- 15 files changed, 466 insertions(+), 56 deletions(-) create mode 100644 +io/+config/+enum/ConfigurationProfile.m create mode 100644 +io/+config/resolveDatasetConfiguration.m create mode 100644 +tests/+unit/+io/+config/FunctionsTest.m create mode 100644 docs/source/pages/functions/io/config/enum/index.rst create mode 100644 docs/source/pages/functions/io/config/enum/io.config.enum.ConfigurationProfile.rst create mode 100644 docs/source/pages/functions/io/config/index.rst create mode 100644 docs/source/pages/functions/io/index.rst 
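
A short usage sketch of the API this patch introduces, modeled on the unit tests included in the patch. It assumes MatNWB with this change applied is on the path. The identifier, the output filename, and the variable names are hypothetical.

```matlab
% Hypothetical session metadata, for illustration only.
nwb = NwbFile( ...
    'identifier', 'dataset-settings-example', ...
    'session_description', 'Illustrative session for dataset settings', ...
    'session_start_time', datetime(2025, 1, 1, 12, 0, 0, 'TimeZone', 'UTC'));

% A dataset large enough for chunking and compression to matter.
largeSeries = types.core.TimeSeries( ...
    'data', rand(64, 100000), ...
    'data_unit', 'n/a', ...
    'timestamps', 1:100000);
nwb.acquisition.set('data', largeSeries);

% Apply a built-in profile before export. The optional output is the
% dataset configuration that was applied.
datasetConfig = nwb.applyDatasetSettingsProfile('cloud');

% Per the unit test in this patch, the data is now expected to be wrapped
% in a DataPipe carrying the chunking/compression settings.
isConfigured = isa(nwb.acquisition.get('data').data, 'types.untyped.DataPipe');

% Export as usual (hypothetical filename).
nwbExport(nwb, 'dataset_settings_example.nwb');
```

The same result should be reachable in a single call with ``nwbExport(nwb, filePath, 'DatasetSettingsProfile', 'cloud')``, which is the form exercised in ``nwbExportTest.m``.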
diff --git a/+io/+config/+enum/ConfigurationProfile.m b/+io/+config/+enum/ConfigurationProfile.m new file mode 100644 index 000000000..1d6e6ff07 --- /dev/null +++ b/+io/+config/+enum/ConfigurationProfile.m @@ -0,0 +1,19 @@ +classdef ConfigurationProfile < handle +%CONFIGURATIONPROFILE Dataset configuration profiles recognised by MatNWB. +% +% Use these enumeration members when selecting chunking/compression presets +% via NwbFile.applyDatasetSettingsProfile, or nwbExport. Profiles map to +% JSON files in the ``configuration`` folder: +% +% * ``default`` – general-purpose balance of size and performance. +% * ``cloud`` – tuned for object storage and remote streaming access. +% * ``archive`` – favors compact, long-term storage. +% * ``none`` – opt out of applying a profile entirely. + + enumeration + none + default + cloud + archive + end +end diff --git a/+io/+config/readDatasetConfiguration.m b/+io/+config/readDatasetConfiguration.m index e7a45ab9c..c533ae47b 100644 --- a/+io/+config/readDatasetConfiguration.m +++ b/+io/+config/readDatasetConfiguration.m @@ -1,4 +1,4 @@ -function datasetConfig = readDatasetConfiguration(profile) +function datasetConfig = readDatasetConfiguration(profile, options) % READDATASETCONFIGURATION Reads the default dataset configuration from a JSON file. % % Syntax: @@ -20,22 +20,61 @@ % % % Load the default dataset configuration % datasetConfig = io.config.readDatasetConfiguration(); +% disp(datasetConfig); +% +% Example 2 - Load dataset configurations from a specific file :: +% +% datasetConfig = io.config.readDatasetConfiguration("FilePath", "configuration_file.json"); % disp(datasetConfig); arguments - profile (1,1) string {mustBeMember(profile, [ ... - "default", ... - "cloud", ... 
- "archive" - ])} = "default" + profile (1,1) io.config.enum.ConfigurationProfile = "default" + options.FilePath string {mustBeJsonFileOrEmpty} = string.empty end - filename = sprintf('%s_dataset_configuration.json', profile); + if profile == io.config.enum.ConfigurationProfile.none && isempty(options.FilePath) + datasetConfig = []; + return + end - configFilePath = fullfile(misc.getMatnwbDir, 'configuration', filename); + % If FilePath is specified, we use that file + if ~isempty(options.FilePath) + configFilePath = options.FilePath; + else + filename = sprintf('%s_dataset_configuration.json', profile); + configFilePath = fullfile(misc.getMatnwbDir, 'configuration', filename); + end + datasetConfig = jsondecode(fileread(configFilePath)); datasetConfig = datasetConfig.datasetSpecifications; datasetConfig = io.config.internal.applyCustomMatNWBPropertyNames(datasetConfig); datasetConfig = io.config.internal.flipChunkDimensions(datasetConfig); end + +function mustBeJsonFileOrEmpty(value) +%MUSTBEJSONFILEOREMPTY Validate that input is a JSON file path or empty +% +% mustBeJsonFileOrEmpty(VALUE) throws an error if VALUE is not empty and +% not a character vector or string scalar ending with '.json' (case-insensitive). + + arguments + value string + end + + if isempty(value) + return + end + + assert(isscalar(value), ... + "NWB:validator:mustBeJsonFileOrEmpty:InvalidInput", ... + "Value must be a string scalar, character vector, or empty."); + + assert(endsWith(value, ".json", "IgnoreCase", true), ... + "NWB:validator:mustBeJsonFileOrEmpty:InvalidFileType", ... + "Value must end with '.json'."); + + assert(exist(value, "file") == 2, ... + "NWB:validator:mustBeJsonFileOrEmpty:FileMustExist", ... 
+ "Value must be the name of an existing json file.") +end diff --git a/+io/+config/resolveDatasetConfiguration.m b/+io/+config/resolveDatasetConfiguration.m new file mode 100644 index 000000000..cda096e99 --- /dev/null +++ b/+io/+config/resolveDatasetConfiguration.m @@ -0,0 +1,48 @@ +function datasetConfig = resolveDatasetConfiguration(input) +% resolveDatasetConfiguration - Resolves the dataset configuration based on the input. +% +% Syntax: +% datasetConfig = io.config.resolveDatasetConfiguration(input) +% This function resolves NWB dataset configurations from the specified input, +% which can be a file path or a structure. If no input is provided, it +% uses the default NWB dataset configuration profile. +% +% Input Arguments: +% input {mustBeStringOrStruct} - A value to resolve configurations for, +% which can either be a string representing the file path to the +% configurations or a struct containing the configurations directly. +% +% Output Arguments: +% datasetConfig - The NWB dataset configurations, returned as a structure. + + arguments + input {mustBeStringOrStruct} = struct.empty + end + + if isempty(input) + disp('No dataset settings provided, using default dataset settings profile.') + datasetConfig = io.config.readDatasetConfiguration(); + + elseif ischar(input) || (isstring(input) && isscalar(input)) + input = string(input); + if isfile(input) + datasetConfig = io.config.readDatasetConfiguration("FilePath", input); + else + datasetConfig = io.config.readDatasetConfiguration(input); + end + + elseif isstruct(input) + datasetConfig = input; + end +end + +function mustBeStringOrStruct(value) + isValid = isempty(value) || ... + ischar(value) || (isstring(value) && isscalar(value)) || ... + isstruct(value); + + assert(isValid, ... + 'NWB:ResolveDatasetSettings:InvalidInput', ... + ['Expected datasetSettings to be a string (profile name or filename) ' ... 
+ 'or a struct (already loaded settings).']) +end diff --git a/+tests/+unit/+io/+config/ApplyDatasetConfigurationTest.m b/+tests/+unit/+io/+config/ApplyDatasetConfigurationTest.m index 049123738..d39ea1853 100644 --- a/+tests/+unit/+io/+config/ApplyDatasetConfigurationTest.m +++ b/+tests/+unit/+io/+config/ApplyDatasetConfigurationTest.m @@ -274,5 +274,24 @@ function testApplyCustomMatNWBPropertyNames(testCase) testCase.verifyTrue( isfield(updatedConfig, 'VectorData_data') ); testCase.verifyTrue( isfield(updatedConfig, 'GrayscaleImage_data') ); end + + function testNwbFileApplyDatasetSettingsProfile(testCase) + nwbFile = tests.factory.NWBFile(); + + largeSeries = types.core.TimeSeries( ... + 'data', rand(64, 100000), ... + 'data_unit', 'n/a', ... + 'timestamps', 1:100000); + + nwbFile.acquisition.set('data', largeSeries); + + datasetConfig = nwbFile.applyDatasetSettingsProfile('cloud'); + + resultPipe = nwbFile.acquisition.get('data').data; + testCase.verifyTrue(isa(resultPipe, 'types.untyped.DataPipe'), ... + 'applyDatasetSettings should configure datasets using named profile'); + testCase.verifyTrue(isstruct(datasetConfig), ... 
'applyDatasetSettings should return the dataset configuration that was applied'); + end end end diff --git a/+tests/+unit/+io/+config/FunctionsTest.m b/+tests/+unit/+io/+config/FunctionsTest.m new file mode 100644 index 000000000..68f1fa4e4 --- /dev/null +++ b/+tests/+unit/+io/+config/FunctionsTest.m @@ -0,0 +1,46 @@ +classdef FunctionsTest < matlab.unittest.TestCase +% FunctionsTest - Test inputs and outputs of functions in io.config namespace + + methods (Test) + function testReadDatasetConfiguration(testCase) + % Test with no inputs: + defaultDatasetConfig = io.config.readDatasetConfiguration(); + testCase.verifyClass(defaultDatasetConfig, 'struct') + + % Test with configuration profile name + cloudDatasetConfig = io.config.readDatasetConfiguration('cloud'); + testCase.verifyClass(cloudDatasetConfig, 'struct') + testCase.verifyNotEqual(cloudDatasetConfig, defaultDatasetConfig) + + % Test with configuration profile name "none" + noConfig = io.config.readDatasetConfiguration('none'); + testCase.verifyEmpty(noConfig) + + % Test with filepath input + filename = 'default_dataset_configuration.json'; + configFilePath = fullfile(misc.getMatnwbDir, 'configuration', filename); + defaultDatasetConfigFromFile = io.config.readDatasetConfiguration('FilePath', configFilePath); + testCase.verifyEqual(defaultDatasetConfigFromFile, defaultDatasetConfig) + end + + function testResolveDatasetConfiguration(testCase) + % Test with no inputs (capture command window output): + C = evalc("defaultDatasetConfigA = io.config.resolveDatasetConfiguration()"); %#ok + testCase.verifyClass(defaultDatasetConfigA, 'struct') + + % Test with structure input, i.e., an already loaded configuration + defaultDatasetConfigB = io.config.resolveDatasetConfiguration(defaultDatasetConfigA); + testCase.verifyEqual(defaultDatasetConfigB, defaultDatasetConfigA) + + % Test with profile name + defaultDatasetConfigC = io.config.resolveDatasetConfiguration("default"); + testCase.verifyEqual(defaultDatasetConfigC,
defaultDatasetConfigA) + + % Test with filepath input + filename = 'default_dataset_configuration.json'; + configFilePath = fullfile(misc.getMatnwbDir, 'configuration', filename); + defaultDatasetConfigD = io.config.resolveDatasetConfiguration(configFilePath); + testCase.verifyEqual(defaultDatasetConfigD, defaultDatasetConfigA) + end + end +end diff --git a/+tests/+unit/nwbExportTest.m b/+tests/+unit/nwbExportTest.m index f6178f832..216cc3bb8 100644 --- a/+tests/+unit/nwbExportTest.m +++ b/+tests/+unit/nwbExportTest.m @@ -156,6 +156,25 @@ function testExportTimeseriesWithoutStartingTimeRate(testCase) 'NWB:CustomConstraintUnfulfilled') end + function testExportAppliesDatasetSettingsOption(testCase) + nwb = tests.factory.NWBFile(); + largeSeries = types.core.TimeSeries( ... + 'data', rand(64, 100000), ... + 'data_unit', 'n/a', ... + 'timestamps', 1:100000); + + nwb.acquisition.set('export_data', largeSeries); + + nwbFilePath = testCase.getRandomFilename(); + nwbExport(nwb, nwbFilePath, 'DatasetSettingsProfile', 'cloud'); + + configuredData = nwb.acquisition.get('export_data').data; + testCase.verifyTrue(isa(configuredData, 'types.untyped.DataPipe'), ... + 'nwbExport should configure datasets when DatasetSettings option is provided'); + testCase.verifyTrue(isfile(nwbFilePath), ... + 'nwbExport should still write the requested file'); + end + function testEmbeddedSpecs(testCase) % Install extensions, one will be used, the other will not. diff --git a/NwbFile.m b/NwbFile.m index 3c4137aef..e616e1269 100644 --- a/NwbFile.m +++ b/NwbFile.m @@ -90,6 +90,97 @@ function export(obj, filename, mode) end end + function datasetConfig = applyDatasetSettingsProfile(obj, profile, options) + % APPLYDATASETSETTINGSPROFILE - Configure datasets using predefined settings profile + % + % Syntax: + % nwb.applyDatasetSettingsProfile(profile) applies a dataset + % configuration profile to the nwb-file ``nwb``. Available profiles: + % "default", "cloud", "archive". 
This will configure datasets in + % the NwbFile object for chunking and compression. + % + % Input Arguments: + % - obj (NwbFile) - An instance of the NwbFile class. + % + % - profile (ConfigurationProfile) - + % Specifies the settings profile to use (required). + % + % Name-Value Arguments: + % - OverrideExisting (logical) - + % This boolean determines if existing DataPipe objects in the + % file will be reconfigured with the provided options. Default is + % false. **Important**: This does not work for DataPipes that have + % previously been exported to file. + % + % Output Arguments: + % - datasetConfig - + % (Optional) The configuration settings applied to the dataset. + % + % See also: + % io.config.enum.ConfigurationProfile + % NwbFile.applyDatasetSettings + + arguments + obj (1,1) NwbFile + profile (1,1) io.config.enum.ConfigurationProfile + options.OverrideExisting (1,1) logical = false + end + + datasetConfig = io.config.readDatasetConfiguration(profile); + nvPairs = namedargs2cell(options); + obj.applyDatasetSettings(datasetConfig, nvPairs{:}); + if ~nargout + clear datasetConfig + end + end + + + function datasetConfig = applyDatasetSettings(obj, settingsReference, options) + % APPLYDATASETSETTINGS - Configure datasets using NWB dataset settings + % + % Syntax: + % nwb.applyDatasetSettings(settingsReference) applies a dataset + % configuration profile to the nwb-file ``nwb``. This method + % accepts the filename of a custom configuration profile or a + % structure representing a configuration profile. + % + % Input Arguments: + % - obj (NwbFile) - An instance of the NwbFile class. + % + % - settingsReference (string | struct) - + % The filename of a custom configuration profile or an in-memory + % structure representing a configuration profile. + % + % Name-Value Arguments: + % - OverrideExisting (logical) - + % This boolean determines if existing DataPipe objects in the + % file will be reconfigured with the provided options. Default is + % false.
**Important**: This does not work for DataPipes that has + % previously been exported to file. + % + % Output Arguments: + % - datasetConfig - + % (Optional) The configuration settings applied to the dataset. + % + % See also: + % io.config.enum.ConfigurationProfile + % NwbFile.applyDatasetSettingsProfile + + arguments + obj (1,1) NwbFile + settingsReference + options.OverrideExisting (1,1) logical = false + end + + datasetConfig = io.config.resolveDatasetConfiguration(settingsReference); + + nvPairs = namedargs2cell(options); + io.config.applyDatasetConfiguration(obj, datasetConfig, nvPairs{:}); + if ~nargout + clear datasetConfig + end + end + function o = resolve(obj, path) if ischar(path) path = {path}; diff --git a/docs/source/pages/functions/index.rst b/docs/source/pages/functions/index.rst index 16aab9d5e..2b291e983 100644 --- a/docs/source/pages/functions/index.rst +++ b/docs/source/pages/functions/index.rst @@ -15,4 +15,5 @@ These are the main functions of the MatNWB API generateExtension nwbClearGenerated nwbInstallExtension + io/index matnwb/index diff --git a/docs/source/pages/functions/io/config/enum/index.rst b/docs/source/pages/functions/io/config/enum/index.rst new file mode 100644 index 000000000..d54eec551 --- /dev/null +++ b/docs/source/pages/functions/io/config/enum/index.rst @@ -0,0 +1,11 @@ ++io.config.enum +=============== + + + +.. toctree:: + :maxdepth: 2 + :caption: Functions + + io.config.enum.ConfigurationProfile + diff --git a/docs/source/pages/functions/io/config/enum/io.config.enum.ConfigurationProfile.rst b/docs/source/pages/functions/io/config/enum/io.config.enum.ConfigurationProfile.rst new file mode 100644 index 000000000..39822e1ab --- /dev/null +++ b/docs/source/pages/functions/io/config/enum/io.config.enum.ConfigurationProfile.rst @@ -0,0 +1,7 @@ +ConfigurationProfile +==================== + +.. mat:module:: io.config.enum +.. 
autoclass:: io.config.enum.ConfigurationProfile + :members: + :show-inheritance: diff --git a/docs/source/pages/functions/io/config/index.rst b/docs/source/pages/functions/io/config/index.rst new file mode 100644 index 000000000..ddf1d25a8 --- /dev/null +++ b/docs/source/pages/functions/io/config/index.rst @@ -0,0 +1,11 @@ ++io.config +========== + + + +.. toctree:: + :maxdepth: 2 + :caption: Functions + + + enum/index diff --git a/docs/source/pages/functions/io/index.rst b/docs/source/pages/functions/io/index.rst new file mode 100644 index 000000000..201624782 --- /dev/null +++ b/docs/source/pages/functions/io/index.rst @@ -0,0 +1,11 @@ ++io +=== + + + +.. toctree:: + :maxdepth: 2 + :caption: Functions + + + config/index diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index 70d7953b9..2e2365ab9 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -11,42 +11,90 @@ Prerequisites * Basic familiarity with creating NWB objects (see the MatNWB tutorials if needed). .. contents:: On this page - :local: - :depth: 2 + :local: + :depth: 2 At a glance ----------- 1. Create or load your ``NwbFile`` and populate data. -2. Read a dataset configuration profile (``default``, ``cloud``, or ``archive`` – or your own). -3. Apply it with :func:`io.config.applyDatasetConfiguration`. -4. Export. +2. Choose dataset settings: a built-in profile, a custom JSON file, or a struct already in memory. +3. Apply them directly on export with :func:`nwbExport` or before export with :meth:`NwbFile.applyDatasetSettingsProfile` / :meth:`NwbFile.applyDatasetSettings`. -Creating and exporting an NWB file with a profile -------------------------------------------------- +Built-in profiles (quick reference) +----------------------------------- +- ``default`` — general-purpose settings. 
+- ``cloud`` — chunking tuned for remote/cloud reads; moderate compression. +- ``archive`` — stronger compression for long-term storage. + +Use with either ``nwbExport(..., 'DatasetSettingsProfile', '')`` or ``NwbFile.applyDatasetSettingsProfile('')``. + + +Creating and exporting an NWB file with a dataset configuration profile +----------------------------------------------------------------------- +.. code-block:: matlab + + % 1. Create an NWB file + nwb = NwbFile( ... + 'identifier', 'compression-howto-20250411T153000Z', ... + 'session_description', 'Compression profile how-to guide', ... + 'session_start_time', datetime(2025,4,11,15,30,0,'TimeZone','UTC')); + + % 2. Add a large TimeSeries + data = rand(32, 1e6, 'single'); % Example large matrix + es = types.core.TimeSeries(... + 'data', data, ... + 'data_unit', 'volts', ... + 'starting_time', 0, ... + 'starting_time_rate', 30000); + nwb.acquisition.set('ExampleSeries', es); + + % 3. Use a built-in profile on export + nwbExport(nwb, 'example_cloud_profile.nwb', ... + 'DatasetSettingsProfile', 'cloud'); + +The file will be created with chunking and compression settings optimized for cloud access patterns and storage. + + +Verifying the applied configuration +---------------------------------- +After export, you can inspect chunking and compression with ``h5info``: + .. code-block:: matlab - % 1. Create and populate an NWB file - nwb = NwbFile( ... + info = h5info('example_cloud_profile.nwb', '/acquisition/ExampleSeries/data'); + info.ChunkSize % should reflect computed chunkSize + info.Filters % lists compression + shuffle if present + + +Inspecting the applied configuration before export +-------------------------------------------------- +You can inspect the applied configuration before export: + +.. code-block:: matlab + + % 1. Create an NWB file + nwb = NwbFile( ... 'identifier', 'compression-howto-20250411T153000Z', ... 'session_description', 'Compression profile how-to guide', ... 
'session_start_time', datetime(2025,4,11,15,30,0,'TimeZone','UTC')); - data = rand(32, 1e6, 'single'); % Example large matrix - es = types.core.TimeSeries(... - 'data', data, ... - 'data_unit', 'volts', ... - 'starting_time', 0, ... - 'starting_time_rate', 30000); - nwb.acquisition.set('ExampleSeries', es); + + % 2. Add a large TimeSeries + data = rand(32, 1e6, 'single'); % Example large matrix + es = types.core.TimeSeries(... + 'data', data, ... + 'data_unit', 'volts', ... + 'starting_time', 0, ... + 'starting_time_rate', 30000); + nwb.acquisition.set('ExampleSeries', es); - % 2. Load a profile (choose "default", "cloud", or "archive") - cfg = io.config.readDatasetConfiguration("cloud"); + % 3. Apply the cloud profile (convenience method accepts profile name) + nwb.applyDatasetSettingsProfile('cloud'); - % 3. Apply it (wraps large numeric datasets in DataPipe objects) - io.config.applyDatasetConfiguration(nwb, cfg); + % 4. Inspect resulting DataPipe + dataPipe = nwb.acquisition.get('ExampleSeries').data - % 4. Export - nwbExport(nwb, 'example_cloud_profile.nwb'); +You can now inspect ``dataPipe`` properties like ``chunkSize``, ``compressionLevel`` or ``filters`` before export, and modify them if needed. Overriding an existing DataPipe @@ -55,13 +103,20 @@ If you already created a ``DataPipe`` manually (or ran a profile once) and want .. code-block:: matlab - newCfg = io.config.readDatasetConfiguration("archive"); - io.config.applyDatasetConfiguration(nwb, newCfg, "OverrideExisting", true); + nwb.applyDatasetSettingsProfile('archive', 'OverrideExisting', true); + Customizing a profile --------------------- -1. Copy one of the shipped JSON files (e.g. ``default_dataset_configuration.json``) to a new file in ``configuration/`` (e.g. ``myprofile_dataset_configuration.json``). +1. Copy one of the shipped JSON files (e.g. ``default_dataset_configuration.json``) to a new file (e.g. ``configuration/myprofile_dataset_configuration.json``). + +.. 
code-block:: matlab + + sourceFile = fullfile(misc.getMatnwbDir, 'configuration', 'default_dataset_configuration.json'); + targetFile = fullfile(misc.getMatnwbDir, 'configuration', 'myprofile_dataset_configuration.json'); + copyfile(sourceFile, targetFile) + edit(targetFile) 2. Adjust fields: @@ -71,11 +126,12 @@ Customizing a profile ``chunking.strategy_by_rank`` Strategy per dataset rank (key = number of dimensions). Each list element corresponds to a dimension axis. + The list length must equal the dataset rank; order matches dataset dimensions. Possible values: - - ``"flex"`` - - ``"max"`` - - *integer* (upper bound) + - ``"flex"`` - The size of the chunk in this dimension is adjusted to comply with the target_chunk_size + - ``"max"`` - The size of the chunk in this dimension will be the actual size of that dimension + - *integer* (upper bound) - The size of the chunk in this dimension will be fixed ``compression.method`` Compression algorithm: ``deflate`` (gzip), ``ZStandard`` (if available), or a custom filter ID. @@ -95,34 +151,20 @@ Customizing a profile Targets the ``data`` dataset inside any ``TwoPhotonSeries``. -4. Load it: +4. Apply it (passing the file path directly to :meth:`NwbFile.applyDatasetSettings`): .. code-block:: matlab - cfg = io.config.readDatasetConfiguration("myprofile"); - io.config.applyDatasetConfiguration(nwb, cfg); - - -Verifying the applied configuration ----------------------------------- -After export, you can inspect chunking and compression with ``h5info``: - -.. code-block:: matlab + % Apply configuration from file to the NwbFile object + nwb.applyDatasetSettings('configuration/myprofile_dataset_configuration.json'); - info = h5info('example_cloud_profile.nwb', '/acquisition/ExampleSeries/data'); - info.ChunkSize % should reflect computed chunkSize - info.Filters % lists compression + shuffle if present Troubleshooting --------------- * ``No matching rank strategy`` error: Add a list for that rank (e.g. 
key ``"5"``) in ``strategy_by_rank``. -* ``TargetSizeExceeded`` warning: Reduce dimensions marked ``max`` or lower numeric bounds; lower ``target_chunk_size``. +* ``TargetSizeExceeded`` warning: Reduce dimensions marked ``max`` or lower numeric bounds; increase ``target_chunk_size``. * ``Unsupported target_chunk_size_unit``: Ensure unit is one of ``bytes``, ``kiB``, ``MiB``, ``GiB``. -Next steps ----------- -* Combine with streaming writes using ``DataPipe.append`` for very large, incremental acquisitions. -* Profile read performance with different chunk strategies to tune domain‑specific workloads. Summary ------- diff --git a/nwbExport.m b/nwbExport.m index 707113637..a1f5538df 100644 --- a/nwbExport.m +++ b/nwbExport.m @@ -1,13 +1,35 @@ -function nwbExport(nwbFileObjects, filePaths, mode) +function nwbExport(nwbFileObjects, filePaths, mode, options) %NWBEXPORT - Writes an NWB file. % % Syntax: % NWBEXPORT(nwb, filename) Writes the nwb object to a file at filename. % +% NWBEXPORT(nwb, filename, Name, Value) Writes the nwb object using additional +% options provided as name-value pairs. +% % Input Arguments: % - nwb (NwbFile) - Nwb file object % - filename (string) - Filepath pointing to an NWB file. % +% Name-Value Arguments (options): +% Specify options using name-value arguments as Name1=Value1,...,NameN=ValueN, +% where Name is the argument name and Value is the corresponding value. +% +% - DatasetSettingsProfile (string) - +% Default: "none". Name of a predefined configuration profile for dataset +% chunking and compression. Available options: "default", "cloud" or +% "archive". If this argument is specified, all datasets in the file larger +% than a threshold specified in the profile will be configured for chunking +% and compression. +% +% - DatasetSettings (string | struct) - +% Default: empty struct. Provide the filename of a custom configuration +% profile or an in-memory structure representing a configuration profile. 
+% +% - OverrideDatasetSettings (logical) - +% Default: false. When true, existing DataPipe objects found in the file are reconfigured +% using the provided dataset settings. +% % Usage: % Example 1 - Export an NWB file:: % @@ -34,6 +56,10 @@ function nwbExport(nwbFileObjects, filePaths, mode) % % Write the nwb object to a file: % nwbExport(nwb, 'empty.nwb'); % +% Example 3 - Export an NWB file using dataset settings tuned for cloud storage:: +% +% nwbExport(nwb, 'empty.nwb', 'DatasetSettingsProfile', 'cloud'); +% % See also: % generateCore, generateExtension, NwbFile, nwbRead @@ -41,12 +67,31 @@ function nwbExport(nwbFileObjects, filePaths, mode) nwbFileObjects (1,:) NwbFile {mustBeNonempty} filePaths (1,:) string {matnwb.common.compatibility.mustBeNonzeroLengthText} mode (1,1) string {mustBeMember(mode, ["edit", "overwrite"])} = "edit" + options.DatasetSettingsProfile (1,1) io.config.enum.ConfigurationProfile = "none" + options.DatasetSettings = [] + options.OverrideDatasetSettings (1,1) logical = false end assert(length(nwbFileObjects) == length(filePaths), ... 'NWB:Export:FilepathLengthMismatch', ... 'Lists of NWB objects to export and list of file paths must be the same length.') + shouldApplyDatasetSettings = ~isempty(options.DatasetSettings) || ... + ~strcmp(string(options.DatasetSettingsProfile), "none"); + + if shouldApplyDatasetSettings + % Prepare dataset settings once and reuse across files. + if ~isempty(options.DatasetSettings) + datasetConfig = io.config.resolveDatasetConfiguration(options.DatasetSettings); + else + datasetConfig = io.config.readDatasetConfiguration(options.DatasetSettingsProfile); + end + for iFiles = 1:length(nwbFileObjects) + nwbFileObjects(iFiles).applyDatasetSettings(... 
+ datasetConfig, 'OverrideExisting', options.OverrideDatasetSettings); + end + end + for iFiles = 1:length(nwbFileObjects) filePath = char(filePaths(iFiles)); nwbFileObjects(iFiles).export(filePath, mode); diff --git a/tools/documentation/matnwb_generateRstFilesFromCode.m b/tools/documentation/matnwb_generateRstFilesFromCode.m index 055e9f69e..dbc4db7b7 100644 --- a/tools/documentation/matnwb_generateRstFilesFromCode.m +++ b/tools/documentation/matnwb_generateRstFilesFromCode.m @@ -9,6 +9,7 @@ function matnwb_generateRstFilesFromCode() "generateExtension", ... "nwbClearGenerated", ... "nwbInstallExtension", ... + "io.config.enum.ConfigurationProfile", ... "matnwb.extension.listExtensions", ... "matnwb.extension.getExtensionInfo" ... ]; @@ -21,4 +22,4 @@ function matnwb_generateRstFilesFromCode() generateRstForNeurodataTypeClasses('core') generateRstForNeurodataTypeClasses('hdmf_common') generateRstForNeurodataTypeClasses('hdmf_experimental') -end \ No newline at end of file +end From 25143caf29e5ec22a2004d9e75a428a24a41b222 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 12:15:27 +0200 Subject: [PATCH 53/67] Document limitations on editing NWB datasets in MatNWB Added clarification that in-place editing of dataset data is not supported in MatNWB and referenced the relevant GitHub issue for users seeking this functionality. --- .../concepts/file_create/editing_nwb_files.rst | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst index 433df28d5..99ae7f11c 100644 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -5,9 +5,16 @@ Editing NWB files After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. 
However, due to the way MatNWB and HDF5 work, there are some limitations when modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. -1. **Appending** data to a dataset requires the dataset to have been created as extendable. This is typically done when initially creating a dataset, using the :class:`~types.untyped.DataPipe` class. If the dataset was not created as extendable, it cannot be resized or appended to. +1. **Editing** data of datasets in place is currently not supported -2. **Removing** property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. +2. **Appending** data to a dataset requires the dataset to have been created as extendable. This is typically done when initially creating a dataset, using the :class:`~types.untyped.DataPipe` class. If the dataset was not created as extendable, it cannot be resized or appended to. + +3. **Removing** property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. + + +Editing/modifying data of existing datasets is not supported +------------------------------------------------------------ +This is a current limitation in MatNWB. If this is something you have a need for, please check out `this MatNWB issue `_. Appending data to existing datasets ----------------------------------- @@ -29,5 +36,6 @@ Removing data from existing files At the moment, MatNWB does not provide built-in functionality to copy data from one NWB file to another. 
However, you can achieve this by manually reading the desired data from the existing file and writing it to a new file using the appropriate MatNWB classes and methods. -The following issue on GitHub tracks some of the limitations and potential improvements related to editing NWB files in MatNWB: -`MatNWB - Issue 751 `_ +The following issues on GitHub track some of the limitations and potential improvements related to editing NWB files in MatNWB: +`MatNWB - Issue 751 `_ - Reexport datasets to another file +`MatNWB - Issue 760 `_ - Edit data in place From 516b57ece9fd7d74e8e734018a7f61a3511638d4 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 12:20:28 +0200 Subject: [PATCH 54/67] Update editing_nwb_files.rst --- .../pages/concepts/file_create/editing_nwb_files.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst index 99ae7f11c..650f3835c 100644 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -5,7 +5,7 @@ Editing NWB files After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. However, due to the way MatNWB and HDF5 work, there are some limitations when modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. -1. **Editing** data of datasets in place is currently not supported +1. **Editing** data of datasets in place is currently not supported. 2. **Appending** data to a dataset requires the dataset to have been created as extendable. This is typically done when initially creating a dataset, using the :class:`~types.untyped.DataPipe` class. If the dataset was not created as extendable, it cannot be resized or appended to. 
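The appending limitation described in item 2 above can be illustrated with a minimal sketch. It assumes that ``DataPipe`` accepts an ``axis`` argument naming the dimension to grow along, and that ``append`` is called on the pipe after the file has been exported. Both are assumptions drawn from the MatNWB DataPipe tutorial rather than from this patch. The identifier, file name, and series name are hypothetical.

.. code-block:: matlab

    % Sketch only: create an extendable dataset, export it, then append to it.
    nwb = NwbFile( ...
        'identifier', 'append-sketch', ...                  % hypothetical identifier
        'session_description', 'Extendable dataset sketch', ...
        'session_start_time', datetime(2025,4,11,15,30,0,'TimeZone','UTC'));

    initialData = rand(32, 1000, 'single');

    % maxSize = Inf along dimension 2 allows unlimited growth in that dimension.
    % 'axis' (assumed argument name) selects the dimension that append() extends.
    dataPipe = types.untyped.DataPipe( ...
        'data', initialData, ...
        'maxSize', [32, Inf], ...
        'chunkSize', [32, 1000], ...
        'axis', 2);

    es = types.core.TimeSeries( ...
        'data', dataPipe, ...
        'data_unit', 'volts', ...
        'starting_time', 0, ...
        'starting_time_rate', 30000);
    nwb.acquisition.set('ExampleSeries', es);

    % The dataset is created on disk as extendable at export time.
    nwbExport(nwb, 'append_sketch.nwb');                    % hypothetical file name

    % Assumed workflow: once exported, the pipe is bound to the file and can grow.
    dataPipe.append(rand(32, 500, 'single'));

A dataset written without a ``DataPipe`` has fixed dimensions, so the ``append`` step in this sketch would not be available for it.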
@@ -37,5 +37,6 @@ Removing data from existing files At the moment, MatNWB does not provide built-in functionality to copy data from one NWB file to another. However, you can achieve this by manually reading the desired data from the existing file and writing it to a new file using the appropriate MatNWB classes and methods. The following issues on GitHub track some of the limitations and potential improvements related to editing NWB files in MatNWB: -`MatNWB - Issue 751 `_ - Reexport datasets to another file -`MatNWB - Issue 760 `_ - Edit data in place + +- `MatNWB - Issue 751 `_ - Reexport datasets to another file +- `MatNWB - Issue 760 `_ - Edit data in place From df6488b76ea6c049d1fd1850867b77307fb8e00c Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 16:45:43 +0200 Subject: [PATCH 55/67] Fix links and formatting issues --- NwbFile.m | 32 ++++++++++--------- .../pages/concepts/dimension_ordering.rst | 5 +-- .../file_create/editing_nwb_files.rst | 2 +- .../pages/concepts/file_read/dynamictable.rst | 2 +- .../pages/concepts/file_read/untyped.rst | 3 +- .../pages/getting_started/installation.rst | 2 +- .../source/pages/getting_started/overview.rst | 3 +- .../compression/compression_profiles.rst | 3 +- 8 files changed, 29 insertions(+), 23 deletions(-) diff --git a/NwbFile.m b/NwbFile.m index e616e1269..aff334733 100644 --- a/NwbFile.m +++ b/NwbFile.m @@ -100,21 +100,22 @@ function export(obj, filename, mode) % the NwbFile object for chunking and compression. % % Input Arguments: - % - obj (NwbFile) - An instance of the NwbFile class. + % - obj (NwbFile) - + % An instance of the NwbFile class. % % - profile (ConfigurationProfile) - - % Specifies the settings profile to use. Default is "none". + % Specifies the settings profile to use. Default is "none". % % Name-Value Arguments: % - OverrideExisting (logical) - - % This boolean determines if existing DataPipe objects in the - % file will be reconfigured with the provided options. 
Default is - % false. **Important**: This does not work for DataPipes that has - % previously been exported to file. + % This boolean determines if existing DataPipe objects in the + % file will be reconfigured with the provided options. Default is + % false. **Important**: This does not work for DataPipes that has + % previously been exported to file. % % Output Arguments: % - datasetConfig - - % (Optional) The configuration settings applied to the dataset. + % (Optional) The configuration settings applied to the dataset. % % See also: % io.config.enum.ConfigurationProfile @@ -145,22 +146,23 @@ function export(obj, filename, mode) % structure representing a configuration profile. % % Input Arguments: - % - obj (NwbFile) - An instance of the NwbFile class. + % - obj (NwbFile) - + % An instance of the NwbFile class. % % - settingsReference (string | struct) - - % The filename of a custom configuration profile or an in-memory - % structure representing a configuration profile. + % The filename of a custom configuration profile or an in-memory + % structure representing a configuration profile. % % Name-Value Arguments: % - OverrideExisting (logical) - - % This boolean determines if existing DataPipe objects in the - % file will be reconfigured with the provided options. Default is - % false. **Important**: This does not work for DataPipes that has - % previously been exported to file. + % This boolean determines if existing DataPipe objects in the + % file will be reconfigured with the provided options. Default is + % false. **Important**: This does not work for DataPipes that has + % previously been exported to file. % % Output Arguments: % - datasetConfig - - % (Optional) The configuration settings applied to the dataset. + % (Optional) The configuration settings applied to the dataset. 
% % See also: % io.config.enum.ConfigurationProfile diff --git a/docs/source/pages/concepts/dimension_ordering.rst b/docs/source/pages/concepts/dimension_ordering.rst index 7493c8755..c775d515e 100644 --- a/docs/source/pages/concepts/dimension_ordering.rst +++ b/docs/source/pages/concepts/dimension_ordering.rst @@ -33,7 +33,8 @@ Without DataPipes ^^^^^^^^^^^^^^^^^ See the documentation at the following link: -`without DataPipes <../tutorials/dimensionMapNoDataPipes.html>`_ +:doc:`without DataPipes ` + **Writing to File** @@ -69,7 +70,7 @@ With DataPipes ^^^^^^^^^^^^^^ See the documentation at the following link: -`with DataPipes <../tutorials/dimensionMapWithDataPipes.html>`_ +:doc:`with DataPipes ` **Writing to File** diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst index 650f3835c..b59f06fca 100644 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -20,7 +20,7 @@ Appending data to existing datasets ----------------------------------- :ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. By default, MatNWB creates datasets with fixed dimensions. Datasets that were created with fixed dimensions cannot be resized or appended to after they have been written to disk. This means that if you want to append data to a dataset in an existing NWB file, the dataset must have been created as extendable from the start. This is done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. -The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the ``chunkSize`` and ``maxSize`` properties. The ``chunkSize`` property determines the size of the chunks that will be written to the dataset, while the ``maxSize`` property determines the maximum size of the dataset. 
By setting these properties appropriately, you can create a dataset that can be resized and appended to as needed. +The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the ``chunkSize`` and ``maxSize`` properties. The ``chunkSize`` property determines the size of the chunks that will be written to the dataset, while the ``maxSize`` property determines the maximum size of the dataset. By adjusting these properties, you can create a dataset that can be resized and appended to as needed. If you know the final size of a dataset, ``maxSize`` can be set to this value to optimize storage allocation. If the final size is unknown, the ``maxSize`` can be set to ``Inf`` along one or more dimensions to allow unlimited growth. diff --git a/docs/source/pages/concepts/file_read/dynamictable.rst b/docs/source/pages/concepts/file_read/dynamictable.rst index 2242468ef..fa81eccdf 100644 --- a/docs/source/pages/concepts/file_read/dynamictable.rst +++ b/docs/source/pages/concepts/file_read/dynamictable.rst @@ -34,4 +34,4 @@ Finally, if you prefer to select using your custom ``id`` column, you can specif tableData = dynamicTable.getRow(, 'useId', true); -For more information regarding Dynamic Tables in MatNWB as well as information regarding writing data to them, please see the `MatNWB DynamicTables Tutorial <../../tutorials/dynamic_tables.html>`_. +For more information regarding Dynamic Tables in MatNWB as well as information regarding writing data to them, please see the :doc:`DynamicTables Tutorial `. diff --git a/docs/source/pages/concepts/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst index 9213a62d1..9b7ef164c 100644 --- a/docs/source/pages/concepts/file_read/untyped.rst +++ b/docs/source/pages/concepts/file_read/untyped.rst @@ -41,7 +41,8 @@ DataStubs and DataPipes .. 
image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true -**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the `Advanced Data Write Tutorial <../../tutorials/dataPipe.html>`_. +**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the :doc:`Advanced Data Write Tutorial ` + .. _matnwb-read-untyped-links-views: diff --git a/docs/source/pages/getting_started/installation.rst b/docs/source/pages/getting_started/installation.rst index 1316ec9f4..e69ec2860 100644 --- a/docs/source/pages/getting_started/installation.rst +++ b/docs/source/pages/getting_started/installation.rst @@ -138,7 +138,7 @@ Troubleshooting MATLAB cannot find MatNWB functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Ensure the MatNWB folder is on the path (see “Verify your installation”). +- Ensure the MatNWB folder is on the path (see `Verify your installation`_). - If needed, restart MATLAB after calling ``savepath()``. - Use ``which nwbRead -all`` to diagnose duplicate or shadowed installs. 
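The path checks listed in the installation troubleshooting hunk above can be sketched in MATLAB as follows. This is a minimal sketch, not part of the patch: ``matnwbDir`` is a hypothetical install location that must be replaced with the real folder, and the warning identifier is made up for illustration.

.. code-block:: matlab

    % Collect every nwbRead visible on the MATLAB path (returns a cell array).
    locations = which('nwbRead', '-all');

    if isempty(locations)
        % MatNWB is not on the path: add it and persist the change.
        matnwbDir = 'path/to/matnwb';   % hypothetical location, adjust as needed
        addpath(matnwbDir);
        savepath();
    elseif numel(locations) > 1
        % More than one copy suggests a duplicate or shadowed installation.
        warning('Example:ShadowedInstall', ...
            'Found %d copies of nwbRead on the path.', numel(locations));
        disp(locations)
    end

If more than one location is reported, removing the unwanted folder from the path is usually simpler than reordering it.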
diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 367028c7a..cd955b481 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -58,7 +58,8 @@ The main categories of types you will work with - Containers/wrappers: organize related data (e.g., :doc:`ProcessingModule `). - Time series: sampled data over time (e.g., :doc:`TimeSeries `, :doc:`ElectricalSeries `). - Tables: columnar metadata or data (e.g., :doc:`DynamicTable `). -- Helpers: :ref:`Helper types ` for common patterns like references, links, and data I/O. +- Helpers: :doc:`Helper types ` for common patterns like references, links, and data I/O. + .. [Todo: expand, and link to helper types reference and concept pages when these are added]. .. [Todo: For tables: TimeIntervals, Units, ElectrodesTable] diff --git a/docs/source/pages/how_to/compression/compression_profiles.rst b/docs/source/pages/how_to/compression/compression_profiles.rst index 2e2365ab9..cff8c2216 100644 --- a/docs/source/pages/how_to/compression/compression_profiles.rst +++ b/docs/source/pages/how_to/compression/compression_profiles.rst @@ -57,7 +57,7 @@ The file will be created with chunking and compression settings optimized for cl Verifying the applied configuration ----------------------------------- +----------------------------------- After export, you can inspect chunking and compression with ``h5info``: .. code-block:: matlab @@ -126,6 +126,7 @@ Customizing a profile ``chunking.strategy_by_rank`` Strategy per dataset rank (key = number of dimensions). Each list element corresponds to a dimension axis. + The list length must equal the dataset rank; order matches dataset dimensions. 
Possible values: From 6cd3415bf4ab3c8c4a457e1c2636fcd420d3e69e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 16:54:51 +0200 Subject: [PATCH 56/67] Update documentation with citation info and improved links Added a 'Cite MatNWB' section to the index with a link to citation instructions. Improved references in the getting started overview by formatting class names as code and linking to the configuration profiles guide. --- docs/source/index.rst | 5 +++++ docs/source/pages/getting_started/overview.rst | 6 +++--- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 7eddbfafd..1e72141d9 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -88,3 +88,8 @@ Looking for a specific topic which has not been mentioned? Check out the full ta pages/developer/documentation pages/developer/releases + +Cite MatNWB +=========== + +If MatNWB contributes to your work, please see :doc:`Citing MatNWB `. \ No newline at end of file diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index cd955b481..79b354864 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -25,7 +25,7 @@ What you can do with MatNWB - Read NWB files - One call to :doc:`nwbRead ` opens a file and presents a hierarchical representation of the complete file and its contents. - - Lazy I/O via DataStub lets you slice large datasets without loading them into RAM. + - Lazy I/O via ``DataStub`` lets you slice large datasets without loading them into RAM. - Write NWB files @@ -34,8 +34,8 @@ What you can do with MatNWB - Scale to large data - - Stream/append and compress data with the DataPipe interface. - - Use predefined or custom configuration profiles to optimize files for local storage, cloud storage or archiving. + - Stream/append and compress data with the ``DataPipe`` interface. 
+ - Use predefined or custom :doc:`configuration profiles ` to optimize files for local storage, cloud storage or archiving. .. Todo: Add links to DataPipe reference and configuration profiles guide when these are added. From 44342b8f4336aa511354bdbc1797274de517e68c Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:17:52 +0200 Subject: [PATCH 57/67] Remove concepts page on neurodata types --- docs/source/pages/concepts/file_create.rst | 1 - .../concepts/file_create/neurodata_types.rst | 98 ------------------- 2 files changed, 99 deletions(-) delete mode 100644 docs/source/pages/concepts/file_create/neurodata_types.rst diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index 31f896da9..b1f930c0e 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -39,7 +39,6 @@ If anything is missing or incorrect, you'll get an error message explaining what :titlesonly: Understanding the NwbFile Object - Understanding Neurodata Types Storage Backends Editing NWB Files Performance Optimization diff --git a/docs/source/pages/concepts/file_create/neurodata_types.rst b/docs/source/pages/concepts/file_create/neurodata_types.rst deleted file mode 100644 index 4465f96af..000000000 --- a/docs/source/pages/concepts/file_create/neurodata_types.rst +++ /dev/null @@ -1,98 +0,0 @@ -Understanding MatNWB Neurodata Types -==================================== - -MatNWB neurodata types are MATLAB classes designed to represent different kinds of neuroscience data in a structured and interoperable way. They combine data, metadata, and contextual information, enabling consistent organization, interpretation, and sharing across tools and research environments. - -Why Use Specialized Neurodata Types? ------------------------------------- - -Standard MATLAB data structures like arrays or structs are flexible but lack domain-specific constraints. 
MatNWB types provide additional structure and semantics that are essential for reliable data handling in neuroscience: - -- **Domain-specific structure**: Each type encodes the metadata and relationships required for a particular data modality. For example, a :class:`types.core.ElectricalSeries` requires electrode metadata, sampling information, and data units. -- **Built-in validation**: Types enforce the presence of essential information, reducing the likelihood of common errors. For instance, :class:`types.core.TwoPhotonSeries` cannot be created without specifying the imaging plane. -- **Interoperability**: Data stored using these types is compatible with NWB-compliant tools and workflows, making it easier to share and reuse across different software ecosystems. - -The Central Concept: TimeSeries -------------------------------- - -Many experimental signals in neuroscience change over time. The :class:`types.core.TimeSeries` type provides a standardized structure for representing these signals together with their temporal context. - -A TimeSeries object combines: - -- **Data and meaning**: The recorded measurements alongside descriptions of what they represent. -- **Temporal information**: Flexible handling of regular or irregular sampling, timestamps, and time references. -- **Metadata**: Units, descriptions, and experiment-specific context stored together with the data. -- **Relationships**: References to other objects, such as stimulus definitions or behavioral events. - -Use a basic TimeSeries when the data varies over time but does not require the additional structure of a specialized type. Examples include custom behavioral metrics, environmental sensor data, or novel measurement modalities. - -Specialized TimeSeries Variants -------------------------------- - -MatNWB builds on the TimeSeries concept with specialized types tailored to common experimental data. These types capture modality-specific metadata, constraints, and relationships. 
- -**ElectricalSeries: Electrical Recordings** - -Electrophysiological recordings require metadata about the electrodes, their positions, and acquisition parameters. :class:`types.core.ElectricalSeries` links time-varying voltage data with this contextual information, allowing downstream tools to interpret the signals accurately. - -**TwoPhotonSeries and OnePhotonSeries: Optical Recordings** - -Optical recordings, such as calcium imaging, differ fundamentally from electrical recordings. These types include metadata about imaging planes, indicators, and acquisition parameters (e.g., excitation wavelength), reflecting the experimental conditions required to interpret fluorescence-based signals. - -**SpatialSeries: Positional and Movement Data** - -Behavioral tracking data records spatial coordinates over time. :class:`types.core.SpatialSeries` includes information about reference frames, coordinate systems, and spatial dimensions, which are necessary for interpreting positional measurements correctly. - -Container Types: Organizing Related Data ----------------------------------------- - -Some MatNWB types act as containers for other data objects, structuring them into logical groupings. - -**ProcessingModule: Analysis Grouping** - -Experiments often produce multiple derived datasets from different processing steps. :class:`types.core.ProcessingModule` groups these results, preserving their relationships to raw data and to each other within an analysis workflow. - -**Behavioral Containers: Position, CompassDirection, BehavioralEvents** - -Behavioral experiments frequently generate multiple types of measurements. Container types provide a consistent organizational structure for these related datasets, making it easier for collaborators and tools to understand their relationships. - -Table-Based Types: Structured Metadata --------------------------------------- - -Not all experimental information is time-series based. 
Some metadata is better represented in tabular form, particularly when it describes static properties or discrete events. - -**Units Table: Discrete Spike Data** - -Sorted spike data consists of discrete events (spikes) that occur at variable times. The :class:`types.core.Units` table organizes spike times, waveforms, and unit metadata in a structured and queryable way. - -**Electrode Tables: Recording Site Metadata** - -Metadata describing recording sites—such as electrode position, impedance, and brain region—is typically static during an experiment. Electrode tables store this information once and allow it to be referenced by multiple data types. - -**Trials Table: Time-Indexed Experimental Structure** - -Many experiments are organized into discrete trials or epochs. The NWB ``trials`` table (a :class:`types.core.TimeIntervals` object) captures these segments using required ``start_time`` and ``stop_time`` columns and any number of user-defined per-trial metadata columns (e.g., stimulus identity, condition, response correctness). - -Working with MatNWB Types -------------------------- - -MatNWB neurodata types use object-oriented design principles to integrate structure, validation, and relationships directly into the data model: - -- **Object properties**: Each type defines a fixed set of properties, ensuring required metadata is always present and validated when objects are created. -- **Automatic linking**: References between related objects (e.g., an ElectricalSeries referencing an electrode table) are handled automatically. -- **Extensibility**: While core properties are fixed, additional metadata can be attached as needed to capture experiment-specific details. -- **Error prevention**: Structural validation reduces errors by detecting missing information, type mismatches, or inconsistent shapes early. 
- -Selecting the Appropriate Type ------------------------------- - -Choosing the right type depends on the nature of the data and how it fits into the broader experimental context. Consider the following questions: - -- What is being measured? (e.g., electrical activity, fluorescence, position) -- How is it related to other parts of the experiment? -- What metadata is required to interpret the measurement? -- Would another researcher understand the data structure without additional explanation? - -A practical approach is to begin with the general :class:`types.core.TimeSeries` for any time-varying data. As familiarity increases, adopt more specialized types that better capture the semantics and constraints of specific experimental modalities. - -Organize data to reflect the experimental workflow: raw measurements in acquisition, processed results in processing modules, and analysis outputs in analysis groups. This structure aligns the data model with the scientific process and supports reproducibility and interoperability. From 449e4748961dbf9fbda14692edbb52eb6187210e Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:19:08 +0200 Subject: [PATCH 58/67] Clarify NWB schema usage and class regeneration docs Improved explanations and updated links regarding NWB schemas, clarified when class regeneration can be skipped, and refined language for better accuracy. Removed unrealistic use case examples for generating classes in separate directories. 
--- .../concepts/file_read/schemas_and_generation.rst | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/docs/source/pages/concepts/file_read/schemas_and_generation.rst b/docs/source/pages/concepts/file_read/schemas_and_generation.rst index 8ae512d40..20deae31e 100644 --- a/docs/source/pages/concepts/file_read/schemas_and_generation.rst +++ b/docs/source/pages/concepts/file_read/schemas_and_generation.rst @@ -8,7 +8,7 @@ This page covers the advanced concepts behind how MatNWB works with NWB schemas What are NWB Schemas? --------------------- -NWB schemas are formal specifications that define: +`NWB schemas `_ are formal specifications that define: - **Data types** and their properties - **Relationships** between different data types @@ -98,7 +98,7 @@ If a file uses custom extensions, use :func:`generateExtension`: Reading Files Without Regeneration ----------------------------------- -If you're reading multiple files with the same schema, you can skip class regeneration for faster loading: +If you're reading multiple files created with the same schema version, you can skip class regeneration for faster loading: .. code-block:: MATLAB @@ -108,7 +108,7 @@ If you're reading multiple files with the same schema, you can skip class regene This is useful when: - Reading many files from the same experiment -- You know the classes are already generated and current +- You know the NWB type classes are already generated and current - You want faster file loading ..
warning:: @@ -157,11 +157,6 @@ When running multiple MATLAB sessions on the same machine for testing or process session2_dir = '/tmp/matlab_session_2_classes'; generateCore('savedir', session2_dir); -**Other Use Cases:** -- You don't have write permissions to the MatNWB installation directory -- You want to keep different projects' classes separate -- Working with different schema versions (though not simultaneously) - Understanding Class Files -------------------------- From ed17a14716d7241ae3836418d7724eb2e6913a84 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:19:21 +0200 Subject: [PATCH 59/67] Add note on lazy loading with DataStub in MatNWB Added an 'important' section explaining MatNWB's lazy reading mechanism using DataStub objects. This clarifies how large datasets are handled efficiently and provides guidance on accessing and loading data. --- docs/source/pages/concepts/file_read.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/source/pages/concepts/file_read.rst b/docs/source/pages/concepts/file_read.rst index b0d2ffbc6..9f49af2ce 100644 --- a/docs/source/pages/concepts/file_read.rst +++ b/docs/source/pages/concepts/file_read.rst @@ -17,6 +17,16 @@ This command performs several important tasks behind the scenes: The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types. +.. important:: + **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the ``NwbFile`` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory.
This allows you to: + + - Work with files larger than available RAM + - Read only the portions of data you need + - Index into datasets using standard MATLAB array syntax + - Load the full dataset explicitly using the ``.load()`` method + + For more details, see :ref:`DataStubs and DataPipes`. + .. note:: The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format. From a6826b25012eecd242062d19e05c2f1651465851 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:24:49 +0200 Subject: [PATCH 60/67] Update storage_backends.rst --- docs/source/pages/concepts/file_create/storage_backends.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/source/pages/concepts/file_create/storage_backends.rst b/docs/source/pages/concepts/file_create/storage_backends.rst index 7da57c5b0..3f0d80525 100644 --- a/docs/source/pages/concepts/file_create/storage_backends.rst +++ b/docs/source/pages/concepts/file_create/storage_backends.rst @@ -5,6 +5,9 @@ Storage Backends MatNWB currently uses the HDF5 file format for storing NWB files on disk. Please note that NWB is designed to be storage backend agnostic, and future versions of MatNWB may support additional storage backends. +.. TIP:: + For more information about NWB storage, see the `NWB Storage Documentation `_. + .. _about-hdf5: What is HDF5? 
From e9ec92f05c8ebc3d06bf612d20a2701188289137 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:30:20 +0200 Subject: [PATCH 61/67] Apply suggestion from @bendichter Co-authored-by: Ben Dichter --- docs/source/pages/concepts/file_read/schemas_and_generation.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/source/pages/concepts/file_read/schemas_and_generation.rst b/docs/source/pages/concepts/file_read/schemas_and_generation.rst index 20deae31e..8b9fb90df 100644 --- a/docs/source/pages/concepts/file_read/schemas_and_generation.rst +++ b/docs/source/pages/concepts/file_read/schemas_and_generation.rst @@ -13,7 +13,6 @@ What are NWB Schemas? - **Data types** and their properties - **Relationships** between different data types - **Validation rules** for data integrity -- **File organization** standards Think of schemas as blueprints that ensure all NWB files follow the same organizational principles, regardless of who created them or what software was used. From 50c618ddbd1d26d3a9c8931f6b5bd6cb78c00a7f Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 20:59:49 +0200 Subject: [PATCH 62/67] Update overview.rst --- docs/source/pages/getting_started/overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 79b354864..0aa72ba98 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -79,7 +79,7 @@ Common questions you may encounter (and where to find answers) - How do I name neurodata types when adding to sets? - - Refer to the :nwbinspector:`Naming Conventions ` section of the NWB Inspector docs. + - Refer to the :nwbinspector:`Naming Conventions ` section of the NWB Inspector docs. - What properties are required and how do I set them? 
From e3f3977b41d58ce6769df2505f147689f15bc165 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Thu, 23 Oct 2025 23:48:48 +0200 Subject: [PATCH 63/67] Revise documentation on editing NWB files in MatNWB Reorganized and clarified the documentation for editing NWB files with MatNWB. Added sections on supported operations (adding and appending data), detailed current limitations (editing in-place, appending to non-extendable datasets, and removing data), and provided guidance on using PyNWB for advanced editing. --- .../file_create/editing_nwb_files.rst | 54 +++++++++++-------- 1 file changed, 33 insertions(+), 21 deletions(-) diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst index b59f06fca..fe0b3aefc 100644 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ b/docs/source/pages/concepts/file_create/editing_nwb_files.rst @@ -3,40 +3,52 @@ Editing NWB files ================= -After an NWB-file has been exported to disk, it can be re-imported and edited. Generally, adding new data and metadata is straightforward. However, due to the way MatNWB and HDF5 work, there are some limitations when modifying or removing datasets from an existing NWB file. This section outlines these limitations and provides guidance on how to work with existing NWB files in MatNWB. +After an NWB file has been exported to disk, it can be re-imported and edited. MatNWB supports **adding new data and metadata** to existing files, as well as **appending to extendable datasets**. This section provides guidance on working with existing NWB files in MatNWB and outlines current limitations. -1. **Editing** data of datasets in place is currently not supported. +What MatNWB supports +-------------------- -2. **Appending** data to a dataset requires the dataset to have been created as extendable. This is typically done when initially creating a dataset, using the :class:`~types.untyped.DataPipe` class. 
If the dataset was not created as extendable, it cannot be resized or appended to. +Adding new data to existing files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +MatNWB makes it straightforward to add new data and metadata to existing NWB files. Simply read the file, add your new content, and export it again. For example, you can add new time series, processing modules, or other neurodata objects to an existing file. -3. **Removing** property values or neurodata objects from the file object does not free up space in the file itself. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. +Appending data to extendable datasets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +You can append data to datasets that were created as extendable. :ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. To make a dataset extendable, use the :class:`~types.untyped.DataPipe` class when initially creating the dataset, specifying the ``chunkSize`` and ``maxSize`` properties. +The ``maxSize`` property determines the maximum size of the dataset. If you know the final size, set ``maxSize`` to optimize storage. For unlimited growth, set ``maxSize`` to ``Inf`` along one or more dimensions. -Editing/modifying data of existing datasets is not supported ------------------------------------------------------------- -This is a current limitation in MatNWB. If this is something you have a need for, please check out `this MatNWB issue `_. +For a detailed example of creating and using extendable datasets, see the :doc:`DataPipe tutorial `. -Appending data to existing datasets ------------------------------------ -:ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. By default, MatNWB creates datasets with fixed dimensions. Datasets that were created with fixed dimensions cannot be resized or appended to after they have been written to disk. 
This means that if you want to append data to a dataset in an existing NWB file, the dataset must have been created as extendable from the start. This is done using the :class:`~types.untyped.DataPipe` class when initially creating the dataset. -The :class:`~types.untyped.DataPipe` class provides a way to create extendable datasets by specifying the ``chunkSize`` and ``maxSize`` properties. The ``chunkSize`` property determines the size of the chunks that will be written to the dataset, while the ``maxSize`` property determines the maximum size of the dataset. By adjusting these properties, you can create a dataset that can be resized and appended to as needed. +Known limitations in MatNWB +---------------------------- -If you know the final size of a dataset, ``maxSize`` can be set to this value to optimize storage allocation. If the final size is unknown, the ``maxSize`` can be set to ``Inf`` along one or more dimensions to allow unlimited growth. +Some editing operations are not currently supported in MatNWB: -For an example of how to use the :class:`~types.untyped.DataPipe` class to create an extendable dataset, see the :doc:`DataPipe example ` tutorial. +**Editing data in-place** + Modifying existing dataset values after they have been written to disk is not currently supported. If you need to edit data values, you will need to create a new file and write corrected data. + + See `MatNWB Issue #760 `_ for discussion about this feature. -Removing data from existing files ---------------------------------- -:ref:`HDF5 ` support for removing datasets or attributes is limited. While it is possible at a low level to "unlink" objects from the file, this does not reclaim the storage space used by that object. If you need to significantly restructure a file, the standard approach is to create a new NWB file and copy the desired data into it. 
+**Appending to non-extendable datasets** + Datasets created without the :class:`~types.untyped.DataPipe` class have fixed dimensions and cannot be resized. Plan ahead by making datasets extendable if you anticipate needing to append data. + +**Removing data to reclaim disk space** + Due to :ref:`HDF5 ` limitations, removing datasets or attributes does not reclaim storage space. If you need to significantly restructure a file, create a new NWB file and copy the desired data into it. + + See `MatNWB Issue #751 `_ for progress on this feature. .. warning:: - The :class:`types.untyped.Set` provides a method called ``remove`` that can be used to remove objects from a set. However, this only removes the object from the in-memory representation of the file and does not remove it from the file on disk. + The :class:`types.untyped.Set` class provides a ``remove`` method, but this only removes objects from the in-memory representation—it does not remove them from the file on disk or reclaim storage space. + +Alternative: PyNWB for advanced editing +---------------------------------------- +If you need more advanced editing capabilities that are not currently supported in MatNWB, consider using `PyNWB `_, which provides: -At the moment, MatNWB does not provide built-in functionality to copy data from one NWB file to another. However, you can achieve this by manually reading the desired data from the existing file and writing it to a new file using the appropriate MatNWB classes and methods. +- Editing dataset values and attributes in-place +- Renaming and moving groups and datasets -The following issues on GitHub track some of the limitations and potential improvements related to editing NWB files in MatNWB: +Files edited with PyNWB can be read back into MatNWB for further analysis. See the `PyNWB editing tutorial `_ for detailed examples. 
-- `MatNWB - Issue 751 `_ - Reexport datasets to another file -- `MatNWB - Issue 760 `_ - Edit data in place From 68d082e8c04b5c634e8f11c067a4e8378ef0dcd5 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Fri, 24 Oct 2025 18:14:58 +0200 Subject: [PATCH 64/67] Adjust introductions for index and overview pages --- docs/source/index.rst | 7 +++---- docs/source/pages/getting_started/overview.rst | 10 ++++------ 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 1e72141d9..8c5fd19b8 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -4,17 +4,16 @@ NWB for MATLAB ############## -MatNWB_ is a MATLAB package for working with |NWB|_ (NWB) files. -It provides a high‑level, efficient interface for reading and writing neurophysiology data in the NWB format and includes tutorial Live Scripts that show you how to read NWB files or convert your own data to NWB. +MatNWB_ is a MATLAB package for working with |NWB|_ (NWB) files—a standardized format for neurophysiology data. The package provides a high‑level interface for reading and writing NWB files in MATLAB and includes tutorial Live Scripts that show you how to read existing NWB files or convert your own data to NWB. -This documentation focuses on MatNWB. If you are new to NWB or want to learn more about the format itself, these resources are a great starting point: +New to NWB? Learn more about the format itself from these resources: .. - :nwb_overview:`NWB Overview` | Placeholder - `NWB Overview Introduction `_: Entry point providing a high-level and general overview of the NWB format -- `NWB Format Specification `_: Detailed overview of the NWB Format and the neurodata type specifications that make up the format. +- `NWB Format Specification `_: Detailed overview of the NWB format and the neurodata type specifications that make up the format. For a quick introduction to MatNWB, go to the :ref:`Overview ` page. 
If you immediately want to see how to read or write files, take a look at the diff --git a/docs/source/pages/getting_started/overview.rst b/docs/source/pages/getting_started/overview.rst index 0aa72ba98..ad547e7f4 100644 --- a/docs/source/pages/getting_started/overview.rst +++ b/docs/source/pages/getting_started/overview.rst @@ -5,11 +5,7 @@ Overview ======== - -What is MatNWB? ---------------- - -MatNWB_ is a MATLAB package for reading, writing, and validating NWB files. It provides simple functions like :func:`nwbRead` and :func:`nwbExport` for file I/O, as well as a complete set of core neurodata and helper types represented using MATLAB classes. +MatNWB_ is a MATLAB package for working with NWB files. With MatNWB, you can read, write, and validate NWB files directly in MATLAB using intuitive functions like :func:`nwbRead` and :func:`nwbExport`, along with a comprehensive set of MATLAB classes representing neurodata types defined by the NWB schema. Who is it for? @@ -98,7 +94,9 @@ Important caveats when working with MatNWB: - **NWB schema versions**: When reading an NWB file, MatNWB will dynamically build class definitions for neurodata types from schemas that are embedded in the file. This ensures that the file is always represented correctly according to the schema version (and extensions) that was used when creating the file. However, the generated type classes will take the place of previously existing classes (i.e generated from different NWB versions), and therefore it is not recommended to work with NWB files of different NWB versions simultaneously. -- **Editing NWB files**: If you need to edit NWB files after creation, note that MatNWB currently has certain limitations. See the section on :ref:`Editing NWB files ` for more details. +.. 
+ Todo: include this section after adding a how-to guide on editing NWB files (and potentially fixing current limitations/bugs) + - **Editing NWB files**: If you need to edit NWB files after creation, note that MatNWB currently has certain limitations. See the section on :ref:`Editing NWB files ` for more details. Related resources From adf10649d1e3ce2ebdef5c52b4d84a9e0568b5a8 Mon Sep 17 00:00:00 2001 From: ehennestad Date: Fri, 24 Oct 2025 18:16:58 +0200 Subject: [PATCH 65/67] Remove page on editing NWB files --- docs/source/pages/concepts/file_create.rst | 5 +- .../file_create/editing_nwb_files.rst | 54 ------------------- 2 files changed, 4 insertions(+), 55 deletions(-) delete mode 100644 docs/source/pages/concepts/file_create/editing_nwb_files.rst diff --git a/docs/source/pages/concepts/file_create.rst b/docs/source/pages/concepts/file_create.rst index b1f930c0e..c58a4b2d4 100644 --- a/docs/source/pages/concepts/file_create.rst +++ b/docs/source/pages/concepts/file_create.rst @@ -40,5 +40,8 @@ If anything is missing or incorrect, you'll get an error message explaining what Understanding the NwbFile Object Storage Backends - Editing NWB Files Performance Optimization + +.. + Todo: include after creating a how-to guide for editing NWB files in MATLAB + Editing NWB Files diff --git a/docs/source/pages/concepts/file_create/editing_nwb_files.rst b/docs/source/pages/concepts/file_create/editing_nwb_files.rst deleted file mode 100644 index fe0b3aefc..000000000 --- a/docs/source/pages/concepts/file_create/editing_nwb_files.rst +++ /dev/null @@ -1,54 +0,0 @@ -.. _edit-nwb-files: - -Editing NWB files -================= - -After an NWB file has been exported to disk, it can be re-imported and edited. MatNWB supports **adding new data and metadata** to existing files, as well as **appending to extendable datasets**. This section provides guidance on working with existing NWB files in MatNWB and outlines current limitations. 
-
-What MatNWB supports
---------------------
-
-Adding new data to existing files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-MatNWB makes it straightforward to add new data and metadata to existing NWB files. Simply read the file, add your new content, and export it again. For example, you can add new time series, processing modules, or other neurodata objects to an existing file.
-
-Appending data to extendable datasets
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-You can append data to datasets that were created as extendable. :ref:`HDF5 ` datasets can be created with fixed dimensions or as extendable datasets. To make a dataset extendable, use the :class:`~types.untyped.DataPipe` class when initially creating the dataset, specifying the ``chunkSize`` and ``maxSize`` properties.
-
-The ``maxSize`` property determines the maximum size of the dataset. If you know the final size, set ``maxSize`` to optimize storage. For unlimited growth, set ``maxSize`` to ``Inf`` along one or more dimensions.
-
-For a detailed example of creating and using extendable datasets, see the :doc:`DataPipe tutorial `.
-
-
-Known limitations in MatNWB
-----------------------------
-
-Some editing operations are not currently supported in MatNWB:
-
-**Editing data in-place**
-  Modifying existing dataset values after they have been written to disk is not currently supported. If you need to edit data values, you will need to create a new file and write corrected data.
-
-  See `MatNWB Issue #760 `_ for discussion about this feature.
-
-**Appending to non-extendable datasets**
-  Datasets created without the :class:`~types.untyped.DataPipe` class have fixed dimensions and cannot be resized. Plan ahead by making datasets extendable if you anticipate needing to append data.
-
-**Removing data to reclaim disk space**
-  Due to :ref:`HDF5 ` limitations, removing datasets or attributes does not reclaim storage space. If you need to significantly restructure a file, create a new NWB file and copy the desired data into it.
-
-  See `MatNWB Issue #751 `_ for progress on this feature.
-
-.. warning::
-    The :class:`types.untyped.Set` class provides a ``remove`` method, but this only removes objects from the in-memory representation—it does not remove them from the file on disk or reclaim storage space.
-
-Alternative: PyNWB for advanced editing
-----------------------------------------
-
-If you need more advanced editing capabilities that are not currently supported in MatNWB, consider using `PyNWB `_, which provides:
-
-- Editing dataset values and attributes in-place
-- Renaming and moving groups and datasets
-
-Files edited with PyNWB can be read back into MatNWB for further analysis. See the `PyNWB editing tutorial `_ for detailed examples.
-

From ff95a8440d3009509e864a9b6ee80c6db32f8b22 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Fri, 24 Oct 2025 18:31:16 +0200
Subject: [PATCH 66/67] Update nwbfile.rst

---
 docs/source/pages/concepts/file_create/nwbfile.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/pages/concepts/file_create/nwbfile.rst b/docs/source/pages/concepts/file_create/nwbfile.rst
index 47a573241..81cfd16d8 100644
--- a/docs/source/pages/concepts/file_create/nwbfile.rst
+++ b/docs/source/pages/concepts/file_create/nwbfile.rst
@@ -45,7 +45,6 @@ MatNWB automatically handles some required NWB properties so you don't have to:
 
 Object Structure and Organization
 ---------------------------------
-.. todo:: Link to NWB overview section on file structure here
 
 The :class:`NwbFile` object provides specific properties for organizing different types of data:
 
@@ -64,7 +63,8 @@ The :class:`NwbFile` object provides specific properties for organizing differen
 **Additional metadata properties**
   Various ``general_*`` properties for experimenter, institution, lab, etc.
 
-
+.. TIP::
+    For more details on where to place specific data types within the :class:`NwbFile` structure, refer to the :nwb_overview:`Anatomy of an NWB file ` section in the NWB Overview Docs.
 
 Validation and Error Handling
 -----------------------------

From 3eee88419511e73f12ba3ccca98e13d7421ed6bc Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Fri, 24 Oct 2025 18:43:02 +0200
Subject: [PATCH 67/67] Update storage_optimization.rst

---
 docs/source/pages/concepts/file_create/storage_optimization.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/pages/concepts/file_create/storage_optimization.rst b/docs/source/pages/concepts/file_create/storage_optimization.rst
index 4cd3b442f..6eed38be6 100644
--- a/docs/source/pages/concepts/file_create/storage_optimization.rst
+++ b/docs/source/pages/concepts/file_create/storage_optimization.rst
@@ -19,7 +19,7 @@ A prerequisite for compression is chunking. Chunking is the partitioning of data
 
 For example, if you frequently read time series data in segments (e.g., 1-second windows), chunking along the time axis with a size that matches your typical read length can improve performance. Similarly, for image data, chunking in spatial blocks that align with common access patterns (e.g., tiles or frames) can be beneficial.
 
-Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended, as they balance the overhead of multiple HTTP requests with the latency of transferring large chunks. (HTTP requests have significantly higher overhead compared to local file access.)
+Further, the chunk size can impact compression efficiency. Larger chunks may yield better compression ratios, but can also increase memory usage during read/write operations. Conversely, smaller chunks may reduce memory overhead but could lead to less effective compression. For archival purposes, larger chunks are often preferred to maximize compression, while for interactive analysis, smaller chunks may be more suitable to optimize access speed. For online/cloud access, chunk sizes in the range of 2MB to 10MB are often recommended (`Guide `_, `Presentation `_), as they balance the overhead of multiple HTTP requests with the latency of transferring large chunks. (HTTP requests have significantly higher overhead compared to local file access.)
 
 MatNWB configuration profiles