From 3c498c7f9b37d23434de29192e807586c0405d14 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 16:57:04 +0100
Subject: [PATCH 1/7] Update nwbRead function reference formatting in docs

Changed the reference to nwbRead from inline code to Sphinx function
markup (:func:`nwbRead`)
---
 docs/source/pages/concepts/file_read/schemas_and_generation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/pages/concepts/file_read/schemas_and_generation.rst b/docs/source/pages/concepts/file_read/schemas_and_generation.rst
index 8b9fb90d..c2396fd0 100644
--- a/docs/source/pages/concepts/file_read/schemas_and_generation.rst
+++ b/docs/source/pages/concepts/file_read/schemas_and_generation.rst
@@ -31,7 +31,7 @@ You can check a file's schema version:
 How MatNWB Generates Classes
 ----------------------------
 
-When you call ``nwbRead``, MatNWB performs several steps behind the scenes:
+When you call :func:`nwbRead`, MatNWB performs several steps behind the scenes:
 
 1. **Reads the file's embedded schema** information
 2. **Generates MATLAB classes** for neurodata types defined by the schema version used to create the file

From c519ae00b7d6e0ba32c0d15af59924044016dba5 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 16:58:30 +0100
Subject: [PATCH 2/7] Update NwbFile reference to use Sphinx class role

Changed the reference to the returned NwbFile object from inline code to
Sphinx's :class: role.
---
 docs/source/pages/concepts/file_read.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/pages/concepts/file_read.rst b/docs/source/pages/concepts/file_read.rst
index 9f49af2c..12375d8e 100644
--- a/docs/source/pages/concepts/file_read.rst
+++ b/docs/source/pages/concepts/file_read.rst
@@ -15,7 +15,7 @@ This command performs several important tasks behind the scenes:
 2. **Automatically generates MATLAB classes** needed to work with the data
 3. **Returns an NwbFile object** representing the entire file
 
-The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
+The returned :class:`NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
 
 .. important::
    **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the `NwbFile` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory. This allows you to:

From 1275b3fe28bb965ecebeb0cd46bd3f2b8daf619e Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 20:08:19 +0100
Subject: [PATCH 3/7] Revise lazy loading explanation in file_read docs

Updated the description of lazy loading in MatNWB to clarify that only
file structure and metadata are loaded initially when reading an NWB
file. The new explanation emphasizes efficient handling of large files
and directs users to the DataStubs and DataPipes section for details on
loading dataset contents.
---
 docs/source/pages/concepts/file_read.rst | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/docs/source/pages/concepts/file_read.rst b/docs/source/pages/concepts/file_read.rst
index 12375d8e..5dcd5cce 100644
--- a/docs/source/pages/concepts/file_read.rst
+++ b/docs/source/pages/concepts/file_read.rst
@@ -18,14 +18,9 @@ This command performs several important tasks behind the scenes:
 The returned :class:`NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
 
 .. important::
-   **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the `NwbFile` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory. This allows you to:
-
-   - Work with files larger than available RAM
-   - Read only the portions of data you need
-   - Index into datasets using standard MATLAB array syntax
-   - Load the full dataset explicitly using the ``.load()`` method
-
-   For more details, see :ref:`DataStubs and DataPipes`.
+   **Lazy Loading:** MatNWB uses lazy reading to efficiently handle large files. When you read an NWB file using :func:`nwbRead`, only the file structure and metadata are initially loaded into memory. This approach enables quick access to the file’s contents and makes it possible to work with files larger than the system’s available RAM.
+
+   To learn how to load data from non-scalar or multidimensional datasets into memory, see :ref:`DataStubs and DataPipes`.
 
 .. note::
    The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format.
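The lazy-read workflow documented in the patches above can be sketched in MATLAB. This is an illustrative sketch only: the file name ``'my_session.nwb'`` and the acquisition key ``'ElectricalSeries'`` are hypothetical; ``nwbRead``, DataStub indexing, and the ``load()`` method are the MatNWB API elements referenced in the documentation being patched.

```matlab
% Illustrative sketch -- file and dataset names are hypothetical.
nwb = nwbRead('my_session.nwb');    % loads only file structure and metadata

% Accessing a non-scalar dataset yields a types.untyped.DataStub,
% not an in-memory array:
stub = nwb.acquisition.get('ElectricalSeries').data;

firstSamples = stub(1:100, :);      % reads just this slice from disk
allData = stub.load();              % explicitly loads the full dataset
```

This mirrors the behavior described in patch 3: nothing beyond structure and metadata is read until the DataStub is indexed or loaded.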
From 976a63c26e8d6472f90b912189327e7d2196ed93 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 20:25:26 +0100
Subject: [PATCH 4/7] Expand documentation for DataStubs and DataPipes

The untyped utility types documentation now provides detailed
explanations of DataStubs and DataPipes, including their roles in
efficient data access and writing for large NWB datasets. Key
characteristics, usage scenarios, and references to further tutorials
have been added for clarity.
---
 .../pages/concepts/file_read/untyped.rst | 36 +++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/docs/source/pages/concepts/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst
index 9b7ef164..bdf48db6 100644
--- a/docs/source/pages/concepts/file_read/untyped.rst
+++ b/docs/source/pages/concepts/file_read/untyped.rst
@@ -7,7 +7,6 @@ Utility Types in MatNWB
 
    Documentation for "untyped" types will be added soon
 
-
 "Untyped" Utility types are tools which allow for both flexibility as well as limiting certain constraints that are imposed by the NWB schema. These types are commonly stored in the ``+types/+untyped/`` package directories in your MatNWB installation.
 
 .. _matnwb-read-untyped-sets-anons:
@@ -36,13 +35,44 @@ The **Anon** type (``types.untyped.Anon``) can be understood as a Set type with
 DataStubs and DataPipes
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**DataStubs** serves as a read-only link to your data. It allows for MATLAB-style indexing to retrieve the data stored on disk.
+When working with NWB files, datasets can be very large (gigabytes or more). Loading all this data into memory at once would be impractical or impossible. MatNWB uses two types to handle on-disk data efficiently: **DataStubs** and **DataPipes**.
+
+DataStubs (Read only)
+^^^^^^^^^^^^^^^^^^^^^
+
+A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, large datasets are automatically represented as DataStubs rather than loaded into memory.
 
 .. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true
 
+Key characteristics:
+
+- **Lazy loading**: Data remains on disk until you explicitly access it
+- **Memory efficient**: Only the portions you request are loaded
+- **MATLAB-style indexing**: Access data using familiar syntax like ``dataStub(1:100, :)``
+- **Read-only**: Cannot be used to modify or write data
+
+You'll encounter DataStubs whenever you read existing NWB files containing non-scalar or multi-dimensional datasets.
+
+DataPipes (read and write)
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A **DataPipe** (``types.untyped.DataPipe``) extends the concept of lazy data access to support **writing** as well as reading. While DataStubs are created automatically when reading files, you create DataPipes explicitly when writing data.
+
+Key characteristics:
+
+- **Bidirectional**: Supports both reading and writing operations
+- **Incremental writing**: Stream data to disk in chunks rather than all at once
+- **Compression support**: Apply HDF5 compression and chunking strategies
+- **Write optimization**: Configure how data is stored on disk for better performance
+
+DataPipes solve the problem of writing datasets that are too large to fit in memory, or when you want fine-grained control over how data is stored in the HDF5 file.
+
+.. seealso::
 
-**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the :doc:`Advanced Data Write Tutorial `
+   - For detailed guidance on creating and configuring DataPipes, see :ref:`Advanced Data Write Tutorial `
+
+   .. todo
+   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data `
 
 .. _matnwb-read-untyped-links-views:

From 934fa630402a77f8fd407e5f9b242f6af5d765b1 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 20:31:18 +0100
Subject: [PATCH 5/7] Update untyped.rst

---
 docs/source/pages/concepts/file_read/untyped.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/pages/concepts/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst
index bdf48db6..73e0dc92 100644
--- a/docs/source/pages/concepts/file_read/untyped.rst
+++ b/docs/source/pages/concepts/file_read/untyped.rst
@@ -69,10 +69,10 @@ DataPipes solve the problem of writing datasets that are too large to fit in mem
 
 .. seealso::
 
-   - For detailed guidance on creating and configuring DataPipes, see :ref:`Advanced Data Write Tutorial `
+   - For detailed guidance on creating and configuring DataPipes, see :doc:`Advanced Data Write Tutorial `
 
-   .. todo
-   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data `
+.. todo
+   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data `
 
 .. _matnwb-read-untyped-links-views:

From 2b873b37e5d50fa1b9cdc5229f64d15644254fc2 Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 21:16:13 +0100
Subject: [PATCH 6/7] Update untyped.rst

---
 docs/source/pages/concepts/file_read/untyped.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/pages/concepts/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst
index 73e0dc92..d84898ef 100644
--- a/docs/source/pages/concepts/file_read/untyped.rst
+++ b/docs/source/pages/concepts/file_read/untyped.rst
@@ -40,7 +40,7 @@ When working with NWB files, datasets can be very large (gigabytes or more). Loa
 
 DataStubs (Read only)
 ^^^^^^^^^^^^^^^^^^^^^
 
-A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, large datasets are automatically represented as DataStubs rather than loaded into memory.
+A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, non-scalar and multi-dimensional datasets are automatically represented as DataStubs rather than loaded into memory.
 
 .. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true

From e11b3f8eadcddb51624852c55c0a084545a4fc4e Mon Sep 17 00:00:00 2001
From: ehennestad
Date: Tue, 4 Nov 2025 21:34:16 +0100
Subject: [PATCH 7/7] Move paths-ignore from PR event to push event

PR requires tests to pass, so tests should always run on PR
---
 .github/workflows/run_tests.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/run_tests.yml b/.github/workflows/run_tests.yml
index 0b7a42e7..8b6bae16 100644
--- a/.github/workflows/run_tests.yml
+++ b/.github/workflows/run_tests.yml
@@ -5,15 +5,15 @@ on:
     types: [opened, synchronize, reopened, ready_for_review]
     branches:
       - main
+  push:
+    branches:
+      - main
     paths-ignore:
       - "*.md"
       - "*.codespellrc"
       - ".github/**"
       - "!.github/workflows/run_tests.yml"
       - "docs/**"
-  push:
-    branches:
-      - main
   workflow_dispatch:
     inputs:
       test_all_matlab_releases:
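The DataPipe workflow described in patch 4 might be sketched as follows. This is an illustrative sketch only: the sizes, chunking, and compression values are assumptions, and the name-value parameters follow the MatNWB Advanced Data Write Tutorial referenced in the patch rather than anything specified here.

```matlab
% Sketch of compressed, iterative writing with a DataPipe.
% All sizes and parameter values below are illustrative assumptions.
dataPipe = types.untyped.DataPipe( ...
    'maxSize', [1000, Inf], ...    % Inf marks the dimension that can grow
    'axis', 2, ...                 % append new chunks along dimension 2
    'chunkSize', [1000, 100], ...  % HDF5 chunking strategy
    'compressionLevel', 3, ...     % deflate compression level
    'data', zeros(1000, 100));     % initial chunk held in memory

% After assigning dataPipe to a dataset (e.g. a TimeSeries data field)
% and exporting the file with nwbExport, further chunks can be streamed
% to disk incrementally:
% dataPipe.appendData(zeros(1000, 100));
```

This reflects the "incremental writing" and "compression support" characteristics listed in patch 4: data is written to the HDF5 file chunk by chunk instead of all at once.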