Commit 24f8b44

Documentation: Improvements to "Reading NWB Files" section (#769)
* Update nwbRead function reference formatting in docs: changed the reference to nwbRead from inline code to Sphinx function markup (:func:`nwbRead`).
* Update NwbFile reference to use the Sphinx class directive: changed the reference to the returned NwbFile object from inline code to Sphinx's :class: directive.
* Revise lazy loading explanation in file_read docs: updated the description of lazy loading in MatNWB to clarify that only file structure and metadata are loaded initially when reading an NWB file. The new explanation emphasizes efficient handling of large files and directs users to the DataStubs and DataPipes section for details on loading dataset contents.
* Expand documentation for DataStubs and DataPipes: the untyped utility types documentation now provides detailed explanations of DataStubs and DataPipes, including their roles in efficient data access and writing for large NWB datasets. Key characteristics, usage scenarios, and references to further tutorials have been added for clarity.
* Update untyped.rst
* Update untyped.rst
* Move paths-ignore from the PR event to the push event: PRs require tests to pass, so tests should always run on pull requests.
1 parent 6cb513a commit 24f8b44

File tree: 4 files changed (+41, −16 lines)

.github/workflows/run_tests.yml

Lines changed: 3 additions & 3 deletions

```diff
@@ -5,15 +5,15 @@ on:
     types: [opened, synchronize, reopened, ready_for_review]
     branches:
       - main
+  push:
+    branches:
+      - main
     paths-ignore:
       - "*.md"
       - "*.codespellrc"
       - ".github/**"
       - "!.github/workflows/run_tests.yml"
       - "docs/**"
-  push:
-    branches:
-      - main
   workflow_dispatch:
     inputs:
       test_all_matlab_releases:
```
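For context, the trigger section that results from this hunk would look roughly like the following sketch. This is an assumption-laden reconstruction (keys not shown in the hunk are elided), not the verbatim workflow file:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    branches:
      - main
  push:
    branches:
      - main
    paths-ignore:
      - "*.md"
      - "*.codespellrc"
      - ".github/**"
      - "!.github/workflows/run_tests.yml"
      - "docs/**"
  workflow_dispatch:
    # ... inputs unchanged
```

With `paths-ignore` under `push`, documentation-only commits to `main` skip the test run, while every pull request still triggers tests regardless of which paths changed.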

docs/source/pages/concepts/file_read.rst

Lines changed: 4 additions & 9 deletions

```diff
@@ -15,17 +15,12 @@ This command performs several important tasks behind the scenes:
 2. **Automatically generates MATLAB classes** needed to work with the data
 3. **Returns an NwbFile object** representing the entire file
 
-The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section<matnwb-read-nwbfile-intro>`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
+The returned :class:`NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section<matnwb-read-nwbfile-intro>`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
 
 .. important::
-   **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the `NwbFile` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory. This allows you to:
-
-   - Work with files larger than available RAM
-   - Read only the portions of data you need
-   - Index into datasets using standard MATLAB array syntax
-   - Load the full dataset explicitly using the ``.load()`` method
-
-   For more details, see :ref:`DataStubs and DataPipes<matnwb-read-untyped-datastub-datapipe>`.
+   **Lazy Loading:** MatNWB uses lazy reading to efficiently handle large files. When you read an NWB file using :func:`nwbRead`, only the file structure and metadata are initially loaded into memory. This approach enables quick access to the file’s contents and makes it possible to work with files larger than the system’s available RAM.
+
+   To learn how to load data from non-scalar or multidimensional datasets into memory, see :ref:`DataStubs and DataPipes<matnwb-read-untyped-datastub-datapipe>`.
 
 .. note::
    The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format.
```
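To ground the lazy-loading behavior the revised admonition describes, here is a minimal MATLAB sketch. It is illustrative only and not part of the commit; the file name is hypothetical:

```matlab
% Read an NWB file lazily; only the file structure and metadata
% are loaded into memory at this point.
nwb = nwbRead('my_session.nwb');   % hypothetical file name

% The returned NwbFile object is explored with standard dot notation.
disp(nwb.session_description)
disp(nwb.acquisition.keys())       % names of acquired data objects
```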

docs/source/pages/concepts/file_read/schemas_and_generation.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -31,7 +31,7 @@ You can check a file's schema version:
 How MatNWB Generates Classes
 ----------------------------
 
-When you call ``nwbRead``, MatNWB performs several steps behind the scenes:
+When you call :func:`nwbRead`, MatNWB performs several steps behind the scenes:
 
 1. **Reads the file's embedded schema** information
 2. **Generates MATLAB classes** for neurodata types defined by the schema version used to create the file
```

docs/source/pages/concepts/file_read/untyped.rst

Lines changed: 33 additions & 3 deletions

```diff
@@ -7,7 +7,6 @@ Utility Types in MatNWB
 
 Documentation for "untyped" types will be added soon
 
-
 "Untyped" Utility types are tools which allow for both flexibility as well as limiting certain constraints that are imposed by the NWB schema. These types are commonly stored in the ``+types/+untyped/`` package directories in your MatNWB installation.
 
 .. _matnwb-read-untyped-sets-anons:
@@ -36,13 +35,44 @@ The **Anon** type (``types.untyped.Anon``) can be understood as a Set type with
 DataStubs and DataPipes
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**DataStubs** serves as a read-only link to your data. It allows for MATLAB-style indexing to retrieve the data stored on disk.
+When working with NWB files, datasets can be very large (gigabytes or more). Loading all this data into memory at once would be impractical or impossible. MatNWB uses two types to handle on-disk data efficiently: **DataStubs** and **DataPipes**.
+
+DataStubs (Read only)
+^^^^^^^^^^^^^^^^^^^^^
+
+A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, non-scalar and multi-dimensional datasets are automatically represented as DataStubs rather than loaded into memory.
 
 .. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true
 
+Key characteristics:
+
+- **Lazy loading**: Data remains on disk until you explicitly access it
+- **Memory efficient**: Only the portions you request are loaded
+- **MATLAB-style indexing**: Access data using familiar syntax like ``dataStub(1:100, :)``
+- **Read-only**: Cannot be used to modify or write data
+
+You'll encounter DataStubs whenever you read existing NWB files containing non-scalar or multi-dimensional datasets.
+
+DataPipes (read and write)
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A **DataPipe** (``types.untyped.DataPipe``) extends the concept of lazy data access to support **writing** as well as reading. While DataStubs are created automatically when reading files, you create DataPipes explicitly when writing data.
+
+Key characteristics:
+
+- **Bidirectional**: Supports both reading and writing operations
+- **Incremental writing**: Stream data to disk in chunks rather than all at once
+- **Compression support**: Apply HDF5 compression and chunking strategies
+- **Write optimization**: Configure how data is stored on disk for better performance
+
+DataPipes solve the problem of writing datasets that are too large to fit in memory, or when you want fine-grained control over how data is stored in the HDF5 file.
+
+.. seealso::
 
-**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the :doc:`Advanced Data Write Tutorial </pages/tutorials/dataPipe>`
+   - For detailed guidance on creating and configuring DataPipes, see :doc:`Advanced Data Write Tutorial </pages/tutorials/dataPipe>`
 
+.. todo
+   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data </pages/how-to/read-on-demand>`
 
 
 .. _matnwb-read-untyped-links-views:
```
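The reading and writing patterns documented above can be sketched in MATLAB. This is a minimal illustration, not part of the commit: the file name and dataset key are hypothetical, and the DataPipe options shown follow the Advanced Data Write Tutorial:

```matlab
% --- Reading: DataStubs are created automatically by nwbRead ---
nwb  = nwbRead('my_session.nwb');                     % hypothetical file name
stub = nwb.acquisition.get('ElectricalSeries').data;  % a types.untyped.DataStub

segment = stub(1:100, :);   % MATLAB-style indexing reads only this slice
allData = stub.load();      % explicitly load the entire dataset into memory

% --- Writing: DataPipes are created explicitly before export ---
pipe = types.untyped.DataPipe( ...
    'data', rand(8, 1000), ...     % initial chunk of data
    'maxSize', [8, Inf], ...       % second dimension left unbounded
    'axis', 2, ...                 % dimension along which data is appended
    'compressionLevel', 3);        % HDF5 deflate compression level

% After the file has been exported with nwbExport, further chunks can be
% streamed to disk incrementally:
% pipe.append(rand(8, 1000));
```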
