Commit 24f8b44

Documentation: Improvements to "Reading NWB Files" section (#769)
* Update nwbRead function reference formatting in docs: changed the reference to nwbRead from inline code to Sphinx function markup (:func:`nwbRead`).
* Update NwbFile reference to use the Sphinx class directive: changed the reference to the returned NwbFile object from inline code to Sphinx's :class: directive.
* Revise lazy loading explanation in file_read docs: updated the description of lazy loading in MatNWB to clarify that only file structure and metadata are loaded initially when reading an NWB file. The new explanation emphasizes efficient handling of large files and directs users to the DataStubs and DataPipes section for details on loading dataset contents.
* Expand documentation for DataStubs and DataPipes: the untyped utility types documentation now provides detailed explanations of DataStubs and DataPipes, including their roles in efficient data access and writing for large NWB datasets. Key characteristics, usage scenarios, and references to further tutorials have been added for clarity.
* Update untyped.rst
* Update untyped.rst
* Move paths-ignore from the PR event to the push event: PRs require tests to pass, so tests should always run on pull requests.
1 parent 6cb513a commit 24f8b44

File tree: 4 files changed (+41, −16 lines)

.github/workflows/run_tests.yml

Lines changed: 3 additions & 3 deletions

```diff
@@ -5,15 +5,15 @@ on:
     types: [opened, synchronize, reopened, ready_for_review]
     branches:
       - main
+  push:
+    branches:
+      - main
     paths-ignore:
       - "*.md"
       - "*.codespellrc"
       - ".github/**"
       - "!.github/workflows/run_tests.yml"
       - "docs/**"
-  push:
-    branches:
-      - main
   workflow_dispatch:
     inputs:
       test_all_matlab_releases:
```
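For context, the trigger section that results from this hunk would look roughly like the following sketch. This is an assumption-laden reconstruction (keys not shown in the hunk are elided), not the verbatim workflow file:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    branches:
      - main
  push:
    branches:
      - main
    paths-ignore:
      - "*.md"
      - "*.codespellrc"
      - ".github/**"
      - "!.github/workflows/run_tests.yml"
      - "docs/**"
  workflow_dispatch:
    # ... inputs unchanged
```

With `paths-ignore` under `push`, documentation-only commits to `main` skip the test run, while every pull request still triggers tests regardless of which paths changed.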

docs/source/pages/concepts/file_read.rst

Lines changed: 4 additions & 9 deletions

```diff
@@ -15,17 +15,12 @@ This command performs several important tasks behind the scenes:
 2. **Automatically generates MATLAB classes** needed to work with the data
 3. **Returns an NwbFile object** representing the entire file
 
-The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section<matnwb-read-nwbfile-intro>`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
+The returned :class:`NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section<matnwb-read-nwbfile-intro>`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
 
 .. important::
-   **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the `NwbFile` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory. This allows you to:
-
-   - Work with files larger than available RAM
-   - Read only the portions of data you need
-   - Index into datasets using standard MATLAB array syntax
-   - Load the full dataset explicitly using the ``.load()`` method
-
-   For more details, see :ref:`DataStubs and DataPipes<matnwb-read-untyped-datastub-datapipe>`.
+   **Lazy Loading:** MatNWB uses lazy reading to efficiently handle large files. When you read an NWB file using :func:`nwbRead`, only the file structure and metadata are initially loaded into memory. This approach enables quick access to the file’s contents and makes it possible to work with files larger than the system’s available RAM.
+
+   To learn how to load data from non-scalar or multidimensional datasets into memory, see :ref:`DataStubs and DataPipes<matnwb-read-untyped-datastub-datapipe>`.
 
 .. note::
    The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format.
```
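To ground the lazy-loading behavior the revised admonition describes, here is a minimal MATLAB sketch. It is illustrative only and not part of the commit; the file name is hypothetical:

```matlab
% Read an NWB file lazily; only the file structure and metadata
% are loaded into memory at this point.
nwb = nwbRead('my_session.nwb');   % hypothetical file name

% The returned NwbFile object is explored with standard dot notation.
disp(nwb.session_description)
disp(nwb.acquisition.keys())       % names of acquired data objects
```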

docs/source/pages/concepts/file_read/schemas_and_generation.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -31,7 +31,7 @@ You can check a file's schema version:
 How MatNWB Generates Classes
 ----------------------------
 
-When you call ``nwbRead``, MatNWB performs several steps behind the scenes:
+When you call :func:`nwbRead`, MatNWB performs several steps behind the scenes:
 
 1. **Reads the file's embedded schema** information
 2. **Generates MATLAB classes** for neurodata types defined by the schema version used to create the file
```

docs/source/pages/concepts/file_read/untyped.rst

Lines changed: 33 additions & 3 deletions

```diff
@@ -7,7 +7,6 @@ Utility Types in MatNWB
 
 Documentation for "untyped" types will be added soon
 
-
 "Untyped" Utility types are tools which allow for both flexibility as well as limiting certain constraints that are imposed by the NWB schema. These types are commonly stored in the ``+types/+untyped/`` package directories in your MatNWB installation.
 
 .. _matnwb-read-untyped-sets-anons:
@@ -36,13 +35,44 @@ The **Anon** type (``types.untyped.Anon``) can be understood as a Set type with
 DataStubs and DataPipes
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**DataStubs** serves as a read-only link to your data. It allows for MATLAB-style indexing to retrieve the data stored on disk.
+When working with NWB files, datasets can be very large (gigabytes or more). Loading all this data into memory at once would be impractical or impossible. MatNWB uses two types to handle on-disk data efficiently: **DataStubs** and **DataPipes**.
+
+DataStubs (Read only)
+^^^^^^^^^^^^^^^^^^^^^
+
+A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, non-scalar and multi-dimensional datasets are automatically represented as DataStubs rather than loaded into memory.
 
 .. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true
 
+Key characteristics:
+
+- **Lazy loading**: Data remains on disk until you explicitly access it
+- **Memory efficient**: Only the portions you request are loaded
+- **MATLAB-style indexing**: Access data using familiar syntax like ``dataStub(1:100, :)``
+- **Read-only**: Cannot be used to modify or write data
+
+You'll encounter DataStubs whenever you read existing NWB files containing non-scalar or multi-dimensional datasets.
+
+DataPipes (read and write)
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A **DataPipe** (``types.untyped.DataPipe``) extends the concept of lazy data access to support **writing** as well as reading. While DataStubs are created automatically when reading files, you create DataPipes explicitly when writing data.
+
+Key characteristics:
+
+- **Bidirectional**: Supports both reading and writing operations
+- **Incremental writing**: Stream data to disk in chunks rather than all at once
+- **Compression support**: Apply HDF5 compression and chunking strategies
+- **Write optimization**: Configure how data is stored on disk for better performance
+
+DataPipes solve the problem of writing datasets that are too large to fit in memory, or when you want fine-grained control over how data is stored in the HDF5 file.
+
+.. seealso::
 
-**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the :doc:`Advanced Data Write Tutorial </pages/tutorials/dataPipe>`
+   - For detailed guidance on creating and configuring DataPipes, see :doc:`Advanced Data Write Tutorial </pages/tutorials/dataPipe>`
 
+.. todo
+   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data </pages/how-to/read-on-demand>`
 
 
 .. _matnwb-read-untyped-links-views:
```
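The reading and writing patterns documented above can be sketched in MATLAB. This is a minimal illustration, not part of the commit: the file name and dataset key are hypothetical, and the DataPipe options shown follow the Advanced Data Write Tutorial:

```matlab
% --- Reading: DataStubs are created automatically by nwbRead ---
nwb  = nwbRead('my_session.nwb');                     % hypothetical file name
stub = nwb.acquisition.get('ElectricalSeries').data;  % a types.untyped.DataStub

segment = stub(1:100, :);   % MATLAB-style indexing reads only this slice
allData = stub.load();      % explicitly load the entire dataset into memory

% --- Writing: DataPipes are created explicitly before export ---
pipe = types.untyped.DataPipe( ...
    'data', rand(8, 1000), ...     % initial chunk of data
    'maxSize', [8, Inf], ...       % second dimension left unbounded
    'axis', 2, ...                 % dimension along which data is appended
    'compressionLevel', 3);        % HDF5 deflate compression level

% After the file has been exported with nwbExport, further chunks can be
% streamed to disk incrementally:
% pipe.append(rand(8, 1000));
```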
