diff --git a/.github/workflows/run_tests.yml b/.github/workflows/run_tests.yml
index 0b7a42e7..8b6bae16 100644
--- a/.github/workflows/run_tests.yml
+++ b/.github/workflows/run_tests.yml
@@ -5,15 +5,15 @@ on:
     types: [opened, synchronize, reopened, ready_for_review]
     branches:
       - main
+  push:
+    branches:
+      - main
     paths-ignore:
       - "*.md"
      - "*.codespellrc"
      - ".github/**"
      - "!.github/workflows/run_tests.yml"
      - "docs/**"
-  push:
-    branches:
-      - main
   workflow_dispatch:
     inputs:
       test_all_matlab_releases:
diff --git a/docs/source/pages/concepts/file_read.rst b/docs/source/pages/concepts/file_read.rst
index 9f49af2c..5dcd5cce 100644
--- a/docs/source/pages/concepts/file_read.rst
+++ b/docs/source/pages/concepts/file_read.rst
@@ -15,17 +15,12 @@ This command performs several important tasks behind the scenes:
 2. **Automatically generates MATLAB classes** needed to work with the data
 3. **Returns an NwbFile object** representing the entire file
 
-The returned `NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
+The returned :class:`NwbFile` object is the primary access point for all the data in the file. In the :ref:`next section`, we will examine the structure of this object in detail, covering how to explore it using standard MATLAB dot notation to access experimental metadata, raw recordings, processed data, and analysis results, as well as how to search for specific data types.
 
 .. important::
-   **Lazy Loading:** MatNWB uses lazy reading to efficiently work with large datasets. When you access a dataset through the `NwbFile` object, MatNWB returns a :class:`types.untyped.DataStub` object instead of loading the entire dataset into memory. This allows you to:
-
-   - Work with files larger than available RAM
-   - Read only the portions of data you need
-   - Index into datasets using standard MATLAB array syntax
-   - Load the full dataset explicitly using the ``.load()`` method
-
-   For more details, see :ref:`DataStubs and DataPipes`.
+   **Lazy Loading:** MatNWB uses lazy reading to efficiently handle large files. When you read an NWB file using :func:`nwbRead`, only the file structure and metadata are initially loaded into memory. This approach enables quick access to the file’s contents and makes it possible to work with files larger than the system’s available RAM.
+
+   To learn how to load data from non-scalar or multidimensional datasets into memory, see :ref:`DataStubs and DataPipes`.
 
 .. note::
    The :func:`nwbRead` function currently does not support reading NWB files stored in Zarr format.
diff --git a/docs/source/pages/concepts/file_read/schemas_and_generation.rst b/docs/source/pages/concepts/file_read/schemas_and_generation.rst
index 8b9fb90d..c2396fd0 100644
--- a/docs/source/pages/concepts/file_read/schemas_and_generation.rst
+++ b/docs/source/pages/concepts/file_read/schemas_and_generation.rst
@@ -31,7 +31,7 @@ You can check a file's schema version:
 How MatNWB Generates Classes
 ----------------------------
 
-When you call ``nwbRead``, MatNWB performs several steps behind the scenes:
+When you call :func:`nwbRead`, MatNWB performs several steps behind the scenes:
 
 1. **Reads the file's embedded schema** information
 2. **Generates MATLAB classes** for neurodata types defined by the schema version used to create the file
diff --git a/docs/source/pages/concepts/file_read/untyped.rst b/docs/source/pages/concepts/file_read/untyped.rst
index 9b7ef164..d84898ef 100644
--- a/docs/source/pages/concepts/file_read/untyped.rst
+++ b/docs/source/pages/concepts/file_read/untyped.rst
@@ -7,7 +7,6 @@ Utility Types in MatNWB
 
    Documentation for "untyped" types will be added soon
 
-
 "Untyped" Utility types are tools which allow for both flexibility as well as limiting certain constraints that are imposed by the NWB schema. These types are commonly stored in the ``+types/+untyped/`` package directories in your MatNWB installation.
 
 .. _matnwb-read-untyped-sets-anons:
@@ -36,13 +35,44 @@ The **Anon** type (``types.untyped.Anon``) can be understood as a Set type with
 DataStubs and DataPipes
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**DataStubs** serves as a read-only link to your data. It allows for MATLAB-style indexing to retrieve the data stored on disk.
+When working with NWB files, datasets can be very large (gigabytes or more). Loading all this data into memory at once would be impractical or impossible. MatNWB uses two types to handle on-disk data efficiently: **DataStubs** and **DataPipes**.
+
+DataStubs (read-only)
+^^^^^^^^^^^^^^^^^^^^^
+
+A **DataStub** (``types.untyped.DataStub``) represents a read-only reference to data stored in an NWB file. When you read an NWB file, non-scalar and multi-dimensional datasets are automatically represented as DataStubs rather than loaded into memory.
 
 .. image:: https://github.com/NeurodataWithoutBorders/nwb-overview/blob/main/docs/source/img/matnwb_datastub.png?raw=true
 
+Key characteristics:
+
+- **Lazy loading**: Data remains on disk until you explicitly access it
+- **Memory efficient**: Only the portions you request are loaded
+- **MATLAB-style indexing**: Access data using familiar syntax like ``dataStub(1:100, :)``
+- **Read-only**: Cannot be used to modify or write data
+
+You'll encounter DataStubs whenever you read existing NWB files containing non-scalar or multi-dimensional datasets.
+
+DataPipes (read and write)
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A **DataPipe** (``types.untyped.DataPipe``) extends the concept of lazy data access to support **writing** as well as reading. While DataStubs are created automatically when reading files, you create DataPipes explicitly when writing data.
+
+Key characteristics:
+
+- **Bidirectional**: Supports both reading and writing operations
+- **Incremental writing**: Stream data to disk in chunks rather than all at once
+- **Compression support**: Apply HDF5 compression and chunking strategies
+- **Write optimization**: Configure how data is stored on disk for better performance
+
+DataPipes solve the problem of writing datasets that are too large to fit in memory, or when you want fine-grained control over how data is stored in the HDF5 file.
+
+.. seealso::
 
-**DataPipes** are similar to DataStubs in that they allow you to load data from disk; however, they also provide a wide array of features that allow the user to write data to disk, either by streaming parts of data in at a time or by compressing the data before writing. The DataPipe is an advanced type and users looking to leverage DataPipe's capabilities to stream/iteratively write or compress data should read the :doc:`Advanced Data Write Tutorial `
+   - For detailed guidance on creating and configuring DataPipes, see :doc:`Advanced Data Write Tutorial `
+.. todo
+   - For practical examples of reading data via DataStubs, see :ref:`How to Read Data `
 
 .. _matnwb-read-untyped-links-views:
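
The lazy-reading behavior that the rewritten documentation above describes could be sketched roughly as follows. This snippet is illustrative only and not part of the diff: the file name ``my_session.nwb`` and the ``'ElectricalSeries'`` dataset path are assumptions, while ``nwbRead``, DataStub indexing, and ``.load()`` are behaviors stated in the documentation text itself; the ``types.untyped.DataPipe`` name-value parameters shown are a best-effort sketch of MatNWB's API.

```matlab
% Hypothetical example file and dataset path; adjust to your own data.
nwbFile = nwbRead('my_session.nwb');   % loads only structure and metadata

% Accessing a large dataset returns a types.untyped.DataStub,
% not the data itself; nothing is read from disk yet.
stub = nwbFile.acquisition.get('ElectricalSeries').data;

firstRows = stub(1:100, :);            % reads only these rows from disk
allData   = stub.load();               % explicitly loads the full dataset

% Writing side (advanced): a DataPipe enables chunked, compressed storage.
% Parameter names are an assumption based on MatNWB's DataPipe interface.
pipe = types.untyped.DataPipe('data', rand(8, 1000), 'chunkSize', [8, 100]);
```

Until ``.load()`` is called or the stub is indexed, no dataset values are brought into memory, which is what makes files larger than available RAM workable.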