
VideoFile import fails for ImageSeries with multiple external files #1445

@pauladkisson

Description


Is your feature request related to a problem? Please describe.

Spyglass's VideoFile table assumes a one-to-one relationship between an ImageSeries and a video file, but the NWB standard allows a single ImageSeries to reference multiple external files (e.g., one video file per epoch) using the starting_frame parameter to indicate which frames belong to which file. When an ImageSeries contains multiple external files, Spyglass does not recognize the starting_frame parameter and fails to import any of the video files.
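In the NWB schema, `starting_frame` gives one entry per item in `external_file`: the zero-based index, within the full ImageSeries, of the first frame stored in that file. The sketch below shows how a reader could map a global frame index back to its file and local frame. It assumes `starting_frame` is sorted and begins at 0, as the schema specifies. `locate_frame` is a hypothetical helper, not a Spyglass or pynwb function.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def locate_frame(frame_index: int, starting_frame: Sequence[int]) -> Tuple[int, int]:
    """Return (file_index, frame_within_file) for a global ImageSeries frame.

    Hypothetical helper: assumes starting_frame is sorted ascending and
    starts at 0. It does not check the upper bound (total frame count).
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    # bisect_right returns the position of the first start strictly greater
    # than frame_index, so the owning file is the entry just before it.
    file_index = bisect_right(starting_frame, frame_index) - 1
    return file_index, frame_index - starting_frame[file_index]


starting_frame = [0, 300, 600]  # values used in the reproduction below
print(locate_frame(0, starting_frame))    # (0, 0)
print(locate_frame(299, starting_frame))  # (0, 299)
print(locate_frame(300, starting_frame))  # (1, 0)
print(locate_frame(899, starting_frame))  # (2, 299)
```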

The technical issue is that Spyglass checks whether at least 90% of an ImageSeries' timestamps fall within the epoch's valid times. When multiple files are present with starting_frame, Spyglass treats all of the timestamps as belonging to a single video file, so the overlap check fails for every epoch and no file is matched.

This creates several problems:

  1. Video files not imported: When multiple video files are present with starting_frame, none of the video files are imported because the 90% overlap check fails for all epochs. While warnings are logged, users may not immediately understand why their video data wasn't imported.

  2. Incompatibility with standard NWB practice: The starting_frame parameter is part of the NWB standard for indicating which frames belong to which external file. Spyglass does not recognize this parameter, making it incompatible with standard multi-file ImageSeries structures.

  3. Uninformative warnings: While Spyglass does log warnings that "No video found corresponding to file X, epoch Y", these warnings don't explain that the root cause is Spyglass's lack of support for multiple external files in a single ImageSeries. Users are left to guess why their valid NWB structure fails to import.

  4. Lack of documentation: The current limitation is not documented, so users following standard NWB practices may be surprised when their multi-file ImageSeries fail to import without clear guidance on the required workaround.

Describe the solution you'd like

Minimum solution (must-have):

  • Add a specific warning when an ImageSeries contains multiple external files, such as:
    • "Spyglass does not support multiple external files in a single ImageSeries. Please reorganize your data into one external file per ImageSeries."
  • This warning should be raised in addition to the existing "No video found corresponding to file X, epoch Y" warnings
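The must-have check could be as small as the sketch below. It relies only on the `external_file` attribute that pynwb's `ImageSeries` exposes (None when frames are stored internally). The function name and the use of a module-level `logging` logger are illustrative assumptions; a real patch would use Spyglass's own logger and call site.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

MULTI_FILE_WARNING = (
    "Spyglass does not support multiple external files in a single "
    "ImageSeries. Please reorganize your data into one external file "
    "per ImageSeries."
)


def warn_if_multiple_external_files(image_series) -> bool:
    """Log the proposed warning if an ImageSeries lists more than one external file.

    Hypothetical helper: returns True when the warning was emitted.
    """
    external_files = getattr(image_series, "external_file", None)
    if external_files is not None and len(external_files) > 1:
        logger.warning(
            "%s (ImageSeries %r has %d external files)",
            MULTI_FILE_WARNING,
            image_series.name,
            len(external_files),
        )
        return True
    return False


# Usage with lightweight stand-ins for pynwb ImageSeries objects.
logging.basicConfig(level=logging.WARNING)
multi = SimpleNamespace(name="my_image_series", external_file=["a.h264", "b.h264", "c.h264"])
single = SimpleNamespace(name="single_file_series", external_file=["a.h264"])
print(warn_if_multiple_external_files(multi))   # True (and logs the warning)
print(warn_if_multiple_external_files(single))  # False
```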

Intermediate solution (highly recommended):

  • Add documentation explaining the current one-to-one limitation between ImageSeries and VideoFile entries
  • Document the recommended workaround: create separate ImageSeries objects (one per video file)
  • Include examples showing how to structure NWB files for Spyglass compatibility
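For the documentation example, the workaround amounts to splitting one multi-file series into one single-file series per entry in `external_file`. The sketch below computes the per-file keyword arguments in plain Python. `split_into_single_file_specs` and its `f"{name}_{i}"` naming scheme are hypothetical choices for illustration; nothing here is a confirmed Spyglass naming requirement.

```python
from typing import Dict, List, Sequence


def split_into_single_file_specs(
    name: str,
    external_file: Sequence[str],
    starting_frame: Sequence[int],
    timestamps: Sequence[float],
) -> List[Dict]:
    """Split one multi-file ImageSeries into one keyword-argument dict per file.

    Hypothetical helper: each dict describes a separate ImageSeries that
    references exactly one external file.
    """
    if len(external_file) != len(starting_frame):
        raise ValueError("external_file and starting_frame must have the same length")
    # Each file runs from its starting frame up to the next file's starting frame.
    bounds = list(starting_frame) + [len(timestamps)]
    specs = []
    for i, path in enumerate(external_file):
        specs.append(
            {
                "name": f"{name}_{i + 1}",
                "external_file": [path],
                "starting_frame": [0],  # a single file always starts at frame 0
                "timestamps": list(timestamps[bounds[i] : bounds[i + 1]]),
            }
        )
    return specs


# Matches np.linspace(0, 30, 900) up to floating-point rounding.
ts = [30 * i / 899 for i in range(900)]
specs = split_into_single_file_specs(
    "my_image_series",
    ["/path/to/video_epoch1.h264", "/path/to/video_epoch2.h264", "/path/to/video_epoch3.h264"],
    [0, 300, 600],
    ts,
)
print([(s["name"], len(s["timestamps"])) for s in specs])
# [('my_image_series_1', 300), ('my_image_series_2', 300), ('my_image_series_3', 300)]
```

Each dict can then be unpacked into `pynwb.image.ImageSeries(**spec, description=..., unit="n.a.", format="external", device=camera_device)` and added with `nwbfile.add_acquisition(...)`, once per file.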

Optimal solution (ideal):

  • Support multiple VideoFile entries per ImageSeries by recognizing the starting_frame parameter
  • Create one database entry for each external file in the ImageSeries
  • Properly associate each video file with its corresponding epoch(s) based on the starting_frame indices and timestamps
  • Maintain backward compatibility with single-file ImageSeries while supporting the multi-file case
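One way to implement the optimal solution is to apply the existing 90% rule per external file rather than per ImageSeries, using `starting_frame` to slice the timestamps. The sketch below assumes that reading of the overlap check and treats epoch intervals as closed `[start, stop]`. Spyglass's real interval handling may differ at the boundaries, and `match_files_to_epochs` is a hypothetical name. A single-file series (`starting_frame=[0]`) falls back to today's whole-series behavior, which keeps it backward compatible.

```python
from typing import Dict, List, Sequence, Tuple


def match_files_to_epochs(
    starting_frame: Sequence[int],
    timestamps: Sequence[float],
    epochs: Dict[str, Tuple[float, float]],
    threshold: float = 0.9,
) -> Dict[int, List[str]]:
    """Map each external file (by index) to the epochs containing >= threshold of its frames."""
    bounds = list(starting_frame) + [len(timestamps)]
    matches: Dict[int, List[str]] = {}
    for file_index in range(len(starting_frame)):
        # Timestamps belonging to this file only, per starting_frame.
        file_times = list(timestamps[bounds[file_index] : bounds[file_index + 1]])
        matches[file_index] = [
            tag
            for tag, (start, stop) in epochs.items()
            if file_times
            and sum(start <= t <= stop for t in file_times) >= threshold * len(file_times)
        ]
    return matches


timestamps = [30 * i / 899 for i in range(900)]  # ~np.linspace(0, 30, 900)
epochs = {"01": (0.0, 10.0), "02": (10.0, 20.0), "03": (20.0, 30.0)}
print(match_files_to_epochs([0, 300, 600], timestamps, epochs))
# {0: ['01'], 1: ['02'], 2: ['03']}
print(match_files_to_epochs([0], timestamps, epochs))  # whole series as one file
# {0: []}
```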

Describe alternatives you've considered

The current workaround is to create separate ImageSeries objects for each video file, even when they logically belong to the same recording session. While this workaround is functional, it:

  • Deviates from the NWB standard practice where a single ImageSeries can span multiple files using starting_frame
  • Requires users to be aware of this Spyglass-specific limitation, which is not currently documented

Additional context

Here is a minimal reproduction of the problem.

from pynwb.testing.mock.file import mock_NWBFile
from pynwb import NWBHDF5IO
from pathlib import Path
from ndx_franklab_novela import CameraDevice
from pynwb.image import ImageSeries
from pynwb.core import DynamicTable
import numpy as np


def add_task(nwbfile):
    tasks_module = nwbfile.create_processing_module(name="tasks", description="tasks module")
    for i in range(1, 4):
        task_table = DynamicTable(name=f"task_table_{i}", description=f"task table {i}")
        task_table.add_column(name="task_name", description="Name of the task.")
        task_table.add_column(name="task_description", description="Description of the task.")
        task_table.add_column(name="camera_id", description="Camera ID.")
        task_table.add_column(name="task_epochs", description="Task epochs.")
        task_table.add_row(
            task_name=f"task{i}", 
            task_description=f"task{i} description", 
            camera_id=[1], 
            task_epochs=[i]
        )
        tasks_module.add(task_table)


def add_video_with_multiple_files(nwbfile):
    camera_device = CameraDevice(
        name="camera_device 1",
        meters_per_pixel=1.0,
        model="my_model",
        lens="my_lens",
        camera_name="my_camera_name",
    )
    nwbfile.add_device(camera_device)

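    # Three external files that together hold the 900 frames of this series.
    video_files = [
        "/path/to/video_epoch1.h264",
        "/path/to/video_epoch2.h264",
        "/path/to/video_epoch3.h264",
    ]
    # 900 frames spanning 0-30 s (about 30 frames/s), i.e. 300 frames per epoch/file.
    timestamps = np.linspace(0, 30, 900)

    image_series = ImageSeries(
        name="my_image_series",
        description="Video recordings across multiple epochs",
        unit="n.a.",
        external_file=video_files,
        format="external",
        timestamps=timestamps,
        # Index (within the full series) of the first frame stored in each file.
        starting_frame=[0, 300, 600],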
        device=camera_device,
    )
    nwbfile.add_acquisition(image_series)


def insert_session(nwbfile_path: Path):
    import datajoint as dj
    # Machine-specific path: point this at your own DataJoint config file.
    dj_local_conf_path = "/Users/pauladkisson/Documents/CatalystNeuro/Spyglass/spyglass/dj_local_conf.json"
    dj.config.load(dj_local_conf_path)
    import spyglass.common as sgc
    import spyglass.data_import as sgi
    from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
    
    nwb_copy_file_name = get_nwb_copy_filename(nwbfile_path.name)
    # Remove any earlier copy of this session so the script can be re-run.
    (sgc.Nwbfile & {"nwb_file_name": nwb_copy_file_name}).delete()
    sgi.insert_sessions(str(nwbfile_path), rollback_on_fail=True, raise_err=True)

    print(sgc.VideoFile())


def main():
    nwbfile = mock_NWBFile(
        identifier="multiple_video_files_bug_demo",
        session_description="Mock NWB file demonstrating Spyglass multiple video files issue"
    )
    nwbfile.add_epoch(start_time=0.0, stop_time=10.0, tags=["01"])
    nwbfile.add_epoch(start_time=10.0, stop_time=20.0, tags=["02"])
    nwbfile.add_epoch(start_time=20.0, stop_time=30.0, tags=["03"])
    nwbfile.create_processing_module(
        name="behavior", 
        description="Behavioral data including video"
    )
    add_task(nwbfile)
    add_video_with_multiple_files(nwbfile)
    
    # Machine-specific output path: change to a writable location on your system.
    nwbfile_path = Path("/Volumes/T7/CatalystNeuro/Spyglass/raw/mock_multiple_video_files.nwb")
    if nwbfile_path.exists():
        nwbfile_path.unlink()
    nwbfile_path.parent.mkdir(parents=True, exist_ok=True)
    
    with NWBHDF5IO(nwbfile_path, "w") as io:
        io.write(nwbfile)
    
    insert_session(nwbfile_path=nwbfile_path)


if __name__ == "__main__":
    main()

Behavior:

When running this script with an ImageSeries containing 3 external video files (with starting_frame=[0, 300, 600] to indicate which frames belong to which file), Spyglass creates 0 VideoFile entries instead of the expected 3:

Expected behavior (3 VideoFile entries, one per external file):

*nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
+------------+ +-------+ +------------+ +------------+ +------------+
mock_multiple_ 1         1              my_camera_name <UUID>
mock_multiple_ 2         2              my_camera_name <UUID>
mock_multiple_ 3         3              my_camera_name <UUID>
(Total: 3)

Actual behavior (0 VideoFile entries):

*nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
+------------+ +-------+ +------------+ +------------+ +------------+

(Total: 0)

Why this happens:

Spyglass checks if at least 90% of the ImageSeries timestamps overlap with each epoch's interval. However, because Spyglass doesn't recognize the starting_frame parameter, it treats all 900 timestamps (spanning 0-30s across all 3 epochs) as belonging to a single video file.

For each epoch (which is only 10 seconds):

  • Epoch 1 (0-10s): Only ~300 timestamps overlap (33% < 90% threshold) ❌
  • Epoch 2 (10-20s): Only ~300 timestamps overlap (33% < 90% threshold) ❌
  • Epoch 3 (20-30s): Only ~300 timestamps overlap (33% < 90% threshold) ❌
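The arithmetic above can be checked with a short sketch. It assumes the check is "fraction of the series' timestamps inside the epoch interval >= 0.9", as described here, with closed intervals. `overlap_fraction` is a hypothetical helper, not Spyglass's implementation.

```python
def overlap_fraction(timestamps, start, stop):
    """Fraction of timestamps falling inside the closed interval [start, stop]."""
    inside = sum(start <= t <= stop for t in timestamps)
    return inside / len(timestamps)


# Same timestamps as the reproduction (np.linspace(0, 30, 900)), in pure Python.
timestamps = [30 * i / 899 for i in range(900)]
epochs = {"01": (0.0, 10.0), "02": (10.0, 20.0), "03": (20.0, 30.0)}

for tag, (start, stop) in epochs.items():
    fraction = overlap_fraction(timestamps, start, stop)
    print(f"epoch {tag}: {fraction:.1%} -> {'pass' if fraction >= 0.9 else 'fail'}")
# epoch 01: 33.3% -> fail
# epoch 02: 33.3% -> fail
# epoch 03: 33.3% -> fail
```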

Since none of the epochs pass the 90% threshold, NO video files are imported at all. Spyglass does log warnings for each epoch:

[15:37:52][INFO] Spyglass: Populating VideoFile...
[15:37:52][INFO] Spyglass: No video found corresponding to file mock_multiple_video_files_.nwb, epoch 01
[15:37:52][INFO] Spyglass: No video found corresponding to file mock_multiple_video_files_.nwb, epoch 02
[15:37:52][INFO] Spyglass: No video found corresponding to file mock_multiple_video_files_.nwb, epoch 03

However, these warnings don't explain WHY no video was found - specifically, that the 90% timestamp overlap threshold wasn't met because the starting_frame parameter was ignored. Without understanding the root cause, users are left confused about why their valid NWB ImageSeries with video data fails to import.

Metadata


Assignees

Labels

NWB ingestion (Problems with loading nwb files into spyglass), bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
