Video files silently fail to import (from NWB) without task information #1444

@pauladkisson

Is your feature request related to a problem? Please describe.

Spyglass's video data import process has an unwanted dependency on task/epoch information, preventing users from inserting video data independently. This coupling creates several issues:

  1. Prevents modular development: Users cannot insert video data without first creating task/epoch information, even when the video data is unrelated to specific experimental tasks or when task information is still being developed.

  2. Makes debugging difficult: When troubleshooting video import issues, users must also maintain task/epoch data structures, complicating the debugging process and making it harder to isolate video-specific problems.

  3. Reduces flexibility: In many experimental scenarios, video recordings may exist independently of task paradigms (e.g., continuous monitoring, exploratory behavior sessions, or video collected during inter-trial intervals).

This tight coupling between video and task data is not required by the NWB standard and limits the flexibility of data processing pipelines.

Describe the solution you'd like

Minimum solution (must-have):

  • Raise a clear warning when image series are detected in the NWB file that aren't matched with task information, alerting users that these video files will not be imported into the VideoFile table
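To make the minimum solution concrete, here is a rough sketch of the kind of check I have in mind. It is not Spyglass code: the function name `warn_on_unmatched_videos` and its plain-dict inputs are hypothetical, and it assumes that an image series is matched to a task through the trailing integer of its camera device name (as with "camera_device 1" and camera_id=[1] in the reproduction script). The actual matching logic inside Spyglass may differ.

```python
import re
import warnings


def warn_on_unmatched_videos(series_to_device, task_camera_ids):
    """Warn about image series whose camera is not referenced by any task.

    series_to_device: dict mapping image series name -> camera device name
    task_camera_ids: iterable of integer camera IDs listed in the task tables
    Returns the sorted list of unmatched image series names.
    """
    known_ids = set(task_camera_ids)
    unmatched = []
    for series_name, device_name in series_to_device.items():
        # Assumption: the camera ID is the trailing integer of the device
        # name, e.g. "camera_device 1" -> 1.
        match = re.search(r"(\d+)\s*$", device_name)
        camera_id = int(match.group(1)) if match else None
        if camera_id is None or camera_id not in known_ids:
            unmatched.append(series_name)
    unmatched.sort()
    if unmatched:
        warnings.warn(
            "Image series without matching task information will not be "
            f"imported into VideoFile: {unmatched}",
            UserWarning,
            stacklevel=2,
        )
    return unmatched
```

Calling `warn_on_unmatched_videos({"my_image_series": "camera_device 1"}, [])` corresponds to the case where add_task is not called: the series is reported instead of being dropped silently.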

Intermediate solution (highly recommended):

  • Add dedicated documentation explaining that task information is currently required for video files to be imported
  • Document the specific relationship between the tasks module and video import
  • Provide clear examples of how to structure NWB files to ensure successful video import

Optimal solution (ideal):

  • Fully decouple video and task data at the database schema level
  • Allow video files to be inserted into Spyglass independently of task/epoch information
  • Support optional linking of video data to task epochs when relevant, rather than requiring it
  • Enable workflows where video data is processed independently before being associated with task information
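As an illustration of what optional linking could look like, here is a small sketch using plain dataclasses. It is not a proposal for the actual DataJoint schema: `VideoRecord`, `video_index`, and `link_to_epoch` are hypothetical names. The only point is that the epoch is nullable, so a video can be stored first and associated with a task epoch later.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VideoRecord:
    """Hypothetical stand-in for a video entry that does not require an epoch."""

    nwb_file_name: str
    video_index: int
    camera_name: str
    epoch: Optional[int] = None  # None means "not linked to any task epoch"


def link_to_epoch(record: VideoRecord, epoch: int) -> VideoRecord:
    """Return a copy of the record that is associated with the given epoch."""
    return replace(record, epoch=epoch)


# A video can be created without any task information...
video = VideoRecord(
    nwb_file_name="mock_video.nwb",
    video_index=1,
    camera_name="my_camera_name",
)
# ...and linked to an epoch once the task information exists.
linked_video = link_to_epoch(video, epoch=1)
```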

Describe alternatives you've considered

The current workaround is to always create task/epoch information alongside video data, even when it's not semantically meaningful or when the task information is still being developed. This workaround:

  • Requires creating placeholder or dummy task data
  • Adds unnecessary complexity to data ingestion pipelines
  • Obscures the actual experimental structure when tasks are added artificially

Additional context

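Here is a minimal reproduction of the problem. The paths to dj_local_conf.json and to the output NWB file are specific to my machine and need to be adjusted before running.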

from pynwb.testing.mock.file import mock_NWBFile
from pynwb import NWBHDF5IO
from pathlib import Path
from ndx_franklab_novela import CameraDevice
from pynwb.image import ImageSeries
from pynwb.core import DynamicTable
import numpy as np


def add_task(nwbfile):
    tasks_module = nwbfile.create_processing_module(name="tasks", description="tasks module")
    task_table = DynamicTable(name="task_table_1", description="my task table")
    task_table.add_column(name="task_name", description="Name of the task.")
    task_table.add_column(name="task_description", description="Description of the task.")
    task_table.add_column(name="camera_id", description="Camera ID.")
    task_table.add_column(name="task_epochs", description="Task epochs.")
    task_table.add_row(
        task_name="task1", 
        task_description="task1 description", 
        camera_id=[1], 
        task_epochs=[1]
    )
    tasks_module.add(task_table)


def add_video(nwbfile):
    camera_device = CameraDevice(
        name="camera_device 1",
        meters_per_pixel=1.0,
        model="my_model",
        lens="my_lens",
        camera_name="my_camera_name",
    )
    nwbfile.add_device(camera_device)

    video_file_path = "/path/to/video.h264"
    timestamps = np.linspace(0, 10, 300)

    image_series = ImageSeries(
        name="my_image_series",
        description="Video recording without associated task information",
        unit="n.a.",
        external_file=[video_file_path],
        format="external",
        timestamps=timestamps,
        device=camera_device,
    )
    nwbfile.add_acquisition(image_series)


def insert_session(nwbfile_path: Path):
    import datajoint as dj

    # Load the DataJoint config before importing spyglass so that the imports
    # below pick up these connection settings.
    dj_local_conf_path = "/Users/pauladkisson/Documents/CatalystNeuro/Spyglass/spyglass/dj_local_conf.json"
    dj.config.load(dj_local_conf_path)
    import spyglass.common as sgc
    import spyglass.data_import as sgi
    from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename

    # Delete any entry left over from a previous run so the script can be re-run.
    nwb_copy_file_name = get_nwb_copy_filename(nwbfile_path.name)
    (sgc.Nwbfile & {"nwb_file_name": nwb_copy_file_name}).delete()

    sgi.insert_sessions(str(nwbfile_path), rollback_on_fail=True, raise_err=True)

    print(sgc.VideoFile())
    # If add_task is called,
    # *nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
    # +------------+ +-------+ +------------+ +------------+ +------------+
    # mock_video_tas 1         1              my_camera_name 1bdd667f-67a3-
    # (Total: 1)

    # If add_task is NOT called,
    # *nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
    # +------------+ +-------+ +------------+ +------------+ +------------+

    # (Total: 0)


def main():
    nwbfile = mock_NWBFile(
        identifier="video_task_coupling_bug_demo",
        session_description="Mock NWB file demonstrating Spyglass video/task coupling issue"
    )
    nwbfile.add_epoch(start_time=0.0, stop_time=10.0, tags=["01"])
    nwbfile.create_processing_module(
        name="behavior", 
        description="Behavioral data including video"
    )
    
    add_task(nwbfile)  # Comment out this line to test without task information
    add_video(nwbfile)
    
    nwbfile_path = Path("/Volumes/T7/CatalystNeuro/Spyglass/raw/mock_video_task_coupling.nwb")
    if nwbfile_path.exists():
        nwbfile_path.unlink()
    
    nwbfile_path.parent.mkdir(parents=True, exist_ok=True)
    
    with NWBHDF5IO(nwbfile_path, "w") as io:
        io.write(nwbfile)
    
    print(f"Mock NWB file written to {nwbfile_path}")
    
    insert_session(nwbfile_path=nwbfile_path)


if __name__ == "__main__":
    main()

Behavior:

The script demonstrates a silent failure when task information is not present. The video data import into the VideoFile table depends on the presence of task information:

With task information (add_task(nwbfile) called):

*nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
+------------+ +-------+ +------------+ +------------+ +------------+
mock_video_tas 1         1              my_camera_name 1bdd667f-67a3-
(Total: 1)

Without task information (add_task(nwbfile) commented out):

*nwb_file_name *epoch    *video_file_nu camera_name    video_file_obj
+------------+ +-------+ +------------+ +------------+ +------------+

(Total: 0)

Note that the import process completes without errors or warnings in both cases. When task information is absent, the video data is silently ignored, leading to incomplete data import. This silent failure makes debugging difficult and can lead to users unknowingly losing video data during the import process.

Labels

NWB ingestion: Problems with loading nwb files into spyglass
bug: Something isn't working
