2 changes: 2 additions & 0 deletions CHANGELOG.md

@@ -2,6 +2,8 @@

### New Checks
* Added `check_file_extension` for NWB file extension best practice recommendations (`.nwb`, `.nwb.h5`, or `.nwb.zarr`) [#625](https://github.com/NeurodataWithoutBorders/nwbinspector/pull/625)
* Added `check_time_intervals_duration`, which ensures that `TimeIntervals` tables do not span a duration greater than 1 year.
[#635](https://github.com/NeurodataWithoutBorders/nwbinspector/pull/635)

### Improvements
* Added documentation to API and CLI docs on how to use the dandi config option. [#624](https://github.com/NeurodataWithoutBorders/nwbinspector/pull/624)
2 changes: 2 additions & 0 deletions docs/best_practices/tables.rst

@@ -120,6 +120,8 @@ Times are always stored in seconds in NWB. In :ref:`nwb-schema:sec-TimeIntervals

Additional time columns in :ref:`nwb-schema:sec-TimeIntervals` tables, such as the ``TrialsTable`` should have ``_time`` as a suffix to the name. *E.g.*, if you add more times in ``TrialsTable``, such as a subject response time, name it ``response_time`` and store the time values in seconds from the ``timestamps_reference_time`` of the :ref:`nwb-schema:sec-NWBFile`, just like ``start_time`` and ``stop_time``. This convention is used by downstream processing tools. For instance, NWBWidgets uses these times to create peri-stimulus time histograms relating spiking activity to trial events. See :ref:`best_practice_global_time_reference` for more details.

Check function: :py:meth:`~nwbinspector.checks._tables.check_time_intervals_duration`

.. _best_practice_unique_dynamic_table_ids:

Unique ids
1 change: 1 addition & 0 deletions src/nwbinspector/checks/__init__.py

@@ -75,6 +75,7 @@
    check_table_time_columns_are_not_negative,
    check_table_values_for_dict,
    check_time_interval_time_columns,
    check_time_intervals_duration,
    check_time_intervals_stop_after_start,
)
from ._time_series import (
59 changes: 59 additions & 0 deletions src/nwbinspector/checks/_tables.py

@@ -18,6 +18,7 @@
)

NELEMS = 200
MAX_DURATION = 3600 * 24 * 365.25 # default: 1 year


@register_check(importance=Importance.CRITICAL, neurodata_type=DynamicTableRegion)
@@ -292,3 +293,61 @@ def check_table_time_columns_are_not_negative(table: DynamicTable) -> Optional[I
        )

    return None


@register_check(importance=Importance.CRITICAL, neurodata_type=TimeIntervals)

Review comment (Contributor):

Should this be a critical error? We are basically saying that there won't ever be such cases. Seems too strong to me.

We should have a way of handling "things that are usually errors, but there is a small probability that the user knows what they are doing and can move forward."

The solution could be a non-trivial barrier to enable this, like an environment variable that could skip this error (a CLI argument would be hard to propagate to DANDI), or something of the like.
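One way the environment-variable escape hatch suggested above could look, as a rough sketch: the variable name `NWBINSPECTOR_SKIP_DURATION_CHECK` is purely hypothetical, not an existing nwbinspector option.

```python
import os


def duration_check_enabled() -> bool:
    """Return False when the user has explicitly opted out of the duration check.

    NWBINSPECTOR_SKIP_DURATION_CHECK is a hypothetical variable name used only
    to illustrate the reviewer's suggestion of a non-CLI opt-out.
    """
    return os.environ.get("NWBINSPECTOR_SKIP_DURATION_CHECK", "0") != "1"
```

A check could consult this helper before emitting its message, letting DANDI-side runs opt out without plumbing a new CLI flag through.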

def check_time_intervals_duration(
    time_intervals: TimeIntervals, duration_threshold: float = MAX_DURATION
) -> Optional[InspectorMessage]:
    """
    Check if the duration spanned by time columns in a TimeIntervals table exceeds a threshold.

    This check examines start_time, stop_time, and any other columns ending in _time.

    Best Practice: :ref:`best_practice_time_interval_time_columns`

    Parameters
    ----------
    time_intervals: TimeIntervals
        The table to check.
    duration_threshold: float, optional
        Maximum expected duration in seconds. Default is 1 year (365.25 days).
    """
    if len(time_intervals.id) == 0:
        return None

    start_times = []
    end_times = []

    # Check for start_time and stop_time columns
    if "start_time" in time_intervals.colnames and len(time_intervals["start_time"]) > 0:

Review comment (Contributor):

Here and with the other time columns, we are assuming that the time columns are already well ordered. Is there a way of running checks in order? If the ascending-order check fails, this check may not make sense, and its output would confuse rather than clarify.


Reply (Contributor Author):

Yes, this check requires that the times are in order. If they are not, this check may fail to raise a message. I suppose we could just read all the data. That's probably fine in most cases -- I doubt we'll come across many datasets where these columns are large. Do you think it's better to just read all of the time arrays?


Reply (Contributor):

It is definitely safer.

If we want to be more efficient, maybe we can just combine these two checks:

@register_check(importance=Importance.BEST_PRACTICE_VIOLATION, neurodata_type=TimeIntervals)
def check_time_interval_time_columns(
    time_intervals: TimeIntervals, nelems: Optional[int] = NELEMS
) -> Optional[InspectorMessage]:
    """
    Check that time columns are in ascending order.

    Parameters
    ----------
    time_intervals: TimeIntervals
    nelems: int, optional
        Only check the first {nelems} elements. This is useful in case the columns are
        very long so you don't need to load the entire array into memory. Use None to
        load the entire arrays.
    """
    unsorted_cols = []
    for column in time_intervals.columns:
        if column.name == "start_time":
            if not is_ascending_series(column.data, nelems):
                unsorted_cols.append(column.name)
    if unsorted_cols:
        return InspectorMessage(
            message=(
                f"{unsorted_cols} are time columns but the values are not in ascending order. "
                "All times should be in seconds with respect to the session start time."
            )
        )
    return None

To avoid reading them twice.

I think either way is fine.
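For illustration, the "combine them" idea can be sketched as a single pass that both verifies ascending order and tracks the overall span, so each column is read only once. Plain Python sequences stand in for real column data here; `scan_time_columns` and its signature are illustrative, not part of nwbinspector.

```python
def scan_time_columns(
    columns: dict[str, list[float]], max_duration: float = 3600 * 24 * 365.25
) -> tuple[list[str], bool]:
    """Return (names of unsorted time columns, whether the overall span exceeds max_duration)."""
    unsorted_cols = []
    overall_min, overall_max = float("inf"), float("-inf")
    for name, values in columns.items():
        if not name.endswith("_time") or not values:
            continue
        # Ascending-order check, done while the data is in memory anyway.
        if any(b < a for a, b in zip(values, values[1:])):
            unsorted_cols.append(name)
        # Min/max over the full column, so unsorted data is still handled correctly.
        overall_min = min(overall_min, min(values))
        overall_max = max(overall_max, max(values))
    duration = overall_max - overall_min if overall_max > overall_min else 0.0
    return unsorted_cols, duration > max_duration
```

Because it takes the true min/max of each column rather than the first and last elements, the duration result stays correct even when the ordering check fails.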

        start_times.append(float(time_intervals["start_time"][0]))
    if "stop_time" in time_intervals.colnames and len(time_intervals["stop_time"]) > 0:
        end_times.append(float(time_intervals["stop_time"][-1]))

    # Check for other time columns
    for column_name in time_intervals.colnames:

Review comment (Contributor):

I think that all the times in the time columns should be smaller than max(time_intervals["stop_time"].data), right? Maybe that could be a check on its own.
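The reviewer's suggestion could become a standalone check along these lines: flag any `_time` column containing a value beyond the largest stop_time. Plain sequences stand in for a real TimeIntervals table; `times_exceed_stop` is a hypothetical name, not an existing check.

```python
def times_exceed_stop(columns: dict[str, list[float]]) -> list[str]:
    """Return the names of "_time" columns whose maximum exceeds the largest stop_time."""
    stop_values = columns.get("stop_time", [])
    if not stop_values:
        return []  # nothing to compare against
    max_stop = max(stop_values)
    offenders = []
    for name, values in columns.items():
        if name.endswith("_time") and name != "stop_time" and values and max(values) > max_stop:
            offenders.append(name)
    return offenders
```

A non-empty return would map naturally onto an InspectorMessage listing the offending columns.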

        if (
            column_name.endswith("_time")
            and column_name not in ["start_time", "stop_time"]
            and len(time_intervals[column_name]) > 0
        ):
            data = time_intervals[column_name]
            start_times.append(float(data[0]))
            end_times.append(float(data[-1]))

    if start_times and end_times:
        duration = max(end_times) - min(start_times)

        if duration > duration_threshold:
            duration_years = duration / 31557600.0  # seconds in a Julian year (365.25 days)
            threshold_years = duration_threshold / 31557600.0
            return InspectorMessage(
                message=(
                    f"TimeIntervals table '{time_intervals.name}' has a duration of {duration:.2f} seconds "
                    f"({duration_years:.2f} years), which exceeds the threshold of "
                    f"{duration_threshold:.2f} seconds ({threshold_years:.2f} years). "
                    "Please verify that this is correct."
                )
            )
    return None
70 changes: 70 additions & 0 deletions tests/unit_tests/test_tables.py

@@ -18,6 +18,7 @@
    check_table_time_columns_are_not_negative,
    check_table_values_for_dict,
    check_time_interval_time_columns,
    check_time_intervals_duration,
    check_time_intervals_stop_after_start,
)

@@ -498,3 +499,72 @@ def test_table_time_columns_are_not_negative_multidimensional_pass():
    test_table.add_row(test_time=[0.0, 1.0, 2.0, 3.0])

    assert check_table_time_columns_are_not_negative(test_table) is None


def test_check_time_intervals_duration_pass_short():
    """Test that short duration tables pass the check."""
    table = TimeIntervals(name="trials", description="test trials")
    table.add_row(start_time=0.0, stop_time=10.0)
    table.add_row(start_time=15.0, stop_time=25.0)
    table.add_row(start_time=30.0, stop_time=100.0)

    assert check_time_intervals_duration(table) is None


def test_check_time_intervals_duration_fail_exceeds_threshold():
    """Test that tables with duration exceeding 1 year fail."""
    one_year = 31557600.0
    table = TimeIntervals(name="trials", description="test trials")
    table.add_row(start_time=0.0, stop_time=100.0)
    table.add_row(start_time=one_year + 1000, stop_time=one_year + 2000)

    result = check_time_intervals_duration(table)
    assert result is not None
    assert "trials" in result.message
    assert "exceeds the threshold" in result.message
    assert result.importance == Importance.CRITICAL


def test_check_time_intervals_duration_pass_empty():
    """Test that empty tables pass."""
    table = TimeIntervals(name="trials", description="test trials")
    assert check_time_intervals_duration(table) is None


def test_check_time_intervals_duration_pass_custom_threshold():
    """Test that custom threshold works correctly."""
    table = TimeIntervals(name="trials", description="test trials")
    table.add_row(start_time=0.0, stop_time=100.0)
    table.add_row(start_time=150.0, stop_time=200.0)

    # Should fail with 100 second threshold
    result = check_time_intervals_duration(table, duration_threshold=100.0)
    assert result is not None

    # Should pass with 300 second threshold
    result = check_time_intervals_duration(table, duration_threshold=300.0)
    assert result is None


def test_check_time_intervals_duration_with_additional_time_columns():
    """Test that the check considers additional time columns ending in '_time'."""
    one_year = 31557600.0
    table = TimeIntervals(name="trials", description="test trials")
    table.add_column(name="custom_time", description="custom time column")
    table.add_row(start_time=0.0, stop_time=100.0, custom_time=0.0)
    table.add_row(start_time=150.0, stop_time=200.0, custom_time=one_year + 1000)

    result = check_time_intervals_duration(table)
    assert result is not None
    assert "trials" in result.message
    assert "exceeds the threshold" in result.message


def test_check_time_intervals_duration_pass_without_start_stop():
    """Test that tables with only other time columns work correctly."""
    table = TimeIntervals(name="trials", description="test trials")
    table.add_column(name="custom_time", description="custom time column")
    table.add_row(start_time=0.0, stop_time=10.0, custom_time=5.0)
    table.add_row(start_time=15.0, stop_time=25.0, custom_time=20.0)

    assert check_time_intervals_duration(table) is None